* [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax
@ 2023-11-24  7:02 Cui, Lili
From: Cui, Lili @ 2023-11-24  7:02 UTC
  To: binutils; +Cc: jbeulich, hongjiu.lu

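Make const_1_mode print $1 in AT&T syntax.  Until now the
disassembler printed the implicit count of 1 only in Intel
syntax and omitted it in AT&T syntax.  Without this change
there will be correctness issues when const_1_mode is
extended to support APX NDD.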

gas/ChangeLog:

        * testsuite/gas/i386/intel.d: Adjust testcase.
        * testsuite/gas/i386/lfence-load.d: Ditto.
        * testsuite/gas/i386/noreg16-data32.d: Ditto.
        * testsuite/gas/i386/noreg16.d: Ditto.
        * testsuite/gas/i386/noreg32-data16.d: Ditto.
        * testsuite/gas/i386/noreg32.d: Ditto.
        * testsuite/gas/i386/noreg64-data16.d: Ditto.
        * testsuite/gas/i386/noreg64-rex64.d: Ditto.
        * testsuite/gas/i386/noreg64.d: Ditto.
        * testsuite/gas/i386/opcode-suffix.d: Ditto.
        * testsuite/gas/i386/opcode.d: Ditto.
        * testsuite/gas/i386/x86-64-lfence-load.d: Ditto.
        * testsuite/gas/i386/x86-64-opcode.d: Ditto.

opcodes/ChangeLog:

        * i386-dis.c (OP_I): Make const_1_mode print $1 in AT&T syntax.
---
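For illustration only, a minimal C sketch of the operand text the
const_1_mode case produces after this change.  The names sketch_info
and sketch_print_const_1 are hypothetical and do not exist in
opcodes/i386-dis.c; the real code calls oappend (ins, ...) inside OP_I.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the disassembler's per-instruction state;
   only the syntax flag and an operand buffer are modelled here.  */
struct sketch_info
{
  bool intel_syntax;
  char operand[8];
};

/* Mirrors the patched const_1_mode branch: the implicit count of 1 is
   written as "1" in Intel syntax and as "$1" in AT&T syntax.  Before
   the patch the AT&T case wrote nothing.  */
static void
sketch_print_const_1 (struct sketch_info *ins)
{
  strcpy (ins->operand, ins->intel_syntax ? "1" : "$1");
}
```

With intel_syntax set to false the buffer holds "$1", which matches
the "shl    $1,%eax" form the adjusted testcases now expect.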
 gas/testsuite/gas/i386/intel.d              |  6 ++--
 gas/testsuite/gas/i386/lfence-load.d        |  2 +-
 gas/testsuite/gas/i386/noreg16-data32.d     | 32 ++++++++++-----------
 gas/testsuite/gas/i386/noreg16.d            | 32 ++++++++++-----------
 gas/testsuite/gas/i386/noreg32-data16.d     | 32 ++++++++++-----------
 gas/testsuite/gas/i386/noreg32.d            | 32 ++++++++++-----------
 gas/testsuite/gas/i386/noreg64-data16.d     | 32 ++++++++++-----------
 gas/testsuite/gas/i386/noreg64-rex64.d      | 32 ++++++++++-----------
 gas/testsuite/gas/i386/noreg64.d            | 32 ++++++++++-----------
 gas/testsuite/gas/i386/opcode-suffix.d      |  6 ++--
 gas/testsuite/gas/i386/opcode.d             | 10 +++----
 gas/testsuite/gas/i386/x86-64-lfence-load.d |  2 +-
 gas/testsuite/gas/i386/x86-64-opcode.d      |  6 ++--
 opcodes/i386-dis.c                          |  2 ++
 14 files changed, 130 insertions(+), 128 deletions(-)

diff --git a/gas/testsuite/gas/i386/intel.d b/gas/testsuite/gas/i386/intel.d
index bc212893853..c3e45c2e38c 100644
--- a/gas/testsuite/gas/i386/intel.d
+++ b/gas/testsuite/gas/i386/intel.d
@@ -208,8 +208,8 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	cd 90 [ 	]*int    \$0x90
 [ 	]*[a-f0-9]+:	ce [ 	]*into
 [ 	]*[a-f0-9]+:	cf [ 	]*iret
-[ 	]*[a-f0-9]+:	d0 90 90 90 90 90 [ 	]*rclb   -0x6f6f6f70\(%eax\)
-[ 	]*[a-f0-9]+:	d1 90 90 90 90 90 [ 	]*rcll   -0x6f6f6f70\(%eax\)
+[ 	]*[a-f0-9]+:	d0 90 90 90 90 90 [ 	]*rclb   \$1,-0x6f6f6f70\(%eax\)
+[ 	]*[a-f0-9]+:	d1 90 90 90 90 90 [ 	]*rcll   \$1,-0x6f6f6f70\(%eax\)
 [ 	]*[a-f0-9]+:	d2 90 90 90 90 90 [ 	]*rclb   %cl,-0x6f6f6f70\(%eax\)
 [ 	]*[a-f0-9]+:	d3 90 90 90 90 90 [ 	]*rcll   %cl,-0x6f6f6f70\(%eax\)
 [ 	]*[a-f0-9]+:	d4 90 [ 	]*aam    \$0x90
@@ -527,7 +527,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	66 ca 90 90 [ 	]*lretw  \$0x9090
 [ 	]*[a-f0-9]+:	66 cb [ 	]*lretw
 [ 	]*[a-f0-9]+:	66 cf [ 	]*iretw
-[ 	]*[a-f0-9]+:	66 d1 90 90 90 90 90 [ 	]*rclw   -0x6f6f6f70\(%eax\)
+[ 	]*[a-f0-9]+:	66 d1 90 90 90 90 90 [ 	]*rclw   \$1,-0x6f6f6f70\(%eax\)
 [ 	]*[a-f0-9]+:	66 d3 90 90 90 90 90 [ 	]*rclw   %cl,-0x6f6f6f70\(%eax\)
 [ 	]*[a-f0-9]+:	66 e5 90 [ 	]*in     \$0x90,%ax
 [ 	]*[a-f0-9]+:	66 e7 90 [ 	]*out    %ax,\$0x90
diff --git a/gas/testsuite/gas/i386/lfence-load.d b/gas/testsuite/gas/i386/lfence-load.d
index 33ebef5432f..eb94bdcbb68 100644
--- a/gas/testsuite/gas/i386/lfence-load.d
+++ b/gas/testsuite/gas/i386/lfence-load.d
@@ -83,7 +83,7 @@ Disassembly of section .text:
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	58                   	pop    %eax
  +[a-f0-9]+:	0f ae e8             	lfence
- +[a-f0-9]+:	66 d1 11             	rclw   \(%ecx\)
+ +[a-f0-9]+:	66 d1 11             	rclw   \$1,\(%ecx\)
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	f7 01 01 00 00 00    	testl  \$0x1,\(%ecx\)
  +[a-f0-9]+:	0f ae e8             	lfence
diff --git a/gas/testsuite/gas/i386/noreg16-data32.d b/gas/testsuite/gas/i386/noreg16-data32.d
index 7561b549ebb..237e25dd0e1 100644
--- a/gas/testsuite/gas/i386/noreg16-data32.d
+++ b/gas/testsuite/gas/i386/noreg16-data32.d
@@ -96,43 +96,43 @@ Disassembly of section .text:
  *[a-f0-9]+:	f3 0f ae 27          	ptwrite \(%bx\)
  *[a-f0-9]+:	66 ff 37             	pushl  \(%bx\)
  *[a-f0-9]+:	66 06                	pushl  %es
- *[a-f0-9]+:	66 d1 17             	rcll   \(%bx\)
+ *[a-f0-9]+:	66 d1 17             	rcll   \$1,\(%bx\)
  *[a-f0-9]+:	66 c1 17 02          	rcll   \$0x2,\(%bx\)
  *[a-f0-9]+:	66 d3 17             	rcll   %cl,\(%bx\)
- *[a-f0-9]+:	66 d1 17             	rcll   \(%bx\)
- *[a-f0-9]+:	66 d1 1f             	rcrl   \(%bx\)
+ *[a-f0-9]+:	66 d1 17             	rcll   \$1,\(%bx\)
+ *[a-f0-9]+:	66 d1 1f             	rcrl   \$1,\(%bx\)
  *[a-f0-9]+:	66 c1 1f 02          	rcrl   \$0x2,\(%bx\)
  *[a-f0-9]+:	66 d3 1f             	rcrl   %cl,\(%bx\)
- *[a-f0-9]+:	66 d1 1f             	rcrl   \(%bx\)
- *[a-f0-9]+:	66 d1 07             	roll   \(%bx\)
+ *[a-f0-9]+:	66 d1 1f             	rcrl   \$1,\(%bx\)
+ *[a-f0-9]+:	66 d1 07             	roll   \$1,\(%bx\)
  *[a-f0-9]+:	66 c1 07 02          	roll   \$0x2,\(%bx\)
  *[a-f0-9]+:	66 d3 07             	roll   %cl,\(%bx\)
- *[a-f0-9]+:	66 d1 07             	roll   \(%bx\)
- *[a-f0-9]+:	66 d1 0f             	rorl   \(%bx\)
+ *[a-f0-9]+:	66 d1 07             	roll   \$1,\(%bx\)
+ *[a-f0-9]+:	66 d1 0f             	rorl   \$1,\(%bx\)
  *[a-f0-9]+:	66 c1 0f 02          	rorl   \$0x2,\(%bx\)
  *[a-f0-9]+:	66 d3 0f             	rorl   %cl,\(%bx\)
- *[a-f0-9]+:	66 d1 0f             	rorl   \(%bx\)
+ *[a-f0-9]+:	66 d1 0f             	rorl   \$1,\(%bx\)
  *[a-f0-9]+:	66 83 1f 01          	sbbl   \$0x1,\(%bx\)
  *[a-f0-9]+:	66 81 1f 89 00 00 00 	sbbl   \$0x89,\(%bx\)
  *[a-f0-9]+:	66 81 1f 34 12 00 00 	sbbl   \$0x1234,\(%bx\)
  *[a-f0-9]+:	66 af                	scas   %es:\(%di\),%eax
  *[a-f0-9]+:	66 af                	scas   %es:\(%di\),%eax
- *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
+ *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
  *[a-f0-9]+:	66 c1 27 02          	shll   \$0x2,\(%bx\)
  *[a-f0-9]+:	66 d3 27             	shll   %cl,\(%bx\)
- *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
- *[a-f0-9]+:	66 d1 3f             	sarl   \(%bx\)
+ *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
+ *[a-f0-9]+:	66 d1 3f             	sarl   \$1,\(%bx\)
  *[a-f0-9]+:	66 c1 3f 02          	sarl   \$0x2,\(%bx\)
  *[a-f0-9]+:	66 d3 3f             	sarl   %cl,\(%bx\)
- *[a-f0-9]+:	66 d1 3f             	sarl   \(%bx\)
- *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
+ *[a-f0-9]+:	66 d1 3f             	sarl   \$1,\(%bx\)
+ *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
  *[a-f0-9]+:	66 c1 27 02          	shll   \$0x2,\(%bx\)
  *[a-f0-9]+:	66 d3 27             	shll   %cl,\(%bx\)
- *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
- *[a-f0-9]+:	66 d1 2f             	shrl   \(%bx\)
+ *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
+ *[a-f0-9]+:	66 d1 2f             	shrl   \$1,\(%bx\)
  *[a-f0-9]+:	66 c1 2f 02          	shrl   \$0x2,\(%bx\)
  *[a-f0-9]+:	66 d3 2f             	shrl   %cl,\(%bx\)
- *[a-f0-9]+:	66 d1 2f             	shrl   \(%bx\)
+ *[a-f0-9]+:	66 d1 2f             	shrl   \$1,\(%bx\)
  *[a-f0-9]+:	66 ab                	stos   %eax,%es:\(%di\)
  *[a-f0-9]+:	66 ab                	stos   %eax,%es:\(%di\)
  *[a-f0-9]+:	66 83 2f 01          	subl   \$0x1,\(%bx\)
diff --git a/gas/testsuite/gas/i386/noreg16.d b/gas/testsuite/gas/i386/noreg16.d
index 86f852fb4ca..e4149b03a6e 100644
--- a/gas/testsuite/gas/i386/noreg16.d
+++ b/gas/testsuite/gas/i386/noreg16.d
@@ -95,43 +95,43 @@ Disassembly of section .text:
  *[a-f0-9]+:	f3 0f ae 27          	ptwrite \(%bx\)
  *[a-f0-9]+:	ff 37                	push   \(%bx\)
  *[a-f0-9]+:	06                   	push   %es
- *[a-f0-9]+:	d1 17                	rclw   \(%bx\)
+ *[a-f0-9]+:	d1 17                	rclw   \$1,\(%bx\)
  *[a-f0-9]+:	c1 17 02             	rclw   \$0x2,\(%bx\)
  *[a-f0-9]+:	d3 17                	rclw   %cl,\(%bx\)
- *[a-f0-9]+:	d1 17                	rclw   \(%bx\)
- *[a-f0-9]+:	d1 1f                	rcrw   \(%bx\)
+ *[a-f0-9]+:	d1 17                	rclw   \$1,\(%bx\)
+ *[a-f0-9]+:	d1 1f                	rcrw   \$1,\(%bx\)
  *[a-f0-9]+:	c1 1f 02             	rcrw   \$0x2,\(%bx\)
  *[a-f0-9]+:	d3 1f                	rcrw   %cl,\(%bx\)
- *[a-f0-9]+:	d1 1f                	rcrw   \(%bx\)
- *[a-f0-9]+:	d1 07                	rolw   \(%bx\)
+ *[a-f0-9]+:	d1 1f                	rcrw   \$1,\(%bx\)
+ *[a-f0-9]+:	d1 07                	rolw   \$1,\(%bx\)
  *[a-f0-9]+:	c1 07 02             	rolw   \$0x2,\(%bx\)
  *[a-f0-9]+:	d3 07                	rolw   %cl,\(%bx\)
- *[a-f0-9]+:	d1 07                	rolw   \(%bx\)
- *[a-f0-9]+:	d1 0f                	rorw   \(%bx\)
+ *[a-f0-9]+:	d1 07                	rolw   \$1,\(%bx\)
+ *[a-f0-9]+:	d1 0f                	rorw   \$1,\(%bx\)
  *[a-f0-9]+:	c1 0f 02             	rorw   \$0x2,\(%bx\)
  *[a-f0-9]+:	d3 0f                	rorw   %cl,\(%bx\)
- *[a-f0-9]+:	d1 0f                	rorw   \(%bx\)
+ *[a-f0-9]+:	d1 0f                	rorw   \$1,\(%bx\)
  *[a-f0-9]+:	83 1f 01             	sbbw   \$0x1,\(%bx\)
  *[a-f0-9]+:	81 1f 89 00          	sbbw   \$0x89,\(%bx\)
  *[a-f0-9]+:	81 1f 34 12          	sbbw   \$0x1234,\(%bx\)
  *[a-f0-9]+:	af                   	scas   %es:\(%di\),%ax
  *[a-f0-9]+:	af                   	scas   %es:\(%di\),%ax
- *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
+ *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
  *[a-f0-9]+:	c1 27 02             	shlw   \$0x2,\(%bx\)
  *[a-f0-9]+:	d3 27                	shlw   %cl,\(%bx\)
- *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
- *[a-f0-9]+:	d1 3f                	sarw   \(%bx\)
+ *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
+ *[a-f0-9]+:	d1 3f                	sarw   \$1,\(%bx\)
  *[a-f0-9]+:	c1 3f 02             	sarw   \$0x2,\(%bx\)
  *[a-f0-9]+:	d3 3f                	sarw   %cl,\(%bx\)
- *[a-f0-9]+:	d1 3f                	sarw   \(%bx\)
- *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
+ *[a-f0-9]+:	d1 3f                	sarw   \$1,\(%bx\)
+ *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
  *[a-f0-9]+:	c1 27 02             	shlw   \$0x2,\(%bx\)
  *[a-f0-9]+:	d3 27                	shlw   %cl,\(%bx\)
- *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
- *[a-f0-9]+:	d1 2f                	shrw   \(%bx\)
+ *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
+ *[a-f0-9]+:	d1 2f                	shrw   \$1,\(%bx\)
  *[a-f0-9]+:	c1 2f 02             	shrw   \$0x2,\(%bx\)
  *[a-f0-9]+:	d3 2f                	shrw   %cl,\(%bx\)
- *[a-f0-9]+:	d1 2f                	shrw   \(%bx\)
+ *[a-f0-9]+:	d1 2f                	shrw   \$1,\(%bx\)
  *[a-f0-9]+:	ab                   	stos   %ax,%es:\(%di\)
  *[a-f0-9]+:	ab                   	stos   %ax,%es:\(%di\)
  *[a-f0-9]+:	83 2f 01             	subw   \$0x1,\(%bx\)
diff --git a/gas/testsuite/gas/i386/noreg32-data16.d b/gas/testsuite/gas/i386/noreg32-data16.d
index 1ec6b9e8670..e3ae2116bb1 100644
--- a/gas/testsuite/gas/i386/noreg32-data16.d
+++ b/gas/testsuite/gas/i386/noreg32-data16.d
@@ -103,44 +103,44 @@ Disassembly of section .text:
  *[a-f0-9]+:	f3 0f ae 20          	ptwrite \(%eax\)
  *[a-f0-9]+:	66 ff 30             	pushw  \(%eax\)
  *[a-f0-9]+:	66 06                	pushw  %es
- *[a-f0-9]+:	66 d1 10             	rclw   \(%eax\)
+ *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%eax\)
  *[a-f0-9]+:	66 c1 10 02          	rclw   \$0x2,\(%eax\)
  *[a-f0-9]+:	66 d3 10             	rclw   %cl,\(%eax\)
- *[a-f0-9]+:	66 d1 10             	rclw   \(%eax\)
- *[a-f0-9]+:	66 d1 18             	rcrw   \(%eax\)
+ *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%eax\)
+ *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%eax\)
  *[a-f0-9]+:	66 c1 18 02          	rcrw   \$0x2,\(%eax\)
  *[a-f0-9]+:	66 d3 18             	rcrw   %cl,\(%eax\)
- *[a-f0-9]+:	66 d1 18             	rcrw   \(%eax\)
- *[a-f0-9]+:	66 d1 00             	rolw   \(%eax\)
+ *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%eax\)
+ *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%eax\)
  *[a-f0-9]+:	66 c1 00 02          	rolw   \$0x2,\(%eax\)
  *[a-f0-9]+:	66 d3 00             	rolw   %cl,\(%eax\)
- *[a-f0-9]+:	66 d1 00             	rolw   \(%eax\)
- *[a-f0-9]+:	66 d1 08             	rorw   \(%eax\)
+ *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%eax\)
+ *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%eax\)
  *[a-f0-9]+:	66 c1 08 02          	rorw   \$0x2,\(%eax\)
  *[a-f0-9]+:	66 d3 08             	rorw   %cl,\(%eax\)
- *[a-f0-9]+:	66 d1 08             	rorw   \(%eax\)
+ *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%eax\)
  *[a-f0-9]+:	66 83 18 01          	sbbw   \$0x1,\(%eax\)
  *[a-f0-9]+:	66 81 18 89 00       	sbbw   \$0x89,\(%eax\)
  *[a-f0-9]+:	66 81 18 34 12       	sbbw   \$0x1234,\(%eax\)
  *[a-f0-9]+:	66 81 18 78 56       	sbbw   \$0x5678,\(%eax\)
  *[a-f0-9]+:	66 af                	scas   %es:\(%edi\),%ax
  *[a-f0-9]+:	66 af                	scas   %es:\(%edi\),%ax
- *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
+ *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
  *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%eax\)
  *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%eax\)
- *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
- *[a-f0-9]+:	66 d1 38             	sarw   \(%eax\)
+ *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
+ *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%eax\)
  *[a-f0-9]+:	66 c1 38 02          	sarw   \$0x2,\(%eax\)
  *[a-f0-9]+:	66 d3 38             	sarw   %cl,\(%eax\)
- *[a-f0-9]+:	66 d1 38             	sarw   \(%eax\)
- *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
+ *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%eax\)
+ *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
  *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%eax\)
  *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%eax\)
- *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
- *[a-f0-9]+:	66 d1 28             	shrw   \(%eax\)
+ *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
+ *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%eax\)
  *[a-f0-9]+:	66 c1 28 02          	shrw   \$0x2,\(%eax\)
  *[a-f0-9]+:	66 d3 28             	shrw   %cl,\(%eax\)
- *[a-f0-9]+:	66 d1 28             	shrw   \(%eax\)
+ *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%eax\)
  *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%edi\)
  *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%edi\)
  *[a-f0-9]+:	66 83 28 01          	subw   \$0x1,\(%eax\)
diff --git a/gas/testsuite/gas/i386/noreg32.d b/gas/testsuite/gas/i386/noreg32.d
index 9dbef908ce7..8bb08ca73c6 100644
--- a/gas/testsuite/gas/i386/noreg32.d
+++ b/gas/testsuite/gas/i386/noreg32.d
@@ -101,44 +101,44 @@ Disassembly of section .text:
  *[a-f0-9]+:	f3 0f ae 20          	ptwrite \(%eax\)
  *[a-f0-9]+:	ff 30                	push   \(%eax\)
  *[a-f0-9]+:	06                   	push   %es
- *[a-f0-9]+:	d1 10                	rcll   \(%eax\)
+ *[a-f0-9]+:	d1 10                	rcll   \$1,\(%eax\)
  *[a-f0-9]+:	c1 10 02             	rcll   \$0x2,\(%eax\)
  *[a-f0-9]+:	d3 10                	rcll   %cl,\(%eax\)
- *[a-f0-9]+:	d1 10                	rcll   \(%eax\)
- *[a-f0-9]+:	d1 18                	rcrl   \(%eax\)
+ *[a-f0-9]+:	d1 10                	rcll   \$1,\(%eax\)
+ *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%eax\)
  *[a-f0-9]+:	c1 18 02             	rcrl   \$0x2,\(%eax\)
  *[a-f0-9]+:	d3 18                	rcrl   %cl,\(%eax\)
- *[a-f0-9]+:	d1 18                	rcrl   \(%eax\)
- *[a-f0-9]+:	d1 00                	roll   \(%eax\)
+ *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%eax\)
+ *[a-f0-9]+:	d1 00                	roll   \$1,\(%eax\)
  *[a-f0-9]+:	c1 00 02             	roll   \$0x2,\(%eax\)
  *[a-f0-9]+:	d3 00                	roll   %cl,\(%eax\)
- *[a-f0-9]+:	d1 00                	roll   \(%eax\)
- *[a-f0-9]+:	d1 08                	rorl   \(%eax\)
+ *[a-f0-9]+:	d1 00                	roll   \$1,\(%eax\)
+ *[a-f0-9]+:	d1 08                	rorl   \$1,\(%eax\)
  *[a-f0-9]+:	c1 08 02             	rorl   \$0x2,\(%eax\)
  *[a-f0-9]+:	d3 08                	rorl   %cl,\(%eax\)
- *[a-f0-9]+:	d1 08                	rorl   \(%eax\)
+ *[a-f0-9]+:	d1 08                	rorl   \$1,\(%eax\)
  *[a-f0-9]+:	83 18 01             	sbbl   \$0x1,\(%eax\)
  *[a-f0-9]+:	81 18 89 00 00 00    	sbbl   \$0x89,\(%eax\)
  *[a-f0-9]+:	81 18 34 12 00 00    	sbbl   \$0x1234,\(%eax\)
  *[a-f0-9]+:	81 18 78 56 34 12    	sbbl   \$0x12345678,\(%eax\)
  *[a-f0-9]+:	af                   	scas   %es:\(%edi\),%eax
  *[a-f0-9]+:	af                   	scas   %es:\(%edi\),%eax
- *[a-f0-9]+:	d1 20                	shll   \(%eax\)
+ *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
  *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%eax\)
  *[a-f0-9]+:	d3 20                	shll   %cl,\(%eax\)
- *[a-f0-9]+:	d1 20                	shll   \(%eax\)
- *[a-f0-9]+:	d1 38                	sarl   \(%eax\)
+ *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
+ *[a-f0-9]+:	d1 38                	sarl   \$1,\(%eax\)
  *[a-f0-9]+:	c1 38 02             	sarl   \$0x2,\(%eax\)
  *[a-f0-9]+:	d3 38                	sarl   %cl,\(%eax\)
- *[a-f0-9]+:	d1 38                	sarl   \(%eax\)
- *[a-f0-9]+:	d1 20                	shll   \(%eax\)
+ *[a-f0-9]+:	d1 38                	sarl   \$1,\(%eax\)
+ *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
  *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%eax\)
  *[a-f0-9]+:	d3 20                	shll   %cl,\(%eax\)
- *[a-f0-9]+:	d1 20                	shll   \(%eax\)
- *[a-f0-9]+:	d1 28                	shrl   \(%eax\)
+ *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
+ *[a-f0-9]+:	d1 28                	shrl   \$1,\(%eax\)
  *[a-f0-9]+:	c1 28 02             	shrl   \$0x2,\(%eax\)
  *[a-f0-9]+:	d3 28                	shrl   %cl,\(%eax\)
- *[a-f0-9]+:	d1 28                	shrl   \(%eax\)
+ *[a-f0-9]+:	d1 28                	shrl   \$1,\(%eax\)
  *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%edi\)
  *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%edi\)
  *[a-f0-9]+:	83 28 01             	subl   \$0x1,\(%eax\)
diff --git a/gas/testsuite/gas/i386/noreg64-data16.d b/gas/testsuite/gas/i386/noreg64-data16.d
index f1e67096a58..802eb4053d3 100644
--- a/gas/testsuite/gas/i386/noreg64-data16.d
+++ b/gas/testsuite/gas/i386/noreg64-data16.d
@@ -106,44 +106,44 @@ Disassembly of section .text:
  *[a-f0-9]+:	66 0f a1             	popw   %fs
  *[a-f0-9]+:	66 ff 30             	pushw  \(%rax\)
  *[a-f0-9]+:	66 0f a0             	pushw  %fs
- *[a-f0-9]+:	66 d1 10             	rclw   \(%rax\)
+ *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%rax\)
  *[a-f0-9]+:	66 c1 10 02          	rclw   \$0x2,\(%rax\)
  *[a-f0-9]+:	66 d3 10             	rclw   %cl,\(%rax\)
- *[a-f0-9]+:	66 d1 10             	rclw   \(%rax\)
- *[a-f0-9]+:	66 d1 18             	rcrw   \(%rax\)
+ *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%rax\)
+ *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%rax\)
  *[a-f0-9]+:	66 c1 18 02          	rcrw   \$0x2,\(%rax\)
  *[a-f0-9]+:	66 d3 18             	rcrw   %cl,\(%rax\)
- *[a-f0-9]+:	66 d1 18             	rcrw   \(%rax\)
- *[a-f0-9]+:	66 d1 00             	rolw   \(%rax\)
+ *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%rax\)
+ *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%rax\)
  *[a-f0-9]+:	66 c1 00 02          	rolw   \$0x2,\(%rax\)
  *[a-f0-9]+:	66 d3 00             	rolw   %cl,\(%rax\)
- *[a-f0-9]+:	66 d1 00             	rolw   \(%rax\)
- *[a-f0-9]+:	66 d1 08             	rorw   \(%rax\)
+ *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%rax\)
+ *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%rax\)
  *[a-f0-9]+:	66 c1 08 02          	rorw   \$0x2,\(%rax\)
  *[a-f0-9]+:	66 d3 08             	rorw   %cl,\(%rax\)
- *[a-f0-9]+:	66 d1 08             	rorw   \(%rax\)
+ *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%rax\)
  *[a-f0-9]+:	66 83 18 01          	sbbw   \$0x1,\(%rax\)
  *[a-f0-9]+:	66 81 18 89 00       	sbbw   \$0x89,\(%rax\)
  *[a-f0-9]+:	66 81 18 34 12       	sbbw   \$0x1234,\(%rax\)
  *[a-f0-9]+:	66 81 18 78 56       	sbbw   \$0x5678,\(%rax\)
  *[a-f0-9]+:	66 af                	scas   %es:\(%rdi\),%ax
  *[a-f0-9]+:	66 af                	scas   %es:\(%rdi\),%ax
- *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
+ *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
  *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%rax\)
  *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%rax\)
- *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
- *[a-f0-9]+:	66 d1 38             	sarw   \(%rax\)
+ *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
+ *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%rax\)
  *[a-f0-9]+:	66 c1 38 02          	sarw   \$0x2,\(%rax\)
  *[a-f0-9]+:	66 d3 38             	sarw   %cl,\(%rax\)
- *[a-f0-9]+:	66 d1 38             	sarw   \(%rax\)
- *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
+ *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%rax\)
+ *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
  *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%rax\)
  *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%rax\)
- *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
- *[a-f0-9]+:	66 d1 28             	shrw   \(%rax\)
+ *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
+ *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%rax\)
  *[a-f0-9]+:	66 c1 28 02          	shrw   \$0x2,\(%rax\)
  *[a-f0-9]+:	66 d3 28             	shrw   %cl,\(%rax\)
- *[a-f0-9]+:	66 d1 28             	shrw   \(%rax\)
+ *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%rax\)
  *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%rdi\)
  *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%rdi\)
  *[a-f0-9]+:	66 83 28 01          	subw   \$0x1,\(%rax\)
diff --git a/gas/testsuite/gas/i386/noreg64-rex64.d b/gas/testsuite/gas/i386/noreg64-rex64.d
index cd8679e626a..e33851d8093 100644
--- a/gas/testsuite/gas/i386/noreg64-rex64.d
+++ b/gas/testsuite/gas/i386/noreg64-rex64.d
@@ -105,44 +105,44 @@ Disassembly of section .text:
  *[a-f0-9]+:	f3 48 0f ae 20       	ptwriteq \(%rax\)
  *[a-f0-9]+:	48 ff 30             	rex\.W push \(%rax\)
  *[a-f0-9]+:	48 0f a0             	rex\.W push %fs
- *[a-f0-9]+:	48 d1 10             	rclq   \(%rax\)
+ *[a-f0-9]+:	48 d1 10             	rclq   \$1,\(%rax\)
  *[a-f0-9]+:	48 c1 10 02          	rclq   \$0x2,\(%rax\)
  *[a-f0-9]+:	48 d3 10             	rclq   %cl,\(%rax\)
- *[a-f0-9]+:	48 d1 10             	rclq   \(%rax\)
- *[a-f0-9]+:	48 d1 18             	rcrq   \(%rax\)
+ *[a-f0-9]+:	48 d1 10             	rclq   \$1,\(%rax\)
+ *[a-f0-9]+:	48 d1 18             	rcrq   \$1,\(%rax\)
  *[a-f0-9]+:	48 c1 18 02          	rcrq   \$0x2,\(%rax\)
  *[a-f0-9]+:	48 d3 18             	rcrq   %cl,\(%rax\)
- *[a-f0-9]+:	48 d1 18             	rcrq   \(%rax\)
- *[a-f0-9]+:	48 d1 00             	rolq   \(%rax\)
+ *[a-f0-9]+:	48 d1 18             	rcrq   \$1,\(%rax\)
+ *[a-f0-9]+:	48 d1 00             	rolq   \$1,\(%rax\)
  *[a-f0-9]+:	48 c1 00 02          	rolq   \$0x2,\(%rax\)
  *[a-f0-9]+:	48 d3 00             	rolq   %cl,\(%rax\)
- *[a-f0-9]+:	48 d1 00             	rolq   \(%rax\)
- *[a-f0-9]+:	48 d1 08             	rorq   \(%rax\)
+ *[a-f0-9]+:	48 d1 00             	rolq   \$1,\(%rax\)
+ *[a-f0-9]+:	48 d1 08             	rorq   \$1,\(%rax\)
  *[a-f0-9]+:	48 c1 08 02          	rorq   \$0x2,\(%rax\)
  *[a-f0-9]+:	48 d3 08             	rorq   %cl,\(%rax\)
- *[a-f0-9]+:	48 d1 08             	rorq   \(%rax\)
+ *[a-f0-9]+:	48 d1 08             	rorq   \$1,\(%rax\)
  *[a-f0-9]+:	48 83 18 01          	sbbq   \$0x1,\(%rax\)
  *[a-f0-9]+:	48 81 18 89 00 00 00 	sbbq   \$0x89,\(%rax\)
  *[a-f0-9]+:	48 81 18 34 12 00 00 	sbbq   \$0x1234,\(%rax\)
  *[a-f0-9]+:	48 81 18 78 56 34 12 	sbbq   \$0x12345678,\(%rax\)
  *[a-f0-9]+:	48 af                	scas   %es:\(%rdi\),%rax
  *[a-f0-9]+:	48 af                	scas   %es:\(%rdi\),%rax
- *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
+ *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
  *[a-f0-9]+:	48 c1 20 02          	shlq   \$0x2,\(%rax\)
  *[a-f0-9]+:	48 d3 20             	shlq   %cl,\(%rax\)
- *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
- *[a-f0-9]+:	48 d1 38             	sarq   \(%rax\)
+ *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
+ *[a-f0-9]+:	48 d1 38             	sarq   \$1,\(%rax\)
  *[a-f0-9]+:	48 c1 38 02          	sarq   \$0x2,\(%rax\)
  *[a-f0-9]+:	48 d3 38             	sarq   %cl,\(%rax\)
- *[a-f0-9]+:	48 d1 38             	sarq   \(%rax\)
- *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
+ *[a-f0-9]+:	48 d1 38             	sarq   \$1,\(%rax\)
+ *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
  *[a-f0-9]+:	48 c1 20 02          	shlq   \$0x2,\(%rax\)
  *[a-f0-9]+:	48 d3 20             	shlq   %cl,\(%rax\)
- *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
- *[a-f0-9]+:	48 d1 28             	shrq   \(%rax\)
+ *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
+ *[a-f0-9]+:	48 d1 28             	shrq   \$1,\(%rax\)
  *[a-f0-9]+:	48 c1 28 02          	shrq   \$0x2,\(%rax\)
  *[a-f0-9]+:	48 d3 28             	shrq   %cl,\(%rax\)
- *[a-f0-9]+:	48 d1 28             	shrq   \(%rax\)
+ *[a-f0-9]+:	48 d1 28             	shrq   \$1,\(%rax\)
  *[a-f0-9]+:	48 ab                	stos   %rax,%es:\(%rdi\)
  *[a-f0-9]+:	48 ab                	stos   %rax,%es:\(%rdi\)
  *[a-f0-9]+:	48 83 28 01          	subq   \$0x1,\(%rax\)
diff --git a/gas/testsuite/gas/i386/noreg64.d b/gas/testsuite/gas/i386/noreg64.d
index 354d89069ae..2afdef38f92 100644
--- a/gas/testsuite/gas/i386/noreg64.d
+++ b/gas/testsuite/gas/i386/noreg64.d
@@ -107,44 +107,44 @@ Disassembly of section .text:
  *[a-f0-9]+:	f3 0f ae 20          	ptwritel \(%rax\)
  *[a-f0-9]+:	ff 30                	push   \(%rax\)
  *[a-f0-9]+:	0f a0                	push   %fs
- *[a-f0-9]+:	d1 10                	rcll   \(%rax\)
+ *[a-f0-9]+:	d1 10                	rcll   \$1,\(%rax\)
  *[a-f0-9]+:	c1 10 02             	rcll   \$0x2,\(%rax\)
  *[a-f0-9]+:	d3 10                	rcll   %cl,\(%rax\)
- *[a-f0-9]+:	d1 10                	rcll   \(%rax\)
- *[a-f0-9]+:	d1 18                	rcrl   \(%rax\)
+ *[a-f0-9]+:	d1 10                	rcll   \$1,\(%rax\)
+ *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%rax\)
  *[a-f0-9]+:	c1 18 02             	rcrl   \$0x2,\(%rax\)
  *[a-f0-9]+:	d3 18                	rcrl   %cl,\(%rax\)
- *[a-f0-9]+:	d1 18                	rcrl   \(%rax\)
- *[a-f0-9]+:	d1 00                	roll   \(%rax\)
+ *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%rax\)
+ *[a-f0-9]+:	d1 00                	roll   \$1,\(%rax\)
  *[a-f0-9]+:	c1 00 02             	roll   \$0x2,\(%rax\)
  *[a-f0-9]+:	d3 00                	roll   %cl,\(%rax\)
- *[a-f0-9]+:	d1 00                	roll   \(%rax\)
- *[a-f0-9]+:	d1 08                	rorl   \(%rax\)
+ *[a-f0-9]+:	d1 00                	roll   \$1,\(%rax\)
+ *[a-f0-9]+:	d1 08                	rorl   \$1,\(%rax\)
  *[a-f0-9]+:	c1 08 02             	rorl   \$0x2,\(%rax\)
  *[a-f0-9]+:	d3 08                	rorl   %cl,\(%rax\)
- *[a-f0-9]+:	d1 08                	rorl   \(%rax\)
+ *[a-f0-9]+:	d1 08                	rorl   \$1,\(%rax\)
  *[a-f0-9]+:	83 18 01             	sbbl   \$0x1,\(%rax\)
  *[a-f0-9]+:	81 18 89 00 00 00    	sbbl   \$0x89,\(%rax\)
  *[a-f0-9]+:	81 18 34 12 00 00    	sbbl   \$0x1234,\(%rax\)
  *[a-f0-9]+:	81 18 78 56 34 12    	sbbl   \$0x12345678,\(%rax\)
  *[a-f0-9]+:	af                   	scas   %es:\(%rdi\),%eax
  *[a-f0-9]+:	af                   	scas   %es:\(%rdi\),%eax
- *[a-f0-9]+:	d1 20                	shll   \(%rax\)
+ *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
  *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%rax\)
  *[a-f0-9]+:	d3 20                	shll   %cl,\(%rax\)
- *[a-f0-9]+:	d1 20                	shll   \(%rax\)
- *[a-f0-9]+:	d1 38                	sarl   \(%rax\)
+ *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
+ *[a-f0-9]+:	d1 38                	sarl   \$1,\(%rax\)
  *[a-f0-9]+:	c1 38 02             	sarl   \$0x2,\(%rax\)
  *[a-f0-9]+:	d3 38                	sarl   %cl,\(%rax\)
- *[a-f0-9]+:	d1 38                	sarl   \(%rax\)
- *[a-f0-9]+:	d1 20                	shll   \(%rax\)
+ *[a-f0-9]+:	d1 38                	sarl   \$1,\(%rax\)
+ *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
  *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%rax\)
  *[a-f0-9]+:	d3 20                	shll   %cl,\(%rax\)
- *[a-f0-9]+:	d1 20                	shll   \(%rax\)
- *[a-f0-9]+:	d1 28                	shrl   \(%rax\)
+ *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
+ *[a-f0-9]+:	d1 28                	shrl   \$1,\(%rax\)
  *[a-f0-9]+:	c1 28 02             	shrl   \$0x2,\(%rax\)
  *[a-f0-9]+:	d3 28                	shrl   %cl,\(%rax\)
- *[a-f0-9]+:	d1 28                	shrl   \(%rax\)
+ *[a-f0-9]+:	d1 28                	shrl   \$1,\(%rax\)
  *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%rdi\)
  *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%rdi\)
  *[a-f0-9]+:	83 28 01             	subl   \$0x1,\(%rax\)
diff --git a/gas/testsuite/gas/i386/opcode-suffix.d b/gas/testsuite/gas/i386/opcode-suffix.d
index 946a0a4d7a0..ca6af50c9cf 100644
--- a/gas/testsuite/gas/i386/opcode-suffix.d
+++ b/gas/testsuite/gas/i386/opcode-suffix.d
@@ -206,8 +206,8 @@ Disassembly of section .text:
  *[0-9a-f]+:	cd 90[ 	]+int[ 	]+\$0x90
  *[0-9a-f]+:	ce[ 	]+into
  *[0-9a-f]+:	cf[ 	]+iretl
- *[0-9a-f]+:	d0 90 90 90 90 90[ 	]+rclb[ 	]+-0x6f6f6f70\(%eax\)
- *[0-9a-f]+:	d1 90 90 90 90 90[ 	]+rcll[ 	]+-0x6f6f6f70\(%eax\)
+ *[0-9a-f]+:	d0 90 90 90 90 90[ 	]+rclb[ 	]+\$1,-0x6f6f6f70\(%eax\)
+ *[0-9a-f]+:	d1 90 90 90 90 90[ 	]+rcll[ 	]+\$1,-0x6f6f6f70\(%eax\)
  *[0-9a-f]+:	d2 90 90 90 90 90[ 	]+rclb[ 	]+%cl,-0x6f6f6f70\(%eax\)
  *[0-9a-f]+:	d3 90 90 90 90 90[ 	]+rcll[ 	]+%cl,-0x6f6f6f70\(%eax\)
  *[0-9a-f]+:	d4 90[ 	]+aam[ 	]+\$0x90
@@ -523,7 +523,7 @@ Disassembly of section .text:
  *[0-9a-f]+:	66 ca 90 90[ 	]+lretw[ 	]+\$0x9090
  *[0-9a-f]+:	66 cb[ 	]+lretw
  *[0-9a-f]+:	66 cf[ 	]+iretw
- *[0-9a-f]+:	66 d1 90 90 90 90 90[ 	]+rclw[ 	]+-0x6f6f6f70\(%eax\)
+ *[0-9a-f]+:	66 d1 90 90 90 90 90[ 	]+rclw[ 	]+\$1,-0x6f6f6f70\(%eax\)
  *[0-9a-f]+:	66 d3 90 90 90 90 90[ 	]+rclw[ 	]+%cl,-0x6f6f6f70\(%eax\)
  *[0-9a-f]+:	66 e5 90[ 	]+inw[ 	]+\$0x90,%ax
  *[0-9a-f]+:	66 e7 90[ 	]+outw[ 	]+%ax,\$0x90
diff --git a/gas/testsuite/gas/i386/opcode.d b/gas/testsuite/gas/i386/opcode.d
index 7631195d8d4..f7af22518e2 100644
--- a/gas/testsuite/gas/i386/opcode.d
+++ b/gas/testsuite/gas/i386/opcode.d
@@ -205,8 +205,8 @@ Disassembly of section .text:
  279:	cd 90 [ 	]*int    \$0x90
  27b:	ce [ 	]*into
  27c:	cf [ 	]*iret
- 27d:	d0 90 90 90 90 90 [ 	]*rclb   -0x6f6f6f70\(%eax\)
- 283:	d1 90 90 90 90 90 [ 	]*rcll   -0x6f6f6f70\(%eax\)
+ 27d:	d0 90 90 90 90 90 [ 	]*rclb   \$1,-0x6f6f6f70\(%eax\)
+ 283:	d1 90 90 90 90 90 [ 	]*rcll   \$1,-0x6f6f6f70\(%eax\)
  289:	d2 90 90 90 90 90 [ 	]*rclb   %cl,-0x6f6f6f70\(%eax\)
  28f:	d3 90 90 90 90 90 [ 	]*rcll   %cl,-0x6f6f6f70\(%eax\)
  295:	d4 90 [ 	]*aam    \$0x90
@@ -522,7 +522,7 @@ Disassembly of section .text:
  869:	66 ca 90 90 [ 	]*lretw  \$0x9090
  86d:	66 cb [ 	]*lretw
  86f:	66 cf [ 	]*iretw
- 871:	66 d1 90 90 90 90 90 [ 	]*rclw   -0x6f6f6f70\(%eax\)
+ 871:	66 d1 90 90 90 90 90 [ 	]*rclw   \$1,-0x6f6f6f70\(%eax\)
  878:	66 d3 90 90 90 90 90 [ 	]*rclw   %cl,-0x6f6f6f70\(%eax\)
  87f:	66 e5 90 [ 	]*in     \$0x90,%ax
  882:	66 e7 90 [ 	]*out    %ax,\$0x90
@@ -610,8 +610,8 @@ Disassembly of section .text:
  +[a-f0-9]+:	f7 c9 04 00 00 00    	test   \$(0x)?0*4,%ecx
  +[a-f0-9]+:	c0 f0 02             	shl    \$0x2,%al
  +[a-f0-9]+:	c1 f0 01             	shl    \$0x1,%eax
- +[a-f0-9]+:	d0 f0                	shl    %al
- +[a-f0-9]+:	d1 f0                	shl    %eax
+ +[a-f0-9]+:	d0 f0                	shl    \$1,%al
+ +[a-f0-9]+:	d1 f0                	shl    \$1,%eax
  +[a-f0-9]+:	d2 f0                	shl    %cl,%al
  +[a-f0-9]+:	d3 f0                	shl    %cl,%eax
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d b/gas/testsuite/gas/i386/x86-64-lfence-load.d
index b4a03db811d..726236826e8 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -90,7 +90,7 @@ Disassembly of section .text:
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	58                   	pop    %rax
  +[a-f0-9]+:	0f ae e8             	lfence
- +[a-f0-9]+:	66 d1 11             	rclw   \(%rcx\)
+ +[a-f0-9]+:	66 d1 11             	rclw   \$1,\(%rcx\)
  +[a-f0-9]+:	0f ae e8             	lfence
  +[a-f0-9]+:	f7 01 01 00 00 00    	testl  \$0x1,\(%rcx\)
  +[a-f0-9]+:	0f ae e8             	lfence
diff --git a/gas/testsuite/gas/i386/x86-64-opcode.d b/gas/testsuite/gas/i386/x86-64-opcode.d
index ee6d0f5f4bd..1b8a9fa9014 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode.d
@@ -335,9 +335,9 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	c0 f0 02             	shl    \$0x2,%al
 [ 	]*[a-f0-9]+:	c1 f0 01             	shl    \$0x1,%eax
 [ 	]*[a-f0-9]+:	48 c1 f0 01          	shl    \$0x1,%rax
-[ 	]*[a-f0-9]+:	d0 f0                	shl    %al
-[ 	]*[a-f0-9]+:	d1 f0                	shl    %eax
-[ 	]*[a-f0-9]+:	48 d1 f0             	shl    %rax
+[ 	]*[a-f0-9]+:	d0 f0                	shl    \$1,%al
+[ 	]*[a-f0-9]+:	d1 f0                	shl    \$1,%eax
+[ 	]*[a-f0-9]+:	48 d1 f0             	shl    \$1,%rax
 [ 	]*[a-f0-9]+:	d2 f0                	shl    %cl,%al
 [ 	]*[a-f0-9]+:	d3 f0                	shl    %cl,%eax
 [ 	]*[a-f0-9]+:	48 d3 f0             	shl    %cl,%rax
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 2e2043d467b..e432b61a6cd 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -12090,6 +12090,8 @@ OP_I (instr_info *ins, int bytemode, int sizeflag)
     case const_1_mode:
       if (ins->intel_syntax)
 	oappend (ins, "1");
+      else
+	oappend (ins, "$1");
       return true;
     default:
       oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
-- 
2.25.1



* [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
@ 2023-11-24  7:02 ` Cui, Lili
  2023-12-04 16:30   ` Jan Beulich
  2023-11-24  7:02 ` [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-11-24  7:02 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu

APX uses the REX2 prefix to make the extended GPRs (EGPRs, r16-r31)
available to legacy instructions in map0 and map1. We added the NoEgpr
flag in i386-gen.c to mark instructions that do not support EGPRs.

Unlike for REX, we print the pseudo prefix {rex2} for instructions
that would otherwise be ambiguous.
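
As an illustration only (not part of the patch), the byte that follows
0xd5 can be sketched as below. The helper name rex2_payload and its
parameter names are hypothetical; the bit layout is taken from the
comment above build_rex2_prefix in this patch, and the bit values
assume the usual REX assignments (W=8, R=4, X=2, B=1).

```c
#include <stdint.h>

/* Low-nibble bits, as in the classic REX prefix.  The same R/X/B
   values are reused here for the high R4/X4/B4 bits.  */
enum { REX_B = 1, REX_X = 2, REX_R = 4, REX_W = 8 };

/* Hypothetical helper: compose the payload byte that follows 0xd5.
   Layout: | M | R4 X4 B4 | W R3 X3 B3 |.
   map is the opcode map (0 or 1), hi holds the R4/X4/B4 bits and
   lo holds the W/R3/X3/B3 bits.  */
static uint8_t
rex2_payload (unsigned int map, unsigned int hi, unsigned int lo)
{
  return (uint8_t) (((map & 1) << 7) | ((hi & 7) << 4) | (lo & 0xf));
}
```

With these assumptions, rex2_payload (0, REX_B, REX_B) yields 0x11,
matching "d5 11 f6 c0 07" for test $0x7,%r24b in x86-64-apx-rex2.d,
and rex2_payload (1, REX_R, 0) yields 0xc0, matching "d5 c0 af c0"
for imul %eax,%r16d.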

gas/ChangeLog:

2023-11-21  Lingling Kong <lingling.kong@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>
	    Lili Cui <lili.cui@intel.com>
	    Lin Hu   <lin1.hu@intel.com>

	* config/tc-i386.c
	(enum i386_error): Add unsupported_EGPR_for_addressing
	and invalid_pseudo_prefix.
	(struct _i386_insn): Add rex2 rex-byte and rex2_encoding for
	gpr32 r16-r31.
	(_is_cpu): Add apx_f.
	(register_number): Handle RegRex2 for gpr32.
	(is_apx_rex2_encoding): New func. Test rex2 prefix encoding.
	(build_rex2_prefix): New func. Build the rex2 prefix for legacy
	insns in opcode space 0/1 that use gpr32.
	(optimize_encoding): Handle r16-r31 registers.
	(md_assemble): Handle apx encoding.
	(parse_insn): Handle Prefix_REX2.
	(check_EgprOperands): New func. Check if Egpr operands
	are valid for the instruction.
	(match_template): Handle Egpr operands check.
	(set_rex_rex2): New func. Set i.rex and i.rex2.
	(build_modrm_byte): Use set_rex_rex2.
	(output_insn): Handle rex2 2-byte prefix output.
	(check_register): Reject egprs when the target lacks APX_F
	or when not in 64-bit mode.
	* doc/c-i386.texi: Document .apx_f and {rex2}.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d: D5 valid
	in 64-bit mode.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-inval-pseudo.l: Add rex2 invalid testcase.
	* testsuite/gas/i386/x86-64-inval-pseudo.s:  Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-pseudos-bad.l: Add illegal rex2 test.
	* testsuite/gas/i386/x86-64-pseudos-bad.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos.d: Add rex2 test.
	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Run APX tests.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.l: New test.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.s: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.d: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.s: New test.

include/ChangeLog:

	* opcode/i386.h (REX2_OPCODE): Add REX2_OPCODE.

opcodes/ChangeLog:

	* i386-dis.c (struct instr_info): Add erex for gpr32.
	Add last_erex_prefix for rex2 prefix.
	(USED_REX2): Extend for gpr32.
	(REX2_M): Ditto.
	(PREFIX_REX2): Ditto.
	(ILLEGAL_PREFIX_REX2): Ditto.
	(ckprefix): Ditto.
	(prefix_name): Ditto.
	(print_insn): Ditto.
	(print_register): Ditto.
	(OP_E_memory): Ditto.
	(OP_REG): Ditto.
	(OP_EX): Ditto.
	* i386-gen.c (rex2_disallowed): New. Some instructions do not
	allow the rex2 prefix.
	(process_i386_opcode_modifier): Set NoEgpr for VEX and some special instructions.
	(output_i386_opcode): Handle if_entry_needs_special_handle.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Regenerated.
	* i386-opc.h (enum i386_cpu): Add CpuAPX_F.
	(Prefix_NoOptimize): Ditto.
	(Prefix_REX2): Ditto.
	(RegRex2): Ditto.
	* i386-opc.tbl: Add rex2 prefix.
	* i386-reg.tbl: Add egprs (r16-r31).
	* i386-tbl.h: Regenerated.
---
 gas/config/tc-i386.c                          | 164 +++++++++--
 gas/doc/c-i386.texi                           |   6 +-
 .../i386/ilp32/x86-64-opcode-inval-intel.d    |  47 +---
 .../gas/i386/ilp32/x86-64-opcode-inval.d      |  47 +---
 .../gas/i386/x86-64-apx-egpr-inval.l          |  15 +
 .../gas/i386/x86-64-apx-egpr-inval.s          |  18 ++
 gas/testsuite/gas/i386/x86-64-apx-rex2.d      |  83 ++++++
 gas/testsuite/gas/i386/x86-64-apx-rex2.s      |  86 ++++++
 gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |   6 +
 gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |   4 +
 .../gas/i386/x86-64-opcode-inval-intel.d      |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.d  |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.s  |   4 -
 gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |  59 +++-
 gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |  58 ++++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |  21 ++
 gas/testsuite/gas/i386/x86-64-pseudos.s       |  22 ++
 gas/testsuite/gas/i386/x86-64.exp             |   2 +
 include/opcode/i386.h                         |   2 +
 opcodes/i386-dis.c                            | 262 ++++++++++++------
 opcodes/i386-gen.c                            |  55 +++-
 opcodes/i386-opc.h                            |  13 +-
 opcodes/i386-opc.tbl                          |  28 +-
 opcodes/i386-reg.tbl                          |  64 +++++
 24 files changed, 856 insertions(+), 262 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 235e41e7918..638d3aa07c8 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -239,6 +239,7 @@ enum i386_error
     bad_imm4,
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
+    unsupported_EGPR_for_addressing,
     unsupported,
     unsupported_on_arch,
     unsupported_64bit,
@@ -247,6 +248,7 @@ enum i386_error
     invalid_vector_register_set,
     invalid_tmm_register_set,
     invalid_dest_and_src_register_set,
+    invalid_pseudo_prefix,
     unsupported_vector_index_register,
     unsupported_broadcast,
     broadcast_needed,
@@ -354,6 +356,7 @@ struct _i386_insn
     modrm_byte rm;
     rex_byte rex;
     rex_byte vrex;
+    rex_byte rex2;
     sib_byte sib;
     vex_prefix vex;
 
@@ -427,6 +430,9 @@ struct _i386_insn
     /* Prefer the REX byte in encoding.  */
     bool rex_encoding;
 
+    /* Prefer the REX2 prefix in encoding.  */
+    bool rex2_encoding;
+
     /* Disable instruction size optimization.  */
     bool no_optimize;
 
@@ -1161,6 +1167,7 @@ static const arch_entry cpu_arch[] =
   SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
   VECARCH (avx10.1, AVX10_1, ANY_AVX512F, set),
   SUBARCH (user_msr, USER_MSR, USER_MSR, false),
+  SUBARCH (apx_f, APX_F, APX_F, false),
 };
 
 #undef SUBARCH
@@ -1667,6 +1674,7 @@ _is_cpu (const i386_cpu_attr *a, enum i386_cpu cpu)
     case CpuHLE:      return a->bitfield.cpuhle;
     case CpuAVX512F:  return a->bitfield.cpuavx512f;
     case CpuAVX512VL: return a->bitfield.cpuavx512vl;
+    case CpuAPX_F:    return a->bitfield.cpuapx_f;
     case Cpu64:       return a->bitfield.cpu64;
     case CpuNo64:     return a->bitfield.cpuno64;
     default:
@@ -2338,7 +2346,7 @@ register_number (const reg_entry *r)
   if (r->reg_flags & RegRex)
     nr += 8;
 
-  if (r->reg_flags & RegVRex)
+  if (r->reg_flags & (RegVRex | RegRex2))
     nr += 16;
 
   return nr;
@@ -3865,6 +3873,12 @@ is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || t->opcode_modifier.evex;
 }
 
+static INLINE bool
+is_apx_rex2_encoding (void)
+{
+  return i.rex2 || i.rex2_encoding;
+}
+
 static unsigned int
 get_broadcast_bytes (const insn_template *t, bool diag)
 {
@@ -4120,6 +4134,21 @@ build_evex_prefix (void)
     i.vex.bytes[3] |= i.mask.reg->reg_num;
 }
 
+/* Build (2 bytes) rex2 prefix.
+   | D5h |
+   | m | R4 X4 B4 | W R X B |
+*/
+static void
+build_rex2_prefix (void)
+{
+  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the same.  */
+  i.vex.length = 2;
+  i.vex.bytes[0] = 0xd5;
+  /* For the W R X B bits, the variables of rex prefix will be reused.  */
+  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
+		    | (i.rex2 << 4) | i.rex);
+}
+
 static void
 process_immext (void)
 {
@@ -4385,12 +4414,16 @@ optimize_encoding (void)
 	  i.suffix = 0;
 	  /* Convert to byte registers.  */
 	  if (i.types[1].bitfield.word)
-	    j = 16;
-	  else if (i.types[1].bitfield.dword)
+	    /* There are 40 8-bit registers.  */
 	    j = 32;
+	  else if (i.types[1].bitfield.dword)
+	    /* 32 8-bit registers + 32 16-bit registers.  */
+	    j = 64;
 	  else
-	    j = 48;
-	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
+	    /* 32 8-bit registers + 32 16-bit registers
+	       + 32 32-bit registers.  */
+	    j = 96;
+	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
 	    j += 8;
 	  i.op[1].regs -= j;
 	}
@@ -5275,6 +5308,9 @@ md_assemble (char *line)
 	case unsupported_syntax:
 	  err_msg = _("unsupported syntax");
 	  break;
+	case unsupported_EGPR_for_addressing:
+	  err_msg = _("unsupported extended GPR for addressing");
+	  break;
 	case unsupported:
 	  as_bad (_("unsupported instruction `%s'"),
 		  pass1_mnem ? pass1_mnem : insn_name (current_templates->start));
@@ -5322,6 +5358,9 @@ md_assemble (char *line)
 	case invalid_dest_and_src_register_set:
 	  err_msg = _("destination and source registers must be distinct");
 	  break;
+	case invalid_pseudo_prefix:
+	  err_msg = _("rex2 pseudo prefix cannot be used here");
+	  break;
 	case unsupported_vector_index_register:
 	  err_msg = _("unsupported vector index register");
 	  break;
@@ -5576,6 +5615,13 @@ md_assemble (char *line)
 	  return;
 	}
 
+      /* Check for explicit REX2 prefix.  */
+      if (i.rex2_encoding)
+	{
+	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
+	  return;
+	}
+
       if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
@@ -5615,11 +5661,12 @@ md_assemble (char *line)
 	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
 	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
-	  && i.rex != 0))
+	  && (i.rex != 0 || i.rex2 != 0)))
     {
       int x;
 
-      i.rex |= REX_OPCODE;
+      if (!i.rex2)
+	i.rex |= REX_OPCODE;
       for (x = 0; x < 2; x++)
 	{
 	  /* Look for 8 bit operand that uses old registers.  */
@@ -5630,7 +5677,7 @@ md_assemble (char *line)
 	      /* In case it is "hi" register, give up.  */
 	      if (i.op[x].regs->reg_num > 3)
 		as_bad (_("can't encode register '%s%s' in an "
-			  "instruction requiring REX prefix."),
+			  "instruction requiring REX/REX2 prefix."),
 			register_prefix, i.op[x].regs->reg_name);
 
 	      /* Otherwise it is equivalent to the extended register.
@@ -5642,11 +5689,11 @@ md_assemble (char *line)
 	}
     }
 
-  if (i.rex == 0 && i.rex_encoding)
+  if ((i.rex == 0 && i.rex_encoding) || (i.rex2 == 0 && i.rex2_encoding))
     {
       /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
 	 that uses legacy register.  If it is "hi" register, don't add
-	 the REX_OPCODE byte.  */
+	 rex and rex2 prefix.  */
       int x;
       for (x = 0; x < 2; x++)
 	if (i.types[x].bitfield.class == Reg
@@ -5656,6 +5703,7 @@ md_assemble (char *line)
 	  {
 	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
 	    i.rex_encoding = false;
+	    i.rex2_encoding = false;
 	    break;
 	  }
 
@@ -5663,7 +5711,13 @@ md_assemble (char *line)
 	i.rex = REX_OPCODE;
     }
 
-  if (i.rex != 0)
+  if (i.rex2 != 0 || i.rex2_encoding)
+    {
+      build_rex2_prefix ();
+      /* The individual REX.RXBW bits got consumed.  */
+      i.rex &= REX_OPCODE;
+    }
+  else if (i.rex != 0)
     add_prefix (REX_OPCODE | i.rex);
 
   insert_lfence_before ();
@@ -5834,6 +5888,10 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
 		  /* {rex} */
 		  i.rex_encoding = true;
 		  break;
+		case Prefix_REX2:
+		  /* {rex2} */
+		  i.rex2_encoding = true;
+		  break;
 		case Prefix_NoOptimize:
 		  /* {nooptimize} */
 		  i.no_optimize = true;
@@ -6971,6 +7029,45 @@ VEX_check_encoding (const insn_template *t)
   return 0;
 }
 
+/* Check if Egprs operands are valid for the instruction.  */
+
+static int
+check_EgprOperands (const insn_template *t)
+{
+  if (!t->opcode_modifier.noegpr)
+    return 0;
+
+  for (unsigned int op = 0; op < i.operands; op++)
+    {
+      if (i.types[op].bitfield.class != Reg
+	  /* Special case for (%dx) while doing input/output op */
+	  || i.input_output_operand)
+	continue;
+
+      if (i.op[op].regs->reg_flags & RegRex2)
+	{
+	  i.error = register_type_mismatch;
+	  return 1;
+	}
+    }
+
+  if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
+      || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
+    {
+      i.error = unsupported_EGPR_for_addressing;
+      return 1;
+    }
+
+  /* Check whether the pseudo prefix {rex2} is valid.  */
+  if (i.rex2_encoding)
+    {
+      i.error = invalid_pseudo_prefix;
+      return 1;
+    }
+
+  return 0;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7107,7 +7204,9 @@ match_template (char mnem_suffix)
       /* Do not verify operands when there are none.  */
       if (!t->operands)
 	{
-	  if (VEX_check_encoding (t))
+	  /* When there are no operands, we still need to use the
+	     check_EgprOperands function to check whether {rex2} is valid.  */
+	  if (VEX_check_encoding (t) || check_EgprOperands (t))
 	    {
 	      specific_error = progress (i.error);
 	      continue;
@@ -7443,6 +7542,13 @@ match_template (char mnem_suffix)
 	  continue;
 	}
 
+      /* Check if EGPR operands (r16-r31) are valid.  */
+      if (check_EgprOperands (t))
+	{
+	  specific_error = progress (i.error);
+	  continue;
+	}
+
       /* Check whether to use the shorter VEX encoding for certain insns where
 	 the EVEX enconding comes first in the table.  This requires the respective
 	 AVX-* feature to be explicitly enabled.  */
@@ -8340,6 +8446,18 @@ static INLINE void set_rex_vrex (const reg_entry *r, unsigned int rex_bit,
 
   if (r->reg_flags & RegVRex)
     i.vrex |= rex_bit;
+
+  if (r->reg_flags & RegRex2)
+    i.rex2 |= rex_bit;
+}
+
+static INLINE void
+set_rex_rex2 (const reg_entry *r, unsigned int rex_bit)
+{
+  if ((r->reg_flags & RegRex) != 0)
+    i.rex |= rex_bit;
+  if ((r->reg_flags & RegRex2) != 0)
+    i.rex2 |= rex_bit;
 }
 
 static int
@@ -8823,8 +8941,7 @@ build_modrm_byte (void)
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
 		  i.types[op] = operand_type_and_not (i.types[op], anydisp);
 		  i.types[op].bitfield.disp32 = 1;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 	    }
 	  /* RIP addressing for 64bit mode.  */
@@ -8895,8 +9012,7 @@ build_modrm_byte (void)
 
 	      if (!i.tm.opcode_modifier.sib)
 		i.rm.regmem = i.base_reg->reg_num;
-	      if ((i.base_reg->reg_flags & RegRex) != 0)
-		i.rex |= REX_B;
+	      set_rex_rex2 (i.base_reg, REX_B);
 	      i.sib.base = i.base_reg->reg_num;
 	      /* x86-64 ignores REX prefix bit here to avoid decoder
 		 complications.  */
@@ -8934,8 +9050,7 @@ build_modrm_byte (void)
 		  else
 		    i.sib.index = i.index_reg->reg_num;
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 
 	      if (i.disp_operands
@@ -10080,6 +10195,12 @@ output_insn (void)
 	  for (j = ARRAY_SIZE (i.prefix), q = i.prefix; j > 0; j--, q++)
 	    if (*q)
 	      frag_opcode_byte (*q);
+
+	  if (is_apx_rex2_encoding ())
+	    {
+	      frag_opcode_byte (i.vex.bytes[0]);
+	      frag_opcode_byte (i.vex.bytes[1]);
+	    }
 	}
       else
 	{
@@ -14107,6 +14228,13 @@ static bool check_register (const reg_entry *r)
 	i.vec_encoding = vex_encoding_error;
     }
 
+  if (r->reg_flags & RegRex2)
+    {
+      if (!cpu_arch_flags.bitfield.cpuapx_f
+	  || flag_code != CODE_64BIT)
+	return false;
+    }
+
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
       && (!cpu_arch_flags.bitfield.cpu64
 	  || r->reg_type.bitfield.class != RegCR
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 03ee980bef7..53fc6fd6899 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -217,6 +217,7 @@ accept various extension mnemonics.  For example,
 @code{avx10.1/256},
 @code{avx10.1/128},
 @code{user_msr},
+@code{apx_f},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -983,6 +984,9 @@ Different encoding options can be specified via pseudo prefixes:
 instructions (x86-64 only).  Note that this differs from the @samp{rex}
 prefix which generates REX prefix unconditionally.
 
+@item
+@samp{@{rex2@}} -- encode with REX2 prefix
+
 @item
 @samp{@{nooptimize@}} -- disable instruction size optimization.
 @end itemize
@@ -1663,7 +1667,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
 @item @samp{.padlock} @tab @samp{.clzero} @tab @samp{.mwaitx} @tab @samp{.rdpru}
 @item @samp{.mcommit} @tab @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb}
-@item @samp{.tlbsync}
+@item @samp{.tlbsync} @tab @samp{.apx_f}
 @end multitable
 
 Apart from the warning, there are only two other effects on
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
index a2b09d2e74f..56834371133 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
@@ -2,49 +2,4 @@
 #as: --32
 #objdump: -dw -Mx86-64 -Mintel
 #name: x86-64 (ILP32) illegal opcodes (Intel mode)
-
-.*: +file format .*
-
-Disassembly of section .text:
-
-0+ <aaa>:
-[ 	]*[a-f0-9]+:	37                   	\(bad\)
-
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+7 <aam1>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+9 <aas>:
-[ 	]*[a-f0-9]+:	3f                   	\(bad\)
-
-0+a <bound>:
-[ 	]*[a-f0-9]+:	62                   	.byte 0x62
-[ 	]*[a-f0-9]+:	10                   	.byte 0x10
-
-0+c <daa>:
-[ 	]*[a-f0-9]+:	27                   	\(bad\)
-
-0+d <das>:
-[ 	]*[a-f0-9]+:	2f                   	\(bad\)
-
-0+e <into>:
-[ 	]*[a-f0-9]+:	ce                   	\(bad\)
-
-0+f <pusha>:
-[ 	]*[a-f0-9]+:	60                   	\(bad\)
-
-0+10 <popa>:
-[ 	]*[a-f0-9]+:	61                   	\(bad\)
-#pass
+#dump: ../x86-64-opcode-inval-intel.d
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
index 5a17b0b412e..b5233a5cf93 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
@@ -2,49 +2,4 @@
 #as: --32
 #objdump: -dw -Mx86-64
 #name: x86-64 (ILP32) illegal opcodes
-
-.*: +file format .*
-
-Disassembly of section .text:
-
-0+ <aaa>:
-[ 	]*[a-f0-9]+:	37                   	\(bad\)
-
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+7 <aam1>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+9 <aas>:
-[ 	]*[a-f0-9]+:	3f                   	\(bad\)
-
-0+a <bound>:
-[ 	]*[a-f0-9]+:	62                   	.byte 0x62
-[ 	]*[a-f0-9]+:	10                   	.byte 0x10
-
-0+c <daa>:
-[ 	]*[a-f0-9]+:	27                   	\(bad\)
-
-0+d <das>:
-[ 	]*[a-f0-9]+:	2f                   	\(bad\)
-
-0+e <into>:
-[ 	]*[a-f0-9]+:	ce                   	\(bad\)
-
-0+f <pusha>:
-[ 	]*[a-f0-9]+:	60                   	\(bad\)
-
-0+10 <popa>:
-[ 	]*[a-f0-9]+:	61                   	\(bad\)
-#pass
+#dump: ../x86-64-opcode-inval.d
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
new file mode 100644
index 00000000000..0aa079ca29c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
@@ -0,0 +1,15 @@
+.*: Assembler messages:
+.*:4: Error: bad register name `%r17d'
+.*:7: Error: unsupported extended GPR for addressing for `xsave'
+.*:8: Error: unsupported extended GPR for addressing for `xsave64'
+.*:9: Error: unsupported extended GPR for addressing for `xrstor'
+.*:10: Error: unsupported extended GPR for addressing for `xrstor64'
+.*:11: Error: unsupported extended GPR for addressing for `xsaves'
+.*:12: Error: unsupported extended GPR for addressing for `xsaves64'
+.*:13: Error: unsupported extended GPR for addressing for `xrstors'
+.*:14: Error: unsupported extended GPR for addressing for `xrstors64'
+.*:15: Error: unsupported extended GPR for addressing for `xsaveopt'
+.*:16: Error: unsupported extended GPR for addressing for `xsaveopt64'
+.*:17: Error: unsupported extended GPR for addressing for `xsavec'
+.*:18: Error: unsupported extended GPR for addressing for `xsavec64'
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
new file mode 100644
index 00000000000..c4d2308a604
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
@@ -0,0 +1,18 @@
+# Check Illegal 64bit APX_F instructions
+	.text
+	.arch .noapx_f
+	test    $0x7, %r17d
+	.arch .apx_f
+	test    $0x7, %r17d
+	xsave (%r16, %rbx)
+	xsave64 (%r16, %r31)
+	xrstor (%r16, %rbx)
+	xrstor64 (%r16, %rbx)
+	xsaves (%rbx, %r16)
+	xsaves64 (%r16, %rbx)
+	xrstors (%rbx, %r31)
+	xrstors64 (%r16, %rbx)
+	xsaveopt (%r16, %rbx)
+	xsaveopt64 (%r16, %r31)
+	xsavec (%r16, %rbx)
+	xsavec64 (%r16, %r31)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.d b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
new file mode 100644
index 00000000000..e3cd534da11
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
@@ -0,0 +1,83 @@
+#as:
+#objdump: -dw
+#name: x86-64 APX_F use gpr32 with rex2 prefix
+#source: x86-64-apx-rex2.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f6 c0 07[	 ]+test   \$0x7,%r24b
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 19 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24
+[	 ]*[a-f0-9]+:[	 ]*66 d5 11 f7 c0 07 00[	 ]+test   \$0x7,%r24w
+[	 ]*[a-f0-9]+:[	 ]*44 0f af f8[	 ]+imul   %eax,%r15d
+[	 ]*[a-f0-9]+:[	 ]*d5 c0 af c0[	 ]+imul   %eax,%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 90 62 12[	 ]+punpckldq %mm2,\(%r18\)
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 00[	 ]+lea    \(%rax\),%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 08[	 ]+lea    \(%rax\),%r17d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 10[	 ]+lea    \(%rax\),%r18d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 18[	 ]+lea    \(%rax\),%r19d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 20[	 ]+lea    \(%rax\),%r20d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 28[	 ]+lea    \(%rax\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 30[	 ]+lea    \(%rax\),%r22d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 38[	 ]+lea    \(%rax\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 00[	 ]+lea    \(%rax\),%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 08[	 ]+lea    \(%rax\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 10[	 ]+lea    \(%rax\),%r26d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 18[	 ]+lea    \(%rax\),%r27d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 20[	 ]+lea    \(%rax\),%r28d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 28[	 ]+lea    \(%rax\),%r29d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 30[	 ]+lea    \(%rax\),%r30d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 38[	 ]+lea    \(%rax\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r16,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r17,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r18,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r19,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r21,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r22,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r23,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r24,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r25,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r26,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r27,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r28,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r29,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r30,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r31,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 00[	 ]+lea    \(%r16\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 01[	 ]+lea    \(%r17\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 02[	 ]+lea    \(%r18\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 03[	 ]+lea    \(%r19\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 04 24       	lea    \(%r20\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 45 00       	lea    0x0\(%r21\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 06[	 ]+lea    \(%r22\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 07[	 ]+lea    \(%r23\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 00[	 ]+lea    \(%r24\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 01[	 ]+lea    \(%r25\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 02[	 ]+lea    \(%r26\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 03[	 ]+lea    \(%r27\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 04 24       	lea    \(%r28\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 45 00       	lea    0x0\(%r29\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 06          	lea    \(%r30\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 07          	lea    \(%r31\),%eax
+[	 ]*[a-f0-9]+:[	 ]*4c 8d 38             	lea    \(%rax\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 48 8d 00          	lea    \(%rax\),%r16
+[	 ]*[a-f0-9]+:[	 ]*49 8d 07             	lea    \(%r15\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 18 8d 00          	lea    \(%r16\),%rax
+[	 ]*[a-f0-9]+:[	 ]*4a 8d 04 3d 00 00 00 00 	lea    0x0\(,%r15,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 28 8d 04 05 00 00 00 00 	lea    0x0\(,%r16,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 00          	add    \(%r16\),%r8
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 38          	add    \(%r16\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 0d 00 00 00 00 	mov    0x0\(,%r9,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 35 00 00 00 00 	mov    0x0\(,%r14,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 3a          	sub    \(%r10\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 7d 00       	sub    0x0\(%r13\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 30 8d 44 20 01    	lea    0x1\(%r16,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 76 8d 7c 20 01    	lea    0x1\(%r16,%r28,1\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 12 8d 84 04 81 00 00 00 	lea    0x81\(%r20,%r8,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 57 8d bc 04 81 00 00 00 	lea    0x81\(%r28,%r8,1\),%r31d
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.s b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
new file mode 100644
index 00000000000..543f0f573d4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
@@ -0,0 +1,86 @@
+# Check 64bit instructions with rex2 prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+         test	$0x7, %r24b
+         test	$0x7, %r24d
+         test	$0x7, %r24
+         test	$0x7, %r24w
+## REX2.M bit
+         imull	%eax, %r15d
+         imull	%eax, %r16d
+         punpckldq (%r18), %mm2
+## REX2.R4 bit
+         leal	(%rax), %r16d
+         leal	(%rax), %r17d
+         leal	(%rax), %r18d
+         leal	(%rax), %r19d
+         leal	(%rax), %r20d
+         leal	(%rax), %r21d
+         leal	(%rax), %r22d
+         leal	(%rax), %r23d
+         leal	(%rax), %r24d
+         leal	(%rax), %r25d
+         leal	(%rax), %r26d
+         leal	(%rax), %r27d
+         leal	(%rax), %r28d
+         leal	(%rax), %r29d
+         leal	(%rax), %r30d
+         leal	(%rax), %r31d
+## REX2.X4 bit
+         leal	(,%r16), %eax
+         leal	(,%r17), %eax
+         leal	(,%r18), %eax
+         leal	(,%r19), %eax
+         leal	(,%r20), %eax
+         leal	(,%r21), %eax
+         leal	(,%r22), %eax
+         leal	(,%r23), %eax
+         leal	(,%r24), %eax
+         leal	(,%r25), %eax
+         leal	(,%r26), %eax
+         leal	(,%r27), %eax
+         leal	(,%r28), %eax
+         leal	(,%r29), %eax
+         leal	(,%r30), %eax
+         leal	(,%r31), %eax
+## REX.B4 bit
+         leal	(%r16), %eax
+         leal	(%r17), %eax
+         leal	(%r18), %eax
+         leal	(%r19), %eax
+         leal	(%r20), %eax
+         leal	(%r21), %eax
+         leal	(%r22), %eax
+         leal	(%r23), %eax
+         leal	(%r24), %eax
+         leal	(%r25), %eax
+         leal	(%r26), %eax
+         leal	(%r27), %eax
+         leal	(%r28), %eax
+         leal	(%r29), %eax
+         leal	(%r30), %eax
+         leal	(%r31), %eax
+## REX.W bit
+         leaq	(%rax), %r15
+         leaq	(%rax), %r16
+         leaq	(%r15), %rax
+         leaq	(%r16), %rax
+         leaq	(,%r15), %rax
+         leaq	(,%r16), %rax
+## REX.R3 bit
+         add    (%r16), %r8
+         add    (%r16), %r15
+## REX.X3 bit
+         mov    (,%r9), %r16
+         mov    (,%r14), %r16
+## REX.B3 bit
+	 sub   (%r10), %r31
+	 sub   (%r13), %r31
+
+## SIB
+         leal	1(%r16, %r20), %eax
+         leal	1(%r16, %r28), %r31d
+         leal	129(%r20, %r8), %eax
+         leal	129(%r28, %r8), %r31d
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
index 13ad0fb768f..256e1b9a370 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
@@ -1,10 +1,16 @@
 .*: Assembler messages:
 .*:2: Error: .*
 .*:3: Error: .*
+.*:6: Error: .*
+.*:7: Error: .*
 GAS LISTING .*
 
 
 [ 	]*1[ 	]+\.text
 [ 	]*2[ 	]+\{disp16\} movb \(%ebp\),%al
 [ 	]*3[ 	]+\{disp16\} movb \(%rbp\),%al
+[ 	]*4[ 	]+
+[ 	]*5[ 	]+.*
+[ 	]*6[ 	]+\{rex2\} xsave \(%r15, %rbx\)
+[ 	]*7[ 	]+\{rex2\} xsave64 \(%r15, %rbx\)
 #...
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
index c10b14c2099..ae30476e500 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
@@ -1,4 +1,8 @@
 	.text
 	{disp16} movb (%ebp),%al
 	{disp16} movb (%rbp),%al
+
+	/* Instructions that do not support APX.  */
+	{rex2} xsave (%r15, %rbx)
+	{rex2} xsave64 (%r15, %rbx)
 	.p2align 4,0
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
index 6ee5b2f95ce..66c4d2cddc0 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
@@ -10,41 +10,33 @@ Disassembly of section .text:
 0+ <aaa>:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
+0+1 <aam0>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
-0+7 <aam1>:
+0+3 <aam1>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
-0+9 <aas>:
+0+5 <aas>:
 [ 	]*[a-f0-9]+:	3f                   	\(bad\)
 
-0+a <bound>:
+0+6 <bound>:
 [ 	]*[a-f0-9]+:	62                   	.byte 0x62
 [ 	]*[a-f0-9]+:	10                   	.byte 0x10
 
-0+c <daa>:
+0+8 <daa>:
 [ 	]*[a-f0-9]+:	27                   	\(bad\)
 
-0+d <das>:
+0+9 <das>:
 [ 	]*[a-f0-9]+:	2f                   	\(bad\)
 
-0+e <into>:
+0+a <into>:
 [ 	]*[a-f0-9]+:	ce                   	\(bad\)
 
-0+f <pusha>:
+0+b <pusha>:
 [ 	]*[a-f0-9]+:	60                   	\(bad\)
 
-0+10 <popa>:
+0+c <popa>:
 [ 	]*[a-f0-9]+:	61                   	\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
index 12f02c1766c..fbb850b56da 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
@@ -9,41 +9,33 @@ Disassembly of section .text:
 0+ <aaa>:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
+0+1 <aam0>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
-0+7 <aam1>:
+0+3 <aam1>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
-0+9 <aas>:
+0+5 <aas>:
 [ 	]*[a-f0-9]+:	3f                   	\(bad\)
 
-0+a <bound>:
+0+6 <bound>:
 [ 	]*[a-f0-9]+:	62                   	.byte 0x62
 [ 	]*[a-f0-9]+:	10                   	.byte 0x10
 
-0+c <daa>:
+0+8 <daa>:
 [ 	]*[a-f0-9]+:	27                   	\(bad\)
 
-0+d <das>:
+0+9 <das>:
 [ 	]*[a-f0-9]+:	2f                   	\(bad\)
 
-0+e <into>:
+0+a <into>:
 [ 	]*[a-f0-9]+:	ce                   	\(bad\)
 
-0+f <pusha>:
+0+b <pusha>:
 [ 	]*[a-f0-9]+:	60                   	\(bad\)
 
-0+10 <popa>:
+0+c <popa>:
 [ 	]*[a-f0-9]+:	61                   	\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.s b/gas/testsuite/gas/i386/x86-64-opcode-inval.s
index 6cbfe7705a8..fbcda3df773 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.s
@@ -2,10 +2,6 @@
 # All the followings are illegal opcodes for x86-64.
 aaa:
 	aaa
-aad0:
-	aad
-aad1:
-	aad $2
 aam0:
 	aam
 aam1:
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
index 3f9f67fcf4b..7e8c04d970b 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
@@ -1,6 +1,55 @@
 .*: Assembler messages:
-.*:3: Error: .*`vmovaps'.*
-.*:4: Error: .*`vmovaps'.*
-.*:5: Error: .*`vmovaps'.*
-.*:6: Error: .*`vmovaps'.*
-.*:7: Error: .*`rorx'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`rorx'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`xsave'.*
+.*:[0-9]+: Error: .*`xsaves'.*
+.*:[0-9]+: Error: .*`xsaves64'.*
+.*:[0-9]+: Error: .*`xsavec'.*
+.*:[0-9]+: Error: .*`xrstors'.*
+.*:[0-9]+: Error: .*`xrstors64'.*
+.*:[0-9]+: Error: .*`mov'.*
+.*:[0-9]+: Error: .*`movabs'.*
+.*:[0-9]+: Error: .*`cmps'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`movs'.*
+.*:[0-9]+: Error: .*`movs'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`jo'.*
+.*:[0-9]+: Error: .*`jno'.*
+.*:[0-9]+: Error: .*`jb'.*
+.*:[0-9]+: Error: .*`jae'.*
+.*:[0-9]+: Error: .*`je'.*
+.*:[0-9]+: Error: .*`jne'.*
+.*:[0-9]+: Error: .*`jbe'.*
+.*:[0-9]+: Error: .*`ja'.*
+.*:[0-9]+: Error: .*`js'.*
+.*:[0-9]+: Error: .*`jns'.*
+.*:[0-9]+: Error: .*`jp'.*
+.*:[0-9]+: Error: .*`jnp'.*
+.*:[0-9]+: Error: .*`jl'.*
+.*:[0-9]+: Error: .*`jge'.*
+.*:[0-9]+: Error: .*`jle'.*
+.*:[0-9]+: Error: .*`jg'.*
+.*:[0-9]+: Error: .*`in'.*
+.*:[0-9]+: Error: .*`in'.*
+.*:[0-9]+: Error: .*`out'.*
+.*:[0-9]+: Error: .*`out'.*
+.*:[0-9]+: Error: .*`jmp'.*
+.*:[0-9]+: Error: .*`loop'.*
+.*:[0-9]+: Error: .*`wrmsr'.*
+.*:[0-9]+: Error: .*`rdtsc'.*
+.*:[0-9]+: Error: .*`rdmsr'.*
+.*:[0-9]+: Error: .*`sysenter'.*
+.*:[0-9]+: Error: .*`sysexit'.*
+.*:[0-9]+: Error: .*`rdpmc'.*
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
index 3b923593a6a..c65b2dc848d 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
@@ -5,3 +5,61 @@ pseudos:
 	{rex} vmovaps %xmm7,%xmm2
 	{rex} vmovaps %xmm17,%xmm2
 	{rex} rorx $7,%eax,%ebx
+	{rex2} vmovaps %xmm7,%xmm2
+	{rex2} xsave (%rax)
+	{rex2} xsaves (%ecx)
+	{rex2} xsaves64 (%ecx)
+	{rex2} xsavec (%ecx)
+	{rex2} xrstors (%ecx)
+	{rex2} xrstors64 (%ecx)
+
+	#All opcodes in the row 0xa* prefixed REX2 are illegal.
+	#{rex2} test (0xa8) is a special case: it is remapped to test (0xf6).
+	{rex2} mov    0x90909090,%al
+	{rex2} movabs 0x1,%al
+	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
+	{rex2} lodsb
+	{rex2} lods   %ds:(%esi),%al
+	{rex2} lodsb   (%esi)
+	{rex2} movs
+	{rex2} movs   (%esi), (%edi)
+	{rex2} scasl
+	{rex2} scas   %es:(%edi),%eax
+	{rex2} scasb   (%edi)
+	{rex2} stosb
+	{rex2} stosb   (%edi)
+	{rex2} stos   %eax,%es:(%edi)
+
+	#All opcodes in the row 0x7* prefixed REX2 are illegal.
+	{rex2} jo     .+2-0x70
+	{rex2} jno    .+2-0x70
+	{rex2} jb     .+2-0x70
+	{rex2} jae    .+2-0x70
+	{rex2} je     .+2-0x70
+	{rex2} jne    .+2-0x70
+	{rex2} jbe    .+2-0x70
+	{rex2} ja     .+2-0x70
+	{rex2} js     .+2-0x70
+	{rex2} jns    .+2-0x70
+	{rex2} jp     .+2-0x70
+	{rex2} jnp    .+2-0x70
+	{rex2} jl     .+2-0x70
+	{rex2} jge    .+2-0x70
+	{rex2} jle    .+2-0x70
+	{rex2} jg     .+2-0x70
+
+	#All opcodes in the row 0xe* prefixed REX2 are illegal.
+	{rex2} in $0x90,%al
+	{rex2} in $0x90
+	{rex2} out $0x90,%al
+	{rex2} out $0x90
+	{rex2} jmp  *%eax
+	{rex2} loop foo
+
+	#All opcodes in the row 0x3* of map1 (0x0f) prefixed REX2 are illegal.
+	{rex2} wrmsr
+	{rex2} rdtsc
+	{rex2} rdmsr
+	{rex2} sysenter
+	{rex2} sysexitl
+	{rex2} rdpmc
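Reviewer note: the row-based rejections exercised by the test above follow the rule stated in rex2_disallowed () (opcodes/i386-gen.c in this patch). A minimal sketch of that row check is shown here, assuming a single-byte opcode within its map; the names sketch_map and sketch_rex2_reserved are hypothetical and exist only for illustration. The XSAVE*/XRSTOR* cases are handled separately in the patch and are not modelled.

```c
#include <stdbool.h>

/* Hypothetical map identifiers, used by this sketch only.  */
enum sketch_map { SKETCH_MAP0 = 0, SKETCH_MAP1 = 1 };

/* Return true if OPCODE (one byte within MAP) lies in a row that the
   patch treats as reserved under REX2: map0 rows 0x4*, 0x7*, 0xa*, 0xe*
   and map1 rows 0x3*, 0x8*.  */
static bool
sketch_rex2_reserved (unsigned int opcode, enum sketch_map map)
{
  unsigned int row = (opcode >> 4) & 0xf;

  if (map == SKETCH_MAP0)
    return row == 0x4 || row == 0x7 || row == 0xa || row == 0xe;
  if (map == SKETCH_MAP1)
    return row == 0x3 || row == 0x8;
  return false;
}
```

For instance, jo (map0 0x70) and wrmsr (map1 0x30) fall in reserved rows, while mov (map0 0x89) and movaps (map1 0x28) do not, which matches the accepted {rex2} cases in x86-64-pseudos.s.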
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.d b/gas/testsuite/gas/i386/x86-64-pseudos.d
index 0cc75ef2457..708c22b5899 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.d
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.d
@@ -404,6 +404,18 @@ Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 d3 e0          	{rex2} shl %cl,%eax
+ +[a-f0-9]+:	d5 00 38 ca          	{rex2} cmp %cl,%dl
+ +[a-f0-9]+:	d5 00 b3 01          	{rex2} mov \$(0x)?1,%bl
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
@@ -458,6 +470,15 @@ Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.s b/gas/testsuite/gas/i386/x86-64-pseudos.s
index 08fac8381c6..29a0c3368fc 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
@@ -360,6 +360,19 @@ _start:
 	{rex} movaps (%r8),%xmm2
 	{rex} phaddw (%rcx),%mm0
 	{rex} phaddw (%r8),%mm0
+	{rex2} mov %al,%ah
+	{rex2} shl %cl, %eax
+	{rex2} cmp %cl, %dl
+	{rex2} mov $1, %bl
+	{rex2} movl %eax,%ebx
+	{rex2} movl %eax,%r14d
+	{rex2} movl %eax,(%r8)
+	{rex2} movaps %xmm7,%xmm2
+	{rex2} movaps %xmm7,%xmm12
+	{rex2} movaps (%rcx),%xmm2
+	{rex2} movaps (%r8),%xmm2
+	{rex2} pmullw %mm0,%mm6
+
 
 	movb (%rbp),%al
 	{disp8} movb (%rbp),%al
@@ -422,6 +435,15 @@ _start:
 	{rex} movaps xmm2,XMMWORD PTR [r8]
 	{rex} phaddw mm0,QWORD PTR [rcx]
 	{rex} phaddw mm0,QWORD PTR [r8]
+	{rex2} mov ah,al
+	{rex2} mov ebx,eax
+	{rex2} mov r14d,eax
+	{rex2} mov DWORD PTR [r8],eax
+	{rex2} movaps xmm2,xmm7
+	{rex2} movaps xmm12,xmm7
+	{rex2} movaps xmm2,XMMWORD PTR [rcx]
+	{rex2} movaps xmm2,XMMWORD PTR [r8]
+	{rex2} pmullw mm6,mm0
 
 	mov al, BYTE PTR [rbp]
 	{disp8} mov al, BYTE PTR [rbp]
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index a7f5547017f..2be0df0e981 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -363,6 +363,8 @@ run_dump_test "x86-64-avx512f-rcigrne-intel"
 run_dump_test "x86-64-avx512f-rcigrne"
 run_dump_test "x86-64-avx512f-rcigru-intel"
 run_dump_test "x86-64-avx512f-rcigru"
+run_list_test "x86-64-apx-egpr-inval"
+run_dump_test "x86-64-apx-rex2"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/include/opcode/i386.h b/include/opcode/i386.h
index dec7652c1cc..a6af3d54da0 100644
--- a/include/opcode/i386.h
+++ b/include/opcode/i386.h
@@ -112,6 +112,8 @@
 /* x86-64 extension prefix.  */
 #define REX_OPCODE	0x40
 
+#define REX2_OPCODE	0xd5
+
 /* Non-zero if OPCODE is the rex prefix.  */
 #define REX_PREFIX_P(opcode) (((opcode) & 0xf0) == REX_OPCODE)
 
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index e432b61a6cd..d402d575a3a 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -144,6 +144,11 @@ struct instr_info
   /* Bits of REX we've already used.  */
   uint8_t rex_used;
 
+  /* REX2 prefix bits for the current instruction, used for GPR32 (r16-r31).  */
+  unsigned char rex2;
+  /* Bits of REX2 we've already used.  */
+  unsigned char rex2_used;
+
   bool need_modrm;
   unsigned char need_vex;
   bool has_sib;
@@ -169,6 +174,7 @@ struct instr_info
   signed char last_data_prefix;
   signed char last_addr_prefix;
   signed char last_rex_prefix;
+  signed char last_rex2_prefix;
   signed char last_seg_prefix;
   signed char fwait_prefix;
   /* The active segment register prefix.  */
@@ -272,10 +278,18 @@ struct dis_private {
       ins->rex_used |= REX_OPCODE;			\
   }
 
+#define USED_REX2(value)				\
+  {							\
+    if ((ins->rex2 & value))				\
+      ins->rex2_used |= value;				\
+  }
 
 #define EVEX_b_used 1
 #define EVEX_len_used 2
 
+/* M0 in rex2 prefix represents map0 or map1.  */
+#define REX2_M 0x8
+
 /* Flags stored in PREFIXES.  */
 #define PREFIX_REPZ 1
 #define PREFIX_REPNZ 2
@@ -289,6 +303,7 @@ struct dis_private {
 #define PREFIX_DATA 0x200
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
+#define PREFIX_REX2 0x1000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -370,6 +385,7 @@ fetch_error (const instr_info *ins)
 #define PREFIX_IGNORED_DATA	(PREFIX_DATA << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_ADDR	(PREFIX_ADDR << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_LOCK	(PREFIX_LOCK << PREFIX_IGNORED_SHIFT)
+#define PREFIX_REX2_ILLEGAL	(PREFIX_REX2 << PREFIX_IGNORED_SHIFT)
 
 /* Opcode prefixes.  */
 #define PREFIX_OPCODE		(PREFIX_REPZ \
@@ -1888,23 +1904,23 @@ static const struct dis386 dis386[] = {
   { "outs{b|}",		{ indirDXr, Xb }, 0 },
   { X86_64_TABLE (X86_64_6F) },
   /* 70 */
-  { "joH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnoH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jneH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "joH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnoH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jneH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 78 */
-  { "jsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jlH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jleH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "jsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jlH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jleH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 80 */
   { REG_TABLE (REG_80) },
   { REG_TABLE (REG_81) },
@@ -1942,23 +1958,23 @@ static const struct dis386 dis386[] = {
   { "sahf",		{ XX }, 0 },
   { "lahf",		{ XX }, 0 },
   /* a0 */
-  { "mov%LB",		{ AL, Ob }, 0 },
-  { "mov%LS",		{ eAX, Ov }, 0 },
-  { "mov%LB",		{ Ob, AL }, 0 },
-  { "mov%LS",		{ Ov, eAX }, 0 },
-  { "movs{b|}",		{ Ybr, Xb }, 0 },
-  { "movs{R|}",		{ Yvr, Xv }, 0 },
-  { "cmps{b|}",		{ Xb, Yb }, 0 },
-  { "cmps{R|}",		{ Xv, Yv }, 0 },
+  { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
+  { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
+  { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "movs{R|}",		{ Yvr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "cmps{b|}",		{ Xb, Yb }, PREFIX_REX2_ILLEGAL },
+  { "cmps{R|}",		{ Xv, Yv }, PREFIX_REX2_ILLEGAL },
   /* a8 */
-  { "testB",		{ AL, Ib }, 0 },
-  { "testS",		{ eAX, Iv }, 0 },
-  { "stosB",		{ Ybr, AL }, 0 },
-  { "stosS",		{ Yvr, eAX }, 0 },
-  { "lodsB",		{ ALr, Xb }, 0 },
-  { "lodsS",		{ eAXr, Xv }, 0 },
-  { "scasB",		{ AL, Yb }, 0 },
-  { "scasS",		{ eAX, Yv }, 0 },
+  { "testB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
+  { "testS",		{ eAX, Iv }, PREFIX_REX2_ILLEGAL },
+  { "stosB",		{ Ybr, AL }, PREFIX_REX2_ILLEGAL },
+  { "stosS",		{ Yvr, eAX }, PREFIX_REX2_ILLEGAL },
+  { "lodsB",		{ ALr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "lodsS",		{ eAXr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "scasB",		{ AL, Yb }, PREFIX_REX2_ILLEGAL },
+  { "scasS",		{ eAX, Yv }, PREFIX_REX2_ILLEGAL },
   /* b0 */
   { "movB",		{ RMAL, Ib }, 0 },
   { "movB",		{ RMCL, Ib }, 0 },
@@ -2014,23 +2030,23 @@ static const struct dis386 dis386[] = {
   { FLOAT },
   { FLOAT },
   /* e0 */
-  { "loopneFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "loopeFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "loopFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "jEcxzH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "inB",		{ AL, Ib }, 0 },
-  { "inG",		{ zAX, Ib }, 0 },
-  { "outB",		{ Ib, AL }, 0 },
-  { "outG",		{ Ib, zAX }, 0 },
+  { "loopneFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "loopeFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "loopFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "jEcxzH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "inB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
+  { "inG",		{ zAX, Ib }, PREFIX_REX2_ILLEGAL },
+  { "outB",		{ Ib, AL }, PREFIX_REX2_ILLEGAL },
+  { "outG",		{ Ib, zAX }, PREFIX_REX2_ILLEGAL },
   /* e8 */
   { X86_64_TABLE (X86_64_E8) },
   { X86_64_TABLE (X86_64_E9) },
   { X86_64_TABLE (X86_64_EA) },
-  { "jmp",		{ Jb, BND }, 0 },
-  { "inB",		{ AL, indirDX }, 0 },
-  { "inG",		{ zAX, indirDX }, 0 },
-  { "outB",		{ indirDX, AL }, 0 },
-  { "outG",		{ indirDX, zAX }, 0 },
+  { "jmp",		{ Jb, BND }, PREFIX_REX2_ILLEGAL },
+  { "inB",		{ AL, indirDX }, PREFIX_REX2_ILLEGAL },
+  { "inG",		{ zAX, indirDX }, PREFIX_REX2_ILLEGAL },
+  { "outB",		{ indirDX, AL }, PREFIX_REX2_ILLEGAL },
+  { "outG",		{ indirDX, zAX }, PREFIX_REX2_ILLEGAL },
   /* f0 */
   { Bad_Opcode },	/* lock prefix */
   { "int1",		{ XX }, 0 },
@@ -2107,12 +2123,12 @@ static const struct dis386 dis386_twobyte[] = {
   { PREFIX_TABLE (PREFIX_0F2E) },
   { PREFIX_TABLE (PREFIX_0F2F) },
   /* 30 */
-  { "wrmsr",		{ XX }, 0 },
-  { "rdtsc",		{ XX }, 0 },
-  { "rdmsr",		{ XX }, 0 },
-  { "rdpmc",		{ XX }, 0 },
-  { "sysenter",		{ SEP }, 0 },
-  { "sysexit%LQ",	{ SEP }, 0 },
+  { "wrmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdtsc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdpmc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "sysenter",		{ SEP }, PREFIX_REX2_ILLEGAL },
+  { "sysexit%LQ",	{ SEP }, PREFIX_REX2_ILLEGAL },
   { Bad_Opcode },
   { "getsec",		{ XX }, 0 },
   /* 38 */
@@ -2197,23 +2213,23 @@ static const struct dis386 dis386_twobyte[] = {
   { PREFIX_TABLE (PREFIX_0F7E) },
   { PREFIX_TABLE (PREFIX_0F7F) },
   /* 80 */
-  { "joH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnoH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jbH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jaeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jneH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jbeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jaH",		{ Jv, BND, cond_jump_flag }, 0 },
+  { "joH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnoH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jneH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 88 */
-  { "jsH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnsH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jpH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnpH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jlH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jgeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jleH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jgH",		{ Jv, BND, cond_jump_flag }, 0 },
+  { "jsH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnsH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jpH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnpH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jlH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jleH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 90 */
   { "seto",		{ Eb }, 0 },
   { "setno",		{ Eb }, 0 },
@@ -2406,22 +2422,30 @@ static const char intel_index16[][6] = {
 
 static const char att_names64[][8] = {
   "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
-  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15"
+  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
+  "%r16", "%r17", "%r18", "%r19", "%r20", "%r21", "%r22", "%r23",
+  "%r24", "%r25", "%r26", "%r27", "%r28", "%r29", "%r30", "%r31"
 };
 static const char att_names32[][8] = {
   "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi",
-  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d"
+  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d",
+  "%r16d", "%r17d", "%r18d", "%r19d", "%r20d", "%r21d", "%r22d", "%r23d",
+  "%r24d", "%r25d", "%r26d", "%r27d", "%r28d", "%r29d", "%r30d", "%r31d"
 };
 static const char att_names16[][8] = {
   "%ax", "%cx", "%dx", "%bx", "%sp", "%bp", "%si", "%di",
-  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w"
+  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w",
+  "%r16w", "%r17w", "%r18w", "%r19w", "%r20w", "%r21w", "%r22w", "%r23w",
+  "%r24w", "%r25w", "%r26w", "%r27w", "%r28w", "%r29w", "%r30w", "%r31w"
 };
 static const char att_names8[][8] = {
   "%al", "%cl", "%dl", "%bl", "%ah", "%ch", "%dh", "%bh",
 };
 static const char att_names8rex[][8] = {
   "%al", "%cl", "%dl", "%bl", "%spl", "%bpl", "%sil", "%dil",
-  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b"
+  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b",
+  "%r16b", "%r17b", "%r18b", "%r19b", "%r20b", "%r21b", "%r22b", "%r23b",
+  "%r24b", "%r25b", "%r26b", "%r27b", "%r28b", "%r29b", "%r30b", "%r31b"
 };
 static const char att_names_seg[][4] = {
   "%es", "%cs", "%ss", "%ds", "%fs", "%gs", "%?", "%?",
@@ -2810,9 +2834,9 @@ static const struct dis386 reg_table[][8] = {
     { Bad_Opcode },
     { "cmpxchg8b", { { CMPXCHG8B_Fixup, q_mode } }, 0 },
     { Bad_Opcode },
-    { "xrstors", { FXSAVE }, 0 },
-    { "xsavec", { FXSAVE }, 0 },
-    { "xsaves", { FXSAVE }, 0 },
+    { "xrstors", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsavec", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsaves", { FXSAVE }, PREFIX_REX2_ILLEGAL },
     { MOD_TABLE (MOD_0FC7_REG_6) },
     { MOD_TABLE (MOD_0FC7_REG_7) },
   },
@@ -3384,7 +3408,7 @@ static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_4_MOD_0 */
   {
-    { "xsave",	{ FXSAVE }, 0 },
+    { "xsave",	{ FXSAVE }, PREFIX_REX2_ILLEGAL },
     { "ptwrite{%LQ|}", { Edq }, 0 },
   },
 
@@ -3402,7 +3426,7 @@ static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_6_MOD_0 */
   {
-    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE },
+    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { "clrssbsy",	{ Mq }, PREFIX_OPCODE },
     { "clwb",	{ Mb }, PREFIX_OPCODE },
   },
@@ -4196,19 +4220,19 @@ static const struct dis386 x86_64_table[][2] = {
 
   /* X86_64_E8 */
   {
-    { "callP",		{ Jv, BND }, 0 },
-    { "call@",		{ Jv, BND }, 0 }
+    { "callP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
+    { "call@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
   },
 
   /* X86_64_E9 */
   {
-    { "jmpP",		{ Jv, BND }, 0 },
-    { "jmp@",		{ Jv, BND }, 0 }
+    { "jmpP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
+    { "jmp@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
   },
 
   /* X86_64_EA */
   {
-    { "{l|}jmp{P|}", { Ap }, 0 },
+    { "{l|}jmp{P|}", { Ap }, PREFIX_REX2_ILLEGAL },
   },
 
   /* X86_64_0F00_REG_6 */
@@ -8184,7 +8208,7 @@ static const struct dis386 mod_table[][2] = {
   },
   {
     /* MOD_0FAE_REG_5 */
-    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE },
+    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { PREFIX_TABLE (PREFIX_0FAE_REG_5_MOD_3) },
   },
   {
@@ -8387,6 +8411,24 @@ ckprefix (instr_info *ins)
 	    return ckp_okay;
 	  ins->last_rex_prefix = i;
 	  break;
+	/* REX2 must be the last prefix. */
+	case 0xd5:
+	  if (ins->address_mode == mode_64bit)
+	    {
+	      if (ins->last_rex_prefix >= 0)
+		return ckp_bogus;
+
+	      ins->codep++;
+	      if (!fetch_code (ins->info, ins->codep + 1))
+		return ckp_fetch_error;
+	      unsigned char rex2_payload = *ins->codep;
+	      ins->rex2 = rex2_payload >> 4;
+	      ins->rex = (rex2_payload & 0xf) | REX_OPCODE;
+	      ins->codep++;
+	      ins->last_rex2_prefix = i;
+	      ins->all_prefixes[i] = REX2_OPCODE;
+	    }
+	  return ckp_okay;
 	case 0xf3:
 	  ins->prefixes |= PREFIX_REPZ;
 	  ins->last_repz_prefix = i;
@@ -8554,6 +8596,8 @@ prefix_name (enum address_mode mode, uint8_t pref, int sizeflag)
       return "bnd";
     case NOTRACK_PREFIX:
       return "notrack";
+    case REX2_OPCODE:
+      return "rex2";
     default:
       return NULL;
     }
@@ -9202,6 +9246,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
     .last_data_prefix = -1,
     .last_addr_prefix = -1,
     .last_rex_prefix = -1,
+    .last_rex2_prefix = -1,
     .last_seg_prefix = -1,
     .fwait_prefix = -1,
   };
@@ -9366,13 +9411,18 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
-  if (*ins.codep == 0x0f)
+  /* REX2.M in rex2 prefix represents map0 or map1.  */
+  if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))
     {
       unsigned char threebyte;
 
-      ins.codep++;
-      if (!fetch_code (info, ins.codep + 1))
-	goto fetch_error_out;
+      if (!ins.rex2)
+	{
+	  ins.codep++;
+	  if (!fetch_code (info, ins.codep + 1))
+	    goto fetch_error_out;
+	}
+
       threebyte = *ins.codep;
       dp = &dis386_twobyte[threebyte];
       ins.need_modrm = twobyte_has_modrm[threebyte];
@@ -9528,7 +9578,15 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
-  switch (dp->prefix_requirement)
+  if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
+      && ins.last_rex2_prefix >= 0)
+    {
+      i386_dis_printf (info, dis_style_text, "(bad)");
+      ret = ins.end_codep - priv.the_buffer;
+      goto out;
+    }
+
+  switch (dp->prefix_requirement & ~PREFIX_REX2_ILLEGAL)
     {
     case PREFIX_DATA:
       /* If only the data prefix is marked as mandatory, its absence renders
@@ -9587,6 +9645,10 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       && !ins.need_vex && ins.last_rex_prefix >= 0)
     ins.all_prefixes[ins.last_rex_prefix] = 0;
 
+  /* Check if the REX2 prefix is used.  */
+  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 7))
+    ins.all_prefixes[ins.last_rex2_prefix] = 0;
+
   /* Check if the SEG prefix is used.  */
   if ((ins.prefixes & (PREFIX_CS | PREFIX_SS | PREFIX_DS | PREFIX_ES
 		       | PREFIX_FS | PREFIX_GS)) != 0
@@ -9615,7 +9677,10 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
+	if (ins.all_prefixes[i] == REX2_OPCODE)
+	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);
+	else
+	  i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
@@ -11160,8 +11225,11 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     ins->illegal_masking = true;
 
   USED_REX (rexmask);
+  USED_REX2 (rexmask);
   if (ins->rex & rexmask)
     reg += 8;
+  if (ins->rex2 & rexmask)
+    reg += 16;
 
   switch (bytemode)
     {
@@ -11169,7 +11237,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     case b_swap_mode:
       if (reg & 4)
 	USED_REX (0);
-      if (ins->rex)
+      if (ins->rex || ins->rex2)
 	names = att_names8rex;
       else
 	names = att_names8;
@@ -11385,6 +11453,8 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
   int riprel = 0;
   int shift;
 
+  add += (ins->rex2 & REX_B) ? 16 : 0;
+
   if (ins->vex.evex)
     {
 
@@ -11489,6 +11559,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
     shift = 0;
 
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->intel_syntax)
     intel_operand_size (ins, bytemode, sizeflag);
   append_seg (ins);
@@ -11519,8 +11590,11 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	{
 	  vindex = ins->sib.index;
 	  USED_REX (REX_X);
+	  USED_REX2 (REX_X);
 	  if (ins->rex & REX_X)
 	    vindex += 8;
+	  if (ins->rex2 & REX_X)
+	    vindex += 16;
 	  switch (bytemode)
 	    {
 	    case vex_vsib_d_w_dq_mode:
@@ -11945,7 +12019,7 @@ static bool
 OP_REG (instr_info *ins, int code, int sizeflag)
 {
   const char *s;
-  int add;
+  int add = 0;
 
   switch (code)
     {
@@ -11956,10 +12030,11 @@ OP_REG (instr_info *ins, int code, int sizeflag)
     }
 
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->rex & REX_B)
     add = 8;
-  else
-    add = 0;
+  if (ins->rex2 & REX_B)
+    add += 16;
 
   switch (code)
     {
@@ -12671,8 +12746,11 @@ OP_EX (instr_info *ins, int bytemode, int sizeflag)
 
   reg = ins->modrm.rm;
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->rex & REX_B)
     reg += 8;
+  if (ins->rex2 & REX_B)
+    reg += 16;
   if (ins->vex.evex)
     {
       USED_REX (REX_X);
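Reviewer note: the decoding added to ckprefix () and print_register () above can be summarised in a small sketch. It splits the REX2 payload byte the same way the patch does (high nibble into rex2, low nibble into rex) and then widens a 3-bit register field by 8 and 16. All sketch_* names are hypothetical; the mask values mirror REX_B/REX_X/REX_R/REX_W and REX2_M, and the REX_OPCODE bit that the patch ORs into ins->rex is left out for brevity.

```c
#include <stdint.h>

/* Bit masks mirroring REX_B, REX_X, REX_R, REX_W and REX2_M.  */
#define SKETCH_REX_B  1
#define SKETCH_REX_X  2
#define SKETCH_REX_R  4
#define SKETCH_REX_W  8
#define SKETCH_REX2_M 8

struct sketch_rex2
{
  uint8_t rex;   /* Low nibble of the payload: W, R3, X3, B3.  */
  uint8_t rex2;  /* High nibble of the payload: M0, R4, X4, B4.  */
};

/* Split the byte that follows 0xd5 into its two nibbles.  */
static struct sketch_rex2
sketch_split_payload (uint8_t payload)
{
  struct sketch_rex2 r;

  r.rex2 = payload >> 4;
  r.rex = payload & 0xf;
  return r;
}

/* Extend a 3-bit ModRM/SIB FIELD to a 5-bit register number, using the
   same MASK for both nibbles, as print_register () does in the patch.  */
static unsigned int
sketch_reg_number (struct sketch_rex2 r, unsigned int field,
		   unsigned int mask)
{
  unsigned int reg = field & 7;

  if (r.rex & mask)
    reg += 8;
  if (r.rex2 & mask)
    reg += 16;
  return reg;
}
```

Checked against the expected output in x86-64-pseudos.d: for "d5 01 89 c6" the payload 0x01 sets B3, so ModRM.rm 6 becomes register 14 (%r14d); for "d5 80 28 d7" the payload 0x80 sets M0, which selects map1 (movaps).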
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 53cb700d0aa..6402b669d37 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -275,6 +275,8 @@ static const dependency isa_dependencies[] =
     "64" },
   { "USER_MSR",
     "64" },
+  { "APX_F",
+    "XSAVE|64" },
 };
 
 /* This array is populated as process_i386_initializers() walks cpu_flags[].  */
@@ -397,6 +399,7 @@ static bitfield cpu_flags[] =
   BITFIELD (FRED),
   BITFIELD (LKGS),
   BITFIELD (USER_MSR),
+  BITFIELD (APX_F),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
@@ -486,6 +489,7 @@ static bitfield opcode_modifiers[] =
   BITFIELD (ATTSyntax),
   BITFIELD (IntelSyntax),
   BITFIELD (ISA64),
+  BITFIELD (NoEgpr),
 };
 
 #define CLASS(n) #n, n
@@ -1072,10 +1076,48 @@ get_element_size (char **opnd, int lineno)
   return elem_size;
 }
 
+static bool
+rex2_disallowed (const unsigned long long opcode, unsigned int space,
+			       const char *cpu_flags)
+{
+  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
+  if (strstr (cpu_flags, "XSAVES") != NULL
+      || strstr (cpu_flags, "XSAVEC") != NULL
+      || strstr (cpu_flags, "Xsave") != NULL
+      || strstr (cpu_flags, "Xsaveopt") != NULL)
+    return true;
+
+  /* All opcodes in map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8* are
+     reserved under REX2 and trigger #UD when prefixed with REX2.  */
+  if (space == 0)
+    switch (opcode >> 4)
+      {
+      case 0x4:
+      case 0x7:
+      case 0xA:
+      case 0xE:
+	return true;
+      default:
+	return false;
+      }
+
+  if (space == SPACE_0F)
+    switch (opcode >> 4)
+      {
+      case 0x3:
+      case 0x8:
+	return true;
+      default:
+	return false;
+      }
+
+  return false;
+}
+
 static void
 process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 			      unsigned int prefix, const char *extension_opcode,
-			      char **opnd, int lineno)
+			      char **opnd, int lineno, bool rex2_disallowed)
 {
   char *str, *next, *last;
   bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
@@ -1202,6 +1244,12 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 	  || modifiers[SAE].value))
     modifiers[EVex].value = EVEXDYN;
 
+  /* Vex, legacy map2 / map3 and rex2_disallowed insns do not support EGPR.
+     Templates supporting both Vex and EVex do allow EGPR.  */
+  if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
+      && !modifiers[EVex].value)
+    modifiers[NoEgpr].value = 1;
+
   output_opcode_modifier (table, modifiers, ARRAY_SIZE (modifiers));
 }
 
@@ -1425,8 +1473,11 @@ output_i386_opcode (FILE *table, const char *name, char *str,
 	   ident, 2 * (int)length, opcode, end, i);
   free (ident);
 
+  /* Check whether the current entry needs special handling.  */
+  bool has_special_handle = rex2_disallowed (opcode, space, cpu_flags);
   process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
-				extension_opcode, operand_types, lineno);
+				extension_opcode, operand_types, lineno,
+				has_special_handle);
 
   process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
 
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 7bb8084b291..d28a4cedf0f 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -319,6 +319,8 @@ enum i386_cpu
   CpuAVX512F,
   /* Intel AVX-512 VL Instructions support required.  */
   CpuAVX512VL,
+  /* Intel APX_F Instructions support required.  */
+  CpuAPX_F,
   /* Not supported in the 64bit mode  */
   CpuNo64,
 
@@ -354,6 +356,7 @@ enum i386_cpu
 		   cpuhle:1, \
 		   cpuavx512f:1, \
 		   cpuavx512vl:1, \
+		   cpuapx_f:1, \
       /* NOTE: This field needs to remain last. */ \
 		   cpuno64:1
 
@@ -745,6 +748,11 @@ enum
 #define INTEL64		2
 #define INTEL64ONLY	3
   ISA64,
+
+  /* EGPRs (r16-r31) are illegal on this instruction.  Also used to judge
+     whether the instruction supports the pseudo-prefix {rex2}.  */
+  NoEgpr,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -792,6 +800,7 @@ typedef struct i386_opcode_modifier
   unsigned int attsyntax:1;
   unsigned int intelsyntax:1;
   unsigned int isa64:2;
+  unsigned int noegpr:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
@@ -1006,7 +1015,8 @@ typedef struct insn_template
 #define Prefix_VEX3		6	/* {vex3} */
 #define Prefix_EVEX		7	/* {evex} */
 #define Prefix_REX		8	/* {rex} */
-#define Prefix_NoOptimize	9	/* {nooptimize} */
+#define Prefix_REX2		9	/* {rex2} */
+#define Prefix_NoOptimize	10	/* {nooptimize} */
 
   /* the bits in opcode_modifier are used to generate the final opcode from
      the base_opcode.  These bits also are used to detect alternate forms of
@@ -1033,6 +1043,7 @@ typedef struct
 #define RegRex	    0x1  /* Extended register.  */
 #define RegRex64    0x2  /* Extended 8 bit register.  */
 #define RegVRex	    0x4  /* Extended vector register.  */
+#define RegRex2	    0x8  /* Extended GPR (R16-R31).  */
   unsigned char reg_num;
 #define RegIP	((unsigned char ) ~0)
 /* EIZ and RIZ are fake index registers.  */
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index c31bf20f2e6..cbf9d968fba 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -138,6 +138,7 @@
 #define Vsz256 Vsz=VSZ256
 #define Vsz512 Vsz=VSZ512
 
+
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
@@ -895,7 +896,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
 <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
                       load:Load:0, store:Store:0, +
                       vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
-                      rex:REX:x64, nooptimize:NoOptimize:0>
+                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
 
 {<pseudopfx>}, PSEUDO_PREFIX/Prefix_<pseudopfx:ident>, <pseudopfx:cpu>, NoSuf|IsPrefix, {}
 
@@ -1428,16 +1429,17 @@ crc32, 0xf20f38f0, SSE4_2&x64, W|Modrm|No_wSuf|No_lSuf|No_sSuf, { Reg8|Reg64|Uns
 
 // xsave/xrstor New Instructions.
 
-xsave, 0xfae/4, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xsave64, 0xfae/4, Xsave&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
-xrstor, 0xfae/5, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xrstor64, 0xfae/5, Xsave&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xsave, 0xfae/4, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xsave64, 0xfae/4, Xsave&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
+xrstor, 0xfae/5, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xrstor64, 0xfae/5, Xsave&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 xgetbv, 0xf01d0, Xsave, NoSuf, {}
 xsetbv, 0xf01d1, Xsave, NoSuf, {}
 
 // xsaveopt
-xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+
+xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // AES instructions.
 
@@ -2474,17 +2476,17 @@ clflushopt, 0x660fae/7, ClflushOpt, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex
 
 // XSAVES/XRSTORS instructions.
 
-xrstors, 0xfc7/3, XSAVES, Modrm|NoSuf, { Unspecified|BaseIndex }
-xrstors64, 0xfc7/3, XSAVES&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
-xsaves, 0xfc7/5, XSAVES, Modrm|NoSuf, { Unspecified|BaseIndex }
-xsaves64, 0xfc7/5, XSAVES&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xrstors, 0xfc7/3, XSAVES, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xrstors64, 0xfc7/3, XSAVES&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
+xsaves, 0xfc7/5, XSAVES, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xsaves64, 0xfc7/5, XSAVES&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // XSAVES instructions end.
 
 // XSAVEC instructions.
 
-xsavec, 0xfc7/4, XSAVEC, Modrm|NoSuf, { Unspecified|BaseIndex }
-xsavec64, 0xfc7/4, XSAVEC&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xsavec, 0xfc7/4, XSAVEC, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xsavec64, 0xfc7/4, XSAVEC&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // XSAVEC instructions end.
 
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index 2ac56e3fd0b..8fead35e320 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -43,6 +43,22 @@ r12b, Class=Reg|Byte, RegRex|RegRex64, 4, Dw2Inval, Dw2Inval
 r13b, Class=Reg|Byte, RegRex|RegRex64, 5, Dw2Inval, Dw2Inval
 r14b, Class=Reg|Byte, RegRex|RegRex64, 6, Dw2Inval, Dw2Inval
 r15b, Class=Reg|Byte, RegRex|RegRex64, 7, Dw2Inval, Dw2Inval
+r16b, Class=Reg|Byte, RegRex2|RegRex64, 0, Dw2Inval, Dw2Inval
+r17b, Class=Reg|Byte, RegRex2|RegRex64, 1, Dw2Inval, Dw2Inval
+r18b, Class=Reg|Byte, RegRex2|RegRex64, 2, Dw2Inval, Dw2Inval
+r19b, Class=Reg|Byte, RegRex2|RegRex64, 3, Dw2Inval, Dw2Inval
+r20b, Class=Reg|Byte, RegRex2|RegRex64, 4, Dw2Inval, Dw2Inval
+r21b, Class=Reg|Byte, RegRex2|RegRex64, 5, Dw2Inval, Dw2Inval
+r22b, Class=Reg|Byte, RegRex2|RegRex64, 6, Dw2Inval, Dw2Inval
+r23b, Class=Reg|Byte, RegRex2|RegRex64, 7, Dw2Inval, Dw2Inval
+r24b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 0, Dw2Inval, Dw2Inval
+r25b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 1, Dw2Inval, Dw2Inval
+r26b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 2, Dw2Inval, Dw2Inval
+r27b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 3, Dw2Inval, Dw2Inval
+r28b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 4, Dw2Inval, Dw2Inval
+r29b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 5, Dw2Inval, Dw2Inval
+r30b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 6, Dw2Inval, Dw2Inval
+r31b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 7, Dw2Inval, Dw2Inval
 // 16 bit regs
 ax, Class=Reg|Instance=Accum|Word, 0, 0, Dw2Inval, Dw2Inval
 cx, Class=Reg|Word, 0, 1, Dw2Inval, Dw2Inval
@@ -60,6 +76,22 @@ r12w, Class=Reg|Word, RegRex, 4, Dw2Inval, Dw2Inval
 r13w, Class=Reg|Word, RegRex, 5, Dw2Inval, Dw2Inval
 r14w, Class=Reg|Word, RegRex, 6, Dw2Inval, Dw2Inval
 r15w, Class=Reg|Word, RegRex, 7, Dw2Inval, Dw2Inval
+r16w, Class=Reg|Word, RegRex2, 0, Dw2Inval, Dw2Inval
+r17w, Class=Reg|Word, RegRex2, 1, Dw2Inval, Dw2Inval
+r18w, Class=Reg|Word, RegRex2, 2, Dw2Inval, Dw2Inval
+r19w, Class=Reg|Word, RegRex2, 3, Dw2Inval, Dw2Inval
+r20w, Class=Reg|Word, RegRex2, 4, Dw2Inval, Dw2Inval
+r21w, Class=Reg|Word, RegRex2, 5, Dw2Inval, Dw2Inval
+r22w, Class=Reg|Word, RegRex2, 6, Dw2Inval, Dw2Inval
+r23w, Class=Reg|Word, RegRex2, 7, Dw2Inval, Dw2Inval
+r24w, Class=Reg|Word, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25w, Class=Reg|Word, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26w, Class=Reg|Word, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27w, Class=Reg|Word, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28w, Class=Reg|Word, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29w, Class=Reg|Word, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30w, Class=Reg|Word, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31w, Class=Reg|Word, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 // 32 bit regs
 eax, Class=Reg|Instance=Accum|Dword|BaseIndex, 0, 0, 0, Dw2Inval
 ecx, Class=Reg|Instance=RegC|Dword|BaseIndex, 0, 1, 1, Dw2Inval
@@ -77,6 +109,22 @@ r12d, Class=Reg|Dword|BaseIndex, RegRex, 4, Dw2Inval, Dw2Inval
 r13d, Class=Reg|Dword|BaseIndex, RegRex, 5, Dw2Inval, Dw2Inval
 r14d, Class=Reg|Dword|BaseIndex, RegRex, 6, Dw2Inval, Dw2Inval
 r15d, Class=Reg|Dword|BaseIndex, RegRex, 7, Dw2Inval, Dw2Inval
+r16d, Class=Reg|Dword|BaseIndex, RegRex2, 0, Dw2Inval, Dw2Inval
+r17d, Class=Reg|Dword|BaseIndex, RegRex2, 1, Dw2Inval, Dw2Inval
+r18d, Class=Reg|Dword|BaseIndex, RegRex2, 2, Dw2Inval, Dw2Inval
+r19d, Class=Reg|Dword|BaseIndex, RegRex2, 3, Dw2Inval, Dw2Inval
+r20d, Class=Reg|Dword|BaseIndex, RegRex2, 4, Dw2Inval, Dw2Inval
+r21d, Class=Reg|Dword|BaseIndex, RegRex2, 5, Dw2Inval, Dw2Inval
+r22d, Class=Reg|Dword|BaseIndex, RegRex2, 6, Dw2Inval, Dw2Inval
+r23d, Class=Reg|Dword|BaseIndex, RegRex2, 7, Dw2Inval, Dw2Inval
+r24d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 rax, Class=Reg|Instance=Accum|Qword|BaseIndex, 0, 0, Dw2Inval, 0
 rcx, Class=Reg|Instance=RegC|Qword|BaseIndex, 0, 1, Dw2Inval, 2
 rdx, Class=Reg|Instance=RegD|Qword|BaseIndex, 0, 2, Dw2Inval, 1
@@ -93,6 +141,22 @@ r12, Class=Reg|Qword|BaseIndex, RegRex, 4, Dw2Inval, 12
 r13, Class=Reg|Qword|BaseIndex, RegRex, 5, Dw2Inval, 13
 r14, Class=Reg|Qword|BaseIndex, RegRex, 6, Dw2Inval, 14
 r15, Class=Reg|Qword|BaseIndex, RegRex, 7, Dw2Inval, 15
+r16, Class=Reg|Qword|BaseIndex, RegRex2, 0, Dw2Inval, 130
+r17, Class=Reg|Qword|BaseIndex, RegRex2, 1, Dw2Inval, 131
+r18, Class=Reg|Qword|BaseIndex, RegRex2, 2, Dw2Inval, 132
+r19, Class=Reg|Qword|BaseIndex, RegRex2, 3, Dw2Inval, 133
+r20, Class=Reg|Qword|BaseIndex, RegRex2, 4, Dw2Inval, 134
+r21, Class=Reg|Qword|BaseIndex, RegRex2, 5, Dw2Inval, 135
+r22, Class=Reg|Qword|BaseIndex, RegRex2, 6, Dw2Inval, 136
+r23, Class=Reg|Qword|BaseIndex, RegRex2, 7, Dw2Inval, 137
+r24, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, 138
+r25, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, 139
+r26, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, 140
+r27, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, 141
+r28, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, 142
+r29, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, 143
+r30, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, 144
+r31, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, 145
 // Vector mask registers.
 k0, Class=RegMask, 0, 0, 93, 118
 k1, Class=RegMask, 0, 1, 94, 119
-- 
2.25.1
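
Reader's note on the disassembler hunks above (OP_E_memory, OP_REG, OP_EX):
in each of them the register number is the 3-bit encoded field, plus 8 when
the REX-level bit is set, plus 16 when the REX2-level bit is set.  A minimal
sketch in C follows; gpr32_number is a hypothetical helper name used only
for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of the patch: combine a 3-bit register
   field with the REX-level bit (adds 8) and the REX2-level bit (adds 16),
   mirroring the "+= 8" / "+= 16" steps in the hunks above.  */
static unsigned int
gpr32_number (unsigned int field, bool rex_bit, bool rex2_bit)
{
  /* The encoded field is only three bits wide.  */
  assert (field < 8);

  unsigned int reg = field;
  if (rex_bit)
    reg += 8;
  if (rex2_bit)
    reg += 16;
  return reg;
}
```

This lines up with the i386-reg.tbl additions: r16 carries RegRex2 with
number 0, while r24 carries RegRex2|RegRex with number 0, so
gpr32_number (0, true, true) yields 24.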



* [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions.
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
  2023-11-24  7:02 ` [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Cui, Lili
@ 2023-11-24  7:02 ` Cui, Lili
  2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Cui, Lili @ 2023-11-24  7:02 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu

opcodes/ChangeLog:

	* i386-dis-evex.h: Add an empty EVEX_MAP4_ sub-table for
	legacy insns promoted to EVEX.
	* i386-dis.c: Add EVEX_MAP4.
---
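Reader's note: the new sub-table is placed between the 0F3A and MAP5
sub-tables because the EVEX_* enum in i386-dis.c is used as the first index
into evex_table, so table order and enum order have to agree.  The sketch
below models that with a small array of labels standing in for the real
256-entry sub-tables.  EVEX_MAP_COUNT, sub_table_label and
evex_sub_table_label are illustrative names, and the labels of the first
three sub-tables are assumed rather than taken from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Same order as the enum in the i386-dis.c hunk of this patch.  */
enum evex_map_index
{
  EVEX_0F = 0,
  EVEX_0F38,
  EVEX_0F3A,
  EVEX_MAP4,
  EVEX_MAP5,
  EVEX_MAP6,
  EVEX_MAP_COUNT	/* Illustrative sentinel, not in the patch.  */
};

/* Stand-in for evex_table[][256]: one label per sub-table, listed in the
   order the sub-tables are assumed to appear in i386-dis-evex.h.  */
static const char *const sub_table_label[EVEX_MAP_COUNT] =
{
  "EVEX_0F", "EVEX_0F38", "EVEX_0F3A",
  "EVEX_MAP4_", "EVEX_MAP5_", "EVEX_MAP6_"
};

/* Look up the sub-table label for a given enum value.  */
static const char *
evex_sub_table_label (enum evex_map_index idx)
{
  assert ((size_t) idx < EVEX_MAP_COUNT);
  return sub_table_label[idx];
}
```

Adding the enum value without the matching sub-table (or the reverse) would
shift every later map by one slot, which is presumably why both files change
in the same patch.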
 opcodes/i386-dis-evex.h | 291 ++++++++++++++++++++++++++++++++++++++++
 opcodes/i386-dis.c      |   1 +
 2 files changed, 292 insertions(+)

diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index e6295119d2b..7ad1edbe72d 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -872,6 +872,297 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
   },
+  /* EVEX_MAP4_ */
+  {
+    /* 00 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 08 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 10 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 18 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 20 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 28 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 30 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 38 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 40 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 48 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 50 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 58 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 60 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 68 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 70 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 78 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 80 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 88 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 90 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* 98 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* A0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* A8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* B0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* B8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* C0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* C8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* D0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* D8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* E0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* E8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* F0 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    /* F8 */
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+  },
   /* EVEX_MAP5_ */
   {
     /* 00 */
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index d402d575a3a..3f1a8644930 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -1297,6 +1297,7 @@ enum
   EVEX_0F = 0,
   EVEX_0F38,
   EVEX_0F3A,
+  EVEX_MAP4,
   EVEX_MAP5,
   EVEX_MAP6,
 };
-- 
2.25.1



* [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
  2023-11-24  7:02 ` [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Cui, Lili
  2023-11-24  7:02 ` [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
@ 2023-11-24  7:02 ` Cui, Lili
  2023-12-07 12:38   ` Jan Beulich
                     ` (2 more replies)
  2023-11-24  7:02 ` [PATCH v3 5/9] Add tests for " Cui, Lili
                   ` (7 subsequent siblings)
  10 siblings, 3 replies; 69+ messages in thread
From: Cui, Lili @ 2023-11-24  7:02 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu

This patch adds the non-ND, non-NF forms of the EVEX-promoted insns.

EVEX extension of legacy instructions:
  All promoted legacy instructions are placed in EVEX map 4, which is
  currently reserved.
EVEX extension of EVEX instructions:
  All existing EVEX instructions are extended by APX using the extended
  EVEX prefix, so that they can access all 32 GPRs.
EVEX extension of VEX instructions:
  Promoting a VEX instruction into the EVEX space does not change the map
  id, the opcode, or the operand encoding of the VEX instruction.

Note: The promoted version of MOVBE will be extended to include the
  "MOVBE reg1, reg2" form.

  gas/ChangeLog:

  2023-11-21  Lingling Kong <lingling.kong@intel.com>
	      H.J. Lu  <hongjiu.lu@intel.com>
	      Lili Cui <lili.cui@intel.com>
	      Lin Hu   <lin1.hu@intel.com>

	* config/tc-i386.c (cpu_flags_not_or_check): Add a new
	function for APX cpu flag checking.
	(cpu_flags_match): Handle cpu_flags_not_or_check.
	(install_template): Add AMX_TILE and APX combine.
	(is_apx_evex_encoding): Test apx evex encoding.
	(build_apx_evex_prefix): Enable APX evex prefix.
	(md_assemble): Handle apx with evex encoding.
	(check_EgprOperands): Add noegpr check for apx.
	(process_suffix): Handle apx map4 prefix.
	(check_register): Assign i.vec_encoding for APX evex instructions.
	* testsuite/gas/i386/x86-64-evex.d: Adjust test cases.
	* gas/testsuite/gas/i386/x86-64-inval-movbe.s: Ditto.
	* gas/testsuite/gas/i386/x86-64-inval-movbe.l: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-len.h: Handle EVEX_LEN_0F38F2, EVEX_LEN_0F38F3.
	* i386-dis-evex-prefix.h: Handle PREFIX_EVEX_0F38F2_L_0,
	PREFIX_EVEX_0F38F3_L_0, PREFIX_EVEX_MAP4_D8,
	PREFIX_EVEX_MAP4_DA, PREFIX_EVEX_MAP4_DB,
	PREFIX_EVEX_MAP4_DC, PREFIX_EVEX_MAP4_DD,
	PREFIX_EVEX_MAP4_DE, PREFIX_EVEX_MAP4_DF,
	PREFIX_EVEX_MAP4_F0, PREFIX_EVEX_MAP4_F1,
	PREFIX_EVEX_MAP4_F2, PREFIX_EVEX_MAP4_F8.
	* i386-dis-evex-reg.h: Handle REG_EVEX_0F38F3_L_0_P_0.
	* i386-dis-evex.h: Add EVEX_MAP4_ for legacy insns
	promoted to APX to use gpr32.
	* i386-dis-evex-x86-64.h: Add X86_64_EVEX_0F90,
	X86_64_EVEX_0F92, X86_64_EVEX_0F93, X86_64_EVEX_0F3849,
	X86_64_EVEX_0F384B, X86_64_EVEX_0F38E0, X86_64_EVEX_0F38E1,
	X86_64_EVEX_0F38E2, X86_64_EVEX_0F38E3, X86_64_EVEX_0F38E4,
	X86_64_EVEX_0F38E5, X86_64_EVEX_0F38E6, X86_64_EVEX_0F38E7,
	X86_64_EVEX_0F38E8, X86_64_EVEX_0F38E9, X86_64_EVEX_0F38EA,
	X86_64_EVEX_0F38EB, X86_64_EVEX_0F38EC, X86_64_EVEX_0F38ED,
	X86_64_EVEX_0F38EE, X86_64_EVEX_0F38EF, X86_64_EVEX_0F38F2,
	X86_64_EVEX_0F38F3, X86_64_EVEX_0F38F5, X86_64_EVEX_0F38F6,
	X86_64_EVEX_0F38F7, X86_64_EVEX_0F3AF0, X86_64_EVEX_0F91.
	* i386-dis.c
	(struct instr_info): Deleted bool r.
	(PREFIX_NP_OR_DATA): New.
	(NO_PREFIX): New.
	(putop): Ditto.
	(X86_64_EVEX_FROM_VEX_TABLE): Ditto.
	(get_valid_dis386): Decode insn erex in extend evex prefix.
	Handle EVEX_MAP4.
	(print_insn): Handle PREFIX_DATA_AND_NP_ONLY.
	(print_register): Handle apx instructions decode.
	(OP_E_memory): Ditto.
	(OP_G): Ditto.
	(OP_XMM): Ditto.
	(DistinctDest_Fixup): Ditto.
	* i386-gen.c (process_i386_opcode_modifier):
	* i386-opc.h (SPACE_EVEXMAP4): Add for legacy insns
	promoted to EVEX.
	* i386-opc.tbl: Mark legacy and VEX insns that don't
	support gpr32, and add legacy (map2 / 3) insns promoted
	to EVEX.
---
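Reader's note: build_apx_evex_prefix in this patch first builds a normal
EVEX prefix and then adjusts four bits for registers r16-r31.  The sketch
below restates those adjustments on a plain 4-byte array.  The SK_REX_*
constants are illustrative stand-ins whose values are assumed to match the
REX_R / REX_X / REX_B bit values used by gas, and apx_patch_evex_bytes is a
hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the REX_R / REX_X / REX_B bit values in gas.  */
#define SK_REX_R 4u
#define SK_REX_X 2u
#define SK_REX_B 1u

/* Apply the APX adjustments from build_apx_evex_prefix to an already
   built EVEX prefix.  bytes[0] is the 0x62 escape byte; rex2 holds the
   REX2-level R/X/B bits; vvvv_is_egpr says whether the vvvv register is
   one of r16-r31.  */
static void
apx_patch_evex_bytes (unsigned char bytes[4], unsigned int rex2,
		      bool vvvv_is_egpr)
{
  assert (bytes[0] == 0x62);

  if (rex2 & SK_REX_R)
    bytes[1] &= (unsigned char) ~0x10u;	/* Inverted bit: clear to select.  */
  if (rex2 & SK_REX_B)
    bytes[1] |= 0x08u;			/* Non-inverted bit: set to select.  */
  if (rex2 & SK_REX_X)
    bytes[2] &= (unsigned char) ~0x04u;	/* Inverted bit: clear to select.  */
  if (vvvv_is_egpr)
    bytes[3] &= (unsigned char) ~0x08u;	/* Inverted bit: clear to select.  */
}
```

For example, starting from 62 f1 7e 08 and applying only the R adjustment
gives 62 e1 7e 08, the prefix that the updated x86-64-evex.d expectation in
this patch disassembles as "vcvtss2si %xmm0,%r16d".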
 gas/config/tc-i386.c                 |  81 ++++++++++++---
 gas/testsuite/gas/i386/x86-64-evex.d |   2 +-
 gas/testsuite/gas/i386/x86-64.exp    |   2 +-
 opcodes/i386-dis-evex-len.h          |  10 ++
 opcodes/i386-dis-evex-prefix.h       |  66 +++++++++++++
 opcodes/i386-dis-evex-reg.h          |   7 ++
 opcodes/i386-dis-evex-x86-64.h       |  60 +++++++++++
 opcodes/i386-dis-evex.h              |  94 +++++++++---------
 opcodes/i386-dis.c                   | 142 +++++++++++++++++++++++----
 opcodes/i386-gen.c                   |   2 +
 opcodes/i386-opc.h                   |  10 ++
 opcodes/i386-opc.tbl                 |  90 +++++++++++------
 12 files changed, 455 insertions(+), 111 deletions(-)
 create mode 100644 opcodes/i386-dis-evex-x86-64.h

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 638d3aa07c8..ba8001fe1c8 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -409,6 +409,9 @@ struct _i386_insn
     /* Compressed disp8*N attribute.  */
     unsigned int memshift;
 
+    /* No CSPAZO flags update.  */
+    bool has_nf;
+
     /* Prefer load or store in encoding.  */
     enum
       {
@@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
 
   /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
-  {
-      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
-	   || maybe_cpu (t, CpuFMA))
-	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
+    {
+      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
+	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
+	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
+	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
 	{
 	  if (need_evex_encoding ())
 	    {
@@ -3692,7 +3696,7 @@ install_template (const insn_template *t)
 		gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
 	    }
 	}
-  }
+    }
 
   /* Note that for pseudo prefixes this produces a length of 1. But for them
      the length isn't interesting at all.  */
@@ -3873,6 +3877,14 @@ is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || t->opcode_modifier.evex;
 }
 
+static INLINE bool
+is_apx_evex_encoding (void)
+{
+  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
+    || (i.vex.register_specifier
+	&& i.vex.register_specifier->reg_flags & RegRex2);
+}
+
 static INLINE bool
 is_apx_rex2_encoding (void)
 {
@@ -4149,6 +4161,27 @@ build_rex2_prefix (void)
 		    | (i.rex2 << 4) | i.rex);
 }
 
+/* Build the EVEX prefix (4-byte) for evex insn
+   | 62h |
+   | `R`X`B`R' | B'mmm |
+   | W | v`v`v`v | `x' | pp |
+   | z| L'L | b | `v | aaa |
+*/
+static void
+build_apx_evex_prefix (void)
+{
+  build_evex_prefix ();
+  if (i.rex2 & REX_R)
+    i.vex.bytes[1] &= ~0x10;
+  if (i.rex2 & REX_B)
+    i.vex.bytes[1] |= 0x08;
+  if (i.rex2 & REX_X)
+    i.vex.bytes[2] &= ~0x04;
+  if (i.vex.register_specifier
+      && i.vex.register_specifier->reg_flags & RegRex2)
+    i.vex.bytes[3] &= ~0x08;
+}
+
 static void
 process_immext (void)
 {
@@ -5622,13 +5655,18 @@ md_assemble (char *line)
 	  return;
 	}
 
-      if (i.tm.opcode_modifier.vex)
+      if (is_apx_evex_encoding ())
+	build_apx_evex_prefix ();
+      else if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
 	build_evex_prefix ();
 
       /* The individual REX.RXBW bits got consumed.  */
       i.rex &= REX_OPCODE;
+
+      /* The rex2 bits got consumed.  */
+      i.rex2 = 0;
     }
 
   /* Handle conversion of 'int $3' --> special int3 insn.  */
@@ -5655,17 +5693,17 @@ md_assemble (char *line)
      instruction already has a prefix, we need to convert old
      registers to new ones.  */
 
-  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
-       && (i.op[0].regs->reg_flags & RegRex64) != 0)
-      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
-	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
-      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
-	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
-	  && (i.rex != 0 || i.rex2 != 0)))
+  if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
+	&& (i.op[0].regs->reg_flags & RegRex64) != 0)
+       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
+	   && (i.op[1].regs->reg_flags & RegRex64) != 0)
+       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
+	    || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
+	   && (i.rex != 0 || i.rex2 != 0))))
     {
       int x;
 
-      if (!i.rex2)
+      if (!is_apx_rex2_encoding () && !is_any_vex_encoding (&i.tm))
 	i.rex |= REX_OPCODE;
       for (x = 0; x < 2; x++)
 	{
@@ -8020,7 +8058,8 @@ process_suffix (void)
       if (i.suffix != QWORD_MNEM_SUFFIX
 	  && i.tm.opcode_modifier.mnemonicsize != IGNORESIZE
 	  && !i.tm.opcode_modifier.floatmf
-	  && !is_any_vex_encoding (&i.tm)
+	  && (!is_any_vex_encoding (&i.tm)
+	      || i.tm.opcode_space == SPACE_EVEXMAP4)
 	  && ((i.suffix == LONG_MNEM_SUFFIX) == (flag_code == CODE_16BIT)
 	      || (flag_code == CODE_64BIT
 		  && i.tm.opcode_modifier.jump == JUMP_BYTE)))
@@ -8030,7 +8069,11 @@ process_suffix (void)
 	  if (i.tm.opcode_modifier.jump == JUMP_BYTE) /* jcxz, loop */
 	    prefix = ADDR_PREFIX_OPCODE;
 
-	  if (!add_prefix (prefix))
+	  /* For legacy insns promoted to EVEX (APX), the data size prefix
+	     is expressed via the opcode prefix field instead.  */
+	  if (i.tm.opcode_space == SPACE_EVEXMAP4)
+	    i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
+	  else if (!add_prefix (prefix))
 	    return 0;
 	}
 
@@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry *r)
       if (!cpu_arch_flags.bitfield.cpuapx_f
 	  || flag_code != CODE_64BIT)
 	return false;
+
+      /* When using RegRex2, dual VEX/EVEX templates need to be marked as
+	 EVEX, for the later install_template call.  */
+      if (current_templates->start->opcode_modifier.vex
+	  && current_templates->start->opcode_modifier.evex)
+	i.vec_encoding = vex_encoding_evex;
     }
 
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
diff --git a/gas/testsuite/gas/i386/x86-64-evex.d b/gas/testsuite/gas/i386/x86-64-evex.d
index 041747db892..5d974c312da 100644
--- a/gas/testsuite/gas/i386/x86-64-evex.d
+++ b/gas/testsuite/gas/i386/x86-64-evex.d
@@ -17,6 +17,6 @@ Disassembly of section .text:
  +[a-f0-9]+:	62 f1 d6 38 7b f0    	vcvtusi2ss %rax,\{rd-sae\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 57 38 7b f0    	vcvtusi2sd %eax,\{rd-bad\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 d7 38 7b f0    	vcvtusi2sd %rax,\{rd-sae\},%xmm5,%xmm6
- +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,\(bad\)
+ +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,%r16d
  +[a-f0-9]+:	62 e1 7c 08 c2 c0 00 	vcmpeqps %xmm0,%xmm0,\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 2be0df0e981..4a59a726ecb 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -250,7 +250,7 @@ run_dump_test "x86-64-sse-noavx"
 run_dump_test "x86-64-movbe"
 run_dump_test "x86-64-movbe-intel"
 run_dump_test "x86-64-movbe-suffix"
-run_list_test "x86-64-inval-movbe" "-al"
+run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -al"
 run_dump_test "x86-64-ept"
 run_dump_test "x86-64-ept-intel"
 run_list_test "x86-64-inval-ept" "-al"
diff --git a/opcodes/i386-dis-evex-len.h b/opcodes/i386-dis-evex-len.h
index a02609c50f2..ad59a559e0d 100644
--- a/opcodes/i386-dis-evex-len.h
+++ b/opcodes/i386-dis-evex-len.h
@@ -62,6 +62,16 @@ static const struct dis386 evex_len_table[][3] = {
     { REG_TABLE (REG_EVEX_0F38C7_L_2) },
   },
 
+  /* EVEX_LEN_0F38F2 */
+  {
+    { PREFIX_TABLE (PREFIX_EVEX_0F38F2_L_0) },
+  },
+
+  /* EVEX_LEN_0F38F3 */
+  {
+    { PREFIX_TABLE (PREFIX_EVEX_0F38F3_L_0) },
+  },
+
   /* EVEX_LEN_0F3A00 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index 28da54922c7..b11b7adb443 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -285,6 +285,14 @@
     { "%XEvfmsub213s%XW",	{ XMScalar, VexScalar, EXdq, EXxEVexR }, 0 },
     { "v4fnmadds%XS",	{ XMScalar, VexScalar, Mxmm }, 0 },
   },
+  /* PREFIX_EVEX_0F38F2_L_0 */
+  {
+    { "andnS",	{ Gdq, VexGdq, Edq }, 0 },
+  },
+  /* PREFIX_EVEX_0F38F3_L_0 */
+  {
+    { REG_TABLE (REG_EVEX_0F38F3_L_0_P_0) },
+  },
   /* PREFIX_EVEX_0F3A08 */
   {
     { "vrndscalep%XH",  { XM, EXxh, EXxEVexS, Ib }, 0 },
@@ -338,6 +346,64 @@
     { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
     { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
   },
+  /* PREFIX_EVEX_MAP4_D8 */
+  {
+    { "sha1nexte", { XM, EXxmm }, 0 },
+    { REG_TABLE (REG_0F38D8_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DA */
+  {
+    { "sha1msg2", { XM, EXxmm }, 0 },
+    { "encodekey128", { Gd, Rd }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DB */
+  {
+    { "sha256rnds2", { XM, EXxmm, XMM0 }, 0 },
+    { "encodekey256", { Gd, Rd }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DC */
+  {
+    { "sha256msg1", { XM, EXxmm }, 0 },
+    { "aesenc128kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DD */
+  {
+    { "sha256msg2", { XM, EXxmm }, 0 },
+    { "aesdec128kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DE */
+  {
+    { Bad_Opcode },
+    { "aesenc256kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DF */
+  {
+    { Bad_Opcode },
+    { "aesdec256kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F0 */
+  {
+    { "crc32A", { Gdq, Eb }, 0 },
+    { "invept", { Gm, Mo }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F1 */
+  {
+    { "crc32Q", { Gdq, Ev }, 0 },
+    { "invvpid", { Gm, Mo }, 0 },
+    { "crc32Q", { Gdq, Ev }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F2 */
+  {
+    { Bad_Opcode },
+    { "invpcid", { Gm, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F8 */
+  {
+    { Bad_Opcode },
+    { "enqcmds", { Gva, M },  0 },
+    { "movdir64b", { Gva, M }, 0 },
+    { "enqcmd", { Gva, M }, 0 },
+  },
   /* PREFIX_EVEX_MAP5_10 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index 2885063628b..8374f0ea93a 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -49,3 +49,10 @@
     { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
     { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
   },
+  /* REG_EVEX_0F38F3_L_0_P_0 */
+  {
+    { Bad_Opcode },
+    { "blsrS",	{ VexGdq, Edq }, 0 },
+    { "blsmskS",	{ VexGdq, Edq }, 0 },
+    { "blsiS",	{ VexGdq, Edq }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex-x86-64.h b/opcodes/i386-dis-evex-x86-64.h
new file mode 100644
index 00000000000..18c297feb86
--- /dev/null
+++ b/opcodes/i386-dis-evex-x86-64.h
@@ -0,0 +1,60 @@
+  /* X86_64_EVEX_0F90 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F90) },
+  },
+  /* X86_64_EVEX_0F91 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F91) },
+  },
+  /* X86_64_EVEX_0F92 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F92) },
+  },
+  /* X86_64_EVEX_0F93 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F93) },
+  },
+  /* X86_64_EVEX_0F3849 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },
+  },
+  /* X86_64_EVEX_0F384B */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },
+  },
+  /* X86_64_EVEX_0F38F2 */
+  {
+    { Bad_Opcode },
+    { EVEX_LEN_TABLE (EVEX_LEN_0F38F2) },
+  },
+  /* X86_64_EVEX_0F38F3 */
+  {
+    { Bad_Opcode },
+    { EVEX_LEN_TABLE (EVEX_LEN_0F38F3) },
+  },
+  /* X86_64_EVEX_0F38F5 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F5) },
+  },
+  /* X86_64_EVEX_0F38F6 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F6) },
+  },
+  /* X86_64_EVEX_0F38F7 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F7) },
+  },
+  /* X86_64_EVEX_0F3AF0 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3AF0) },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 7ad1edbe72d..ea0a4c0b2a5 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -164,10 +164,10 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 90 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F90) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F91) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F92) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F93) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -375,9 +375,9 @@ static const struct dis386 evex_table[][256] = {
     { "vpsllv%DQ",	{ XM, Vex, EXx }, PREFIX_DATA },
     /* 48 */
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F384B) },
     { "vrcp14p%XW",	{ XM, EXx }, PREFIX_DATA },
     { "vrcp14s%XW",	{ XMScalar, VexScalar, EXdq }, PREFIX_DATA },
     { "vrsqrt14p%XW",	{ XM, EXx }, 0 },
@@ -545,32 +545,32 @@ static const struct dis386 evex_table[][256] = {
     { "%XEvaesdecY",	{ XM, Vex, EXx }, PREFIX_DATA },
     { "%XEvaesdeclastY", { XM, Vex, EXx }, PREFIX_DATA },
     /* E0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E0) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E1) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E3) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E4) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E7) },
     /* E8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E8) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E9) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EA) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EB) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EC) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38ED) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EE) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EF) },
     /* F0 */
     { Bad_Opcode },
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F3) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F7) },
     /* F8 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -854,7 +854,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F3AF0) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -983,13 +983,13 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 60 */
+    { "movbeS",	{ Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "movbeS",	{ Ev, Gv }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "wrussK",	{ M, Gdq }, PREFIX_DATA },
+    { PREFIX_TABLE (PREFIX_0F38F6) },
     { Bad_Opcode },
     /* 68 */
     { Bad_Opcode },
@@ -1113,19 +1113,19 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { "sha1rnds4",	{ XM, EXxmm, Ib }, NO_PREFIX },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* D8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_D8) },
+    { "sha1msg1",	{ XM, EXxmm }, NO_PREFIX },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DA) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DB) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DC) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DD) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DE) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DF) },
     /* E0 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1145,20 +1145,20 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F0) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F1) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F2) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* F8 */
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F8) },
+    { "movdiri",	{ Mdq, Gdq }, NO_PREFIX },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_0F38FC) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 3f1a8644930..b81e75aa786 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -132,6 +132,13 @@ enum x86_64_isa
   intel64
 };
 
+enum evex_type
+{
+  evex_default = 0,
+  evex_from_legacy,
+  evex_from_vex,
+};
+
 struct instr_info
 {
   enum address_mode address_mode;
@@ -211,7 +218,6 @@ struct instr_info
     int ll;
     bool w;
     bool evex;
-    bool r;
     bool v;
     bool zeroing;
     bool b;
@@ -219,6 +225,8 @@ struct instr_info
   }
   vex;
 
+  enum evex_type evex_type;
+
   /* Remember if the current op is a jump instruction.  */
   bool op_is_jump;
 
@@ -304,6 +312,8 @@ struct dis_private {
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
 #define PREFIX_REX2 0x1000
+#define PREFIX_NP_OR_DATA 0x2000
+#define NO_PREFIX   0x4000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -801,6 +811,7 @@ enum
   USE_RM_TABLE,
   USE_PREFIX_TABLE,
   USE_X86_64_TABLE,
+  USE_X86_64_EVEX_FROM_VEX_TABLE,
   USE_3BYTE_TABLE,
   USE_XOP_8F_TABLE,
   USE_VEX_C4_TABLE,
@@ -819,6 +830,8 @@ enum
 #define RM_TABLE(I)		DIS386 (USE_RM_TABLE, (I))
 #define PREFIX_TABLE(I)		DIS386 (USE_PREFIX_TABLE, (I))
 #define X86_64_TABLE(I)		DIS386 (USE_X86_64_TABLE, (I))
+#define X86_64_EVEX_FROM_VEX_TABLE(I) \
+  DIS386 (USE_X86_64_EVEX_FROM_VEX_TABLE, (I))
 #define THREE_BYTE_TABLE(I)	DIS386 (USE_3BYTE_TABLE, (I))
 #define XOP_8F_TABLE()		DIS386 (USE_XOP_8F_TABLE, 0)
 #define VEX_C4_TABLE()		DIS386 (USE_VEX_C4_TABLE, 0)
@@ -879,7 +892,8 @@ enum
   REG_EVEX_0F72,
   REG_EVEX_0F73,
   REG_EVEX_0F38C6_L_2,
-  REG_EVEX_0F38C7_L_2
+  REG_EVEX_0F38C7_L_2,
+  REG_EVEX_0F38F3_L_0_P_0,
 };
 
 enum
@@ -1146,6 +1160,8 @@ enum
   PREFIX_EVEX_0F389B,
   PREFIX_EVEX_0F38AA,
   PREFIX_EVEX_0F38AB,
+  PREFIX_EVEX_0F38F2_L_0,
+  PREFIX_EVEX_0F38F3_L_0,
 
   PREFIX_EVEX_0F3A08,
   PREFIX_EVEX_0F3A0A,
@@ -1157,6 +1173,18 @@ enum
   PREFIX_EVEX_0F3A67,
   PREFIX_EVEX_0F3AC2,
 
+  PREFIX_EVEX_MAP4_D8,
+  PREFIX_EVEX_MAP4_DA,
+  PREFIX_EVEX_MAP4_DB,
+  PREFIX_EVEX_MAP4_DC,
+  PREFIX_EVEX_MAP4_DD,
+  PREFIX_EVEX_MAP4_DE,
+  PREFIX_EVEX_MAP4_DF,
+  PREFIX_EVEX_MAP4_F0,
+  PREFIX_EVEX_MAP4_F1,
+  PREFIX_EVEX_MAP4_F2,
+  PREFIX_EVEX_MAP4_F8,
+
   PREFIX_EVEX_MAP5_10,
   PREFIX_EVEX_MAP5_11,
   PREFIX_EVEX_MAP5_1D,
@@ -1268,7 +1296,21 @@ enum
   X86_64_VEX_0F38ED,
   X86_64_VEX_0F38EE,
   X86_64_VEX_0F38EF,
+
   X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
+
+  X86_64_EVEX_0F90,
+  X86_64_EVEX_0F91,
+  X86_64_EVEX_0F92,
+  X86_64_EVEX_0F93,
+  X86_64_EVEX_0F3849,
+  X86_64_EVEX_0F384B,
+  X86_64_EVEX_0F38F2,
+  X86_64_EVEX_0F38F3,
+  X86_64_EVEX_0F38F5,
+  X86_64_EVEX_0F38F6,
+  X86_64_EVEX_0F38F7,
+  X86_64_EVEX_0F3AF0,
 };
 
 enum
@@ -1453,6 +1495,8 @@ enum
   EVEX_LEN_0F385B,
   EVEX_LEN_0F38C6,
   EVEX_LEN_0F38C7,
+  EVEX_LEN_0F38F2,
+  EVEX_LEN_0F38F3,
   EVEX_LEN_0F3A00,
   EVEX_LEN_0F3A01,
   EVEX_LEN_0F3A18,
@@ -4524,10 +4568,11 @@ static const struct dis386 x86_64_table[][2] = {
 
   /* X86_64_VEX_MAP7_F8_L_0_W_0_R_0 */
   {
     { Bad_Opcode },
     { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
   },
 
+#include "i386-dis-evex-x86-64.h"
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -8733,6 +8778,17 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       dp = &prefix_table[dp->op[1].bytemode][vindex];
       break;
 
+    case USE_X86_64_EVEX_FROM_VEX_TABLE:
+      ins->evex_type = evex_from_vex;
+      /* EVEX from VEX instructions require that EVEX.z, EVEX.L'L, EVEX.b and
+	 the lower 2 bits of EVEX.aaa be 0.  */
+      if ((ins->vex.mask_register_specifier & 0x3) != 0
+	  || ins->vex.ll != 0
+	  || ins->vex.zeroing != 0
+	  || ins->vex.b)
+	return &bad_opcode;
+
+      /* Fall through.  */
     case USE_X86_64_TABLE:
       vindex = ins->address_mode == mode_64bit ? 1 : 0;
       dp = &x86_64_table[dp->op[1].bytemode][vindex];
@@ -8978,9 +9034,13 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       if (!fetch_code (ins->info, ins->codep + 4))
 	return &err_opcode;
       /* The first byte after 0x62.  */
+      if (*ins->codep & 0x8)
+	ins->rex2 |= REX_B;
+      if (!(*ins->codep & 0x10))
+	ins->rex2 |= REX_R;
+
       ins->rex = ~(*ins->codep >> 5) & 0x7;
-      ins->vex.r = *ins->codep & 0x10;
-      switch ((*ins->codep & 0xf))
+      switch ((*ins->codep & 0x7))
 	{
 	default:
 	  return &bad_opcode;
@@ -8993,6 +9053,12 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	case 0x3:
 	  vex_table_index = EVEX_0F3A;
 	  break;
+	case 0x4:
+	  vex_table_index = EVEX_MAP4;
+	  ins->evex_type = evex_from_legacy;
+	  if (ins->address_mode != mode_64bit)
+	    return &bad_opcode;
+	  break;
 	case 0x5:
 	  vex_table_index = EVEX_MAP5;
 	  break;
@@ -9009,9 +9075,8 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       ins->vex.register_specifier = (~(*ins->codep >> 3)) & 0xf;
 
-      /* The U bit.  */
       if (!(*ins->codep & 0x4))
-	return &bad_opcode;
+	ins->rex2 |= REX_X;
 
       switch ((*ins->codep & 0x3))
 	{
@@ -9041,12 +9106,24 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       if (ins->address_mode != mode_64bit)
 	{
+	  if (ins->evex_type != evex_default
+	      || (ins->rex2 & (REX_B | REX_X)))
+	    return &bad_opcode;
 	  /* In 16/32-bit mode silently ignore following bits.  */
 	  ins->rex &= ~REX_B;
-	  ins->vex.r = true;
+	  ins->rex2 &= ~REX_R;
 	}
 
       ins->need_vex = 4;
+
+      /* EVEX from legacy instructions require that EVEX.z, EVEX.L'L and the
+	 lower 2 bits of EVEX.aaa be 0.  */
+      if (ins->evex_type == evex_from_legacy
+	  && ((ins->vex.mask_register_specifier & 0x3) != 0
+	      || ins->vex.ll != 0
+	      || ins->vex.zeroing != 0))
+	return &bad_opcode;
+
       ins->codep++;
       vindex = *ins->codep++;
       dp = &evex_table[vex_table_index][vindex];
@@ -9460,6 +9537,13 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       dp = get_valid_dis386 (dp, &ins);
       if (dp == &err_opcode)
 	goto fetch_error_out;
+
+      /* For APX instructions promoted from legacy maps 0/1, prefix
+	 0x66 is interpreted as the operand size override.  */
+      if (ins.evex_type == evex_from_legacy
+	  && ins.vex.prefix == DATA_PREFIX_OPCODE)
+	sizeflag ^= DFLAG;
+
       if (dp != NULL && putop (&ins, dp->name, sizeflag) == 0)
 	{
 	  if (!get_sib (&ins, sizeflag))
@@ -9639,6 +9723,24 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       if (ins.last_repnz_prefix >= 0)
 	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
       break;
+
+    case PREFIX_NP_OR_DATA:
+      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)
+	{
+	  i386_dis_printf (info, dis_style_text, "(bad)");
+	  ret = ins.end_codep - priv.the_buffer;
+	  goto out;
+	}
+      break;
+
+    case NO_PREFIX:
+      if (ins.vex.prefix)
+	{
+	  i386_dis_printf (info, dis_style_text, "(bad)");
+	  ret = ins.end_codep - priv.the_buffer;
+	  goto out;
+	}
+      break;
     }
 
   /* Check if the REX prefix is used.  */
@@ -10344,7 +10446,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
 		{
 		case 'X':
 		  if (!ins->vex.evex || ins->vex.b || ins->vex.ll >= 2
-		      || !ins->vex.r
+		      || (ins->rex2 & REX_R)
 		      || (ins->modrm.mod == 3 && (ins->rex & REX_X))
 		      || !ins->vex.v || ins->vex.mask_register_specifier)
 		    break;
@@ -11456,7 +11558,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
   add += (ins->rex2 & REX_B) ? 16 : 0;
 
-  if (ins->vex.evex)
+  if (ins->vex.evex && ins->evex_type == evex_default)
     {
 
       /* Zeroing-masking is invalid for memory destinations. Set the flag
@@ -11604,6 +11706,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		abort ();
 	      if (ins->vex.evex)
 		{
+		  /* S/G EVEX insns require EVEX.X4 not to be set.  */
+		  if (ins->rex2 & REX_X)
+		    {
+		      oappend (ins, "(bad)");
+		      return true;
+		    }
+
 		  if (!ins->vex.v)
 		    vindex += 16;
 		  check_gather = ins->obufp == ins->op_out[1];
@@ -11803,7 +11912,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
 	      if (ins->rex & REX_R)
 	        modrm_reg += 8;
-	      if (!ins->vex.r)
+	      if (ins->rex2 & REX_R)
 	        modrm_reg += 16;
 	      if (vindex == modrm_reg)
 		oappend (ins, "/(bad)");
@@ -12009,10 +12118,7 @@ OP_indirE (instr_info *ins, int bytemode, int sizeflag)
 static bool
 OP_G (instr_info *ins, int bytemode, int sizeflag)
 {
-  if (ins->vex.evex && !ins->vex.r && ins->address_mode == mode_64bit)
-    oappend (ins, "(bad)");
-  else
-    print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
+  print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
   return true;
 }
 
@@ -12644,7 +12750,7 @@ OP_XMM (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
     reg += 8;
   if (ins->vex.evex)
     {
-      if (!ins->vex.r)
+      if (ins->rex2 & REX_R)
 	reg += 16;
     }
 
@@ -13652,7 +13758,7 @@ DistinctDest_Fixup (instr_info *ins, int bytemode, int sizeflag)
   /* Calc destination register number.  */
   if (ins->rex & REX_R)
     modrm_reg += 8;
-  if (!ins->vex.r)
+  if (ins->rex2 & REX_R)
     modrm_reg += 16;
 
   /* Calc src1 register number.  */
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 6402b669d37..7dab744134f 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -490,6 +490,7 @@ static bitfield opcode_modifiers[] =
   BITFIELD (IntelSyntax),
   BITFIELD (ISA64),
   BITFIELD (NoEgpr),
+  BITFIELD (NF),
 };
 
 #define CLASS(n) #n, n
@@ -1127,6 +1128,7 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
     SPACE(0F),
     SPACE(0F38),
     SPACE(0F3A),
+    SPACE(EVEXMAP4),
     SPACE(EVEXMAP5),
     SPACE(EVEXMAP6),
     SPACE(VEXMAP7),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index d28a4cedf0f..88717fd7575 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -753,6 +753,9 @@ enum
      whether the instruction supports pseudo-prefix {rex2}.  */
   NoEgpr,
 
+  /* No CSPAZO flags update indication.  */
+  NF,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -801,6 +804,7 @@ typedef struct i386_opcode_modifier
   unsigned int intelsyntax:1;
   unsigned int isa64:2;
   unsigned int noegpr:1;
+  unsigned int nf:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
@@ -976,6 +980,7 @@ typedef struct insn_template
      1: 0F opcode prefix / space.
      2: 0F38 opcode prefix / space.
      3: 0F3A opcode prefix / space.
+     4: EVEXMAP4 opcode prefix / space.
      5: EVEXMAP5 opcode prefix / space.
      6: EVEXMAP6 opcode prefix / space.
      7: VEXMAP7 opcode prefix / space.
@@ -987,6 +992,7 @@ typedef struct insn_template
 #define SPACE_0F	1
 #define SPACE_0F38	2
 #define SPACE_0F3A	3
+#define SPACE_EVEXMAP4	4
 #define SPACE_EVEXMAP5	5
 #define SPACE_EVEXMAP6	6
 #define SPACE_VEXMAP7	7
@@ -1054,3 +1060,7 @@ typedef struct
 #define Dw2Inval (-1)
 }
 reg_entry;
+
+#define APX_F(cpuid) (maybe_cpu (t, CpuAPX_F) && maybe_cpu (t, cpuid))
+#define AVX512F(cpuid) (maybe_cpu (t, CpuAVX512F) && maybe_cpu (t, cpuid))
+#define AVX512VL(cpuid) (maybe_cpu (t, CpuAVX512VL) && maybe_cpu (t, cpuid))
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index cbf9d968fba..b27131ef185 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -109,6 +109,7 @@
 #define SpaceXOP09 OpcodeSpace=SPACE_XOP09
 #define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
 
+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4
 #define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
 #define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
 
@@ -138,7 +139,6 @@
 #define Vsz256 Vsz=VSZ256
 #define Vsz512 Vsz=VSZ512
 
-
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
@@ -194,6 +194,7 @@ mov, 0xf24, i386&No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Te
 
 // Move after swapping the bytes
 movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movbe, 0x60, Movbe&APX_F, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Move with sign extend.
 movsb, 0xfbe, i386, Modrm|No_bSuf|No_sSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
@@ -896,7 +897,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
 <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
                       load:Load:0, store:Store:0, +
                       vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
-                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
+                      rex:REX:x64, rex2:REX2:APX_F, nooptimize:NoOptimize:0>
 
 {<pseudopfx>}, PSEUDO_PREFIX/Prefix_<pseudopfx:ident>, <pseudopfx:cpu>, NoSuf|IsPrefix, {}
 
@@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
 
 invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // INVPCID instruction
 
 invpcid, 0x660f3882, INVPCID&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invpcid, 0x660f3882, INVPCID&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invpcid, 0xf3f2, INVPCID&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // SSSE3 instructions.
 
@@ -1426,6 +1430,8 @@ pcmpistri<sse42>, 0x660f3a63, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, Reg
 pcmpistrm<sse42>, 0x660f3a62, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 crc32, 0xf20f38f0, SSE4_2, W|Modrm|No_sSuf|No_qSuf, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
 crc32, 0xf20f38f0, SSE4_2&x64, W|Modrm|No_wSuf|No_lSuf|No_sSuf, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
+crc32, 0xf0, APX_F, W|Modrm|No_sSuf|No_qSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
+crc32, 0xf0, APX_F, W|Modrm|No_wSuf|No_lSuf|No_sSuf|EVex128|EVexMap4, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
 
 // xsave/xrstor New Instructions.
 
@@ -1437,7 +1443,6 @@ xgetbv, 0xf01d0, Xsave, NoSuf, {}
 xsetbv, 0xf01d1, Xsave, NoSuf, {}
 
 // xsaveopt
-
 xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
 xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
@@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
 
 // BMI2 instructions.
 
-bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+bzhi, 0xf5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+mulx, 0xf2f6, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pdep, 0xf2f5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pext, 0xf3f5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+rorx, 0xf2f0, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
+sarx, 0xf3f7, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shlx, 0x66f7, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shrx, 0xf2f7, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // FMA4 instructions
 
@@ -1914,11 +1919,11 @@ lwpins, 0x12/0, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV|Vex, { Imm32|Imm32S, Reg32|U
 
 // BMI instructions
 
-andn, 0xf2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsi, 0xf3/3, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsmsk, 0xf3/2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsr, 0xf3/1, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+andn, 0xf2, BMI&(BMI|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+bextr, 0xf7, BMI&(BMI|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsi, 0xf3/3, BMI&(BMI|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsmsk, 0xf3/2, BMI&(BMI|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsr, 0xf3/1, BMI&(BMI|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 tzcnt, 0xf30fbc, BMI, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // TBM instructions
@@ -2047,13 +2052,20 @@ bndldx, 0x0f1a, MPX, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex, RegBND }
 
 // SHA instructions.
 sha1rnds4, 0xf3acc, SHA, Modrm|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1rnds4, 0xd4, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1nexte, 0xf38c8, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1nexte, 0xd8, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg1, 0xf38c9, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg1, 0xd9, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg2, 0xf38ca, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg2, 0xda, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256rnds2, 0xdb, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg1, 0xf38cc, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg1, 0xdc, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg2, 0xf38cd, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg2, 0xdd, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
 // SHA512 instructions.
 
@@ -2112,9 +2124,9 @@ kor<bw>, 0x<bw:kpfx>45, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { R
 kxnor<bw>, 0x<bw:kpfx>46, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 kxor<bw>, 0x<bw:kpfx>47, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 
-kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
-kmov<bw>, 0x<bw:kpfx>91, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
-kmov<bw>, 0x<bw:kpfx>92, <bw:kcpu>, D|Modrm|Vex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
+kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>&(<bw:kcpu>|APX_F), Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
+kmov<bw>, 0x<bw:kpfx>91, <bw:kcpu>&(<bw:kcpu>|APX_F), Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
+kmov<bw>, 0x<bw:kpfx>92, <bw:kcpu>&(<bw:kcpu>|APX_F), D|Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
 
 knot<bw>, 0x<bw:kpfx>44, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
 kortest<bw>, 0x<bw:kpfx>98, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
@@ -2589,9 +2601,9 @@ vpmovzxdq, 0x6635, AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexW=1|Disp8MemShift
 kadd<dq>, 0x<dq:kpfx>4a, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kand<dq>, 0x<dq:kpfx>41, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kandn<dq>, 0x<dq:kpfx>42, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf|Optimize, { RegMask, RegMask, RegMask }
-kmov<dq>, 0x<dq:kpfx>90, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
-kmov<dq>, 0x<dq:kpfx>91, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
-kmov<dq>, 0xf292, AVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
+kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
+kmov<dq>, 0x<dq:kpfx>91, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
+kmov<dq>, 0xf292, AVX512BW&(AVX512BW|APX_F), D|Modrm|Vex128|EVex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
 knot<dq>, 0x<dq:kpfx>44, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
 kor<dq>, 0x<dq:kpfx>45, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kortest<dq>, 0x<dq:kpfx>98, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
@@ -2990,9 +3002,13 @@ rdsspq, 0xf30f1e/1, SHSTK&x64, Modrm|NoSuf, { Reg64 }
 saveprevssp, 0xf30f01ea, SHSTK, NoSuf, {}
 rstorssp, 0xf30f01/5, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 wrssd, 0x0f38f6, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrssd, 0x66, SHSTK&APX_F, Modrm|IgnoreSize|NoSuf|EVex128|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrssq, 0x0f38f6, SHSTK&x64, Modrm|NoSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrssq, 0x66, SHSTK&APX_F, Modrm|NoSuf|Size64|EVex128|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 wrussd, 0x660f38f5, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrussd, 0x6665, SHSTK&APX_F, Modrm|IgnoreSize|NoSuf|EVex128|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrussq, 0x660f38f5, SHSTK&x64, Modrm|NoSuf, { Reg64, Qword|Unspecified|BaseIndex }
+wrussq, 0x6665, SHSTK&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 setssbsy, 0xf30f01e8, SHSTK, NoSuf, {}
 clrssbsy, 0xf30fae/6, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 endbr64, 0xf30f1efa, IBT, NoSuf, {}
@@ -3040,7 +3056,9 @@ cldemote, 0x0f1c/0, CLDEMOTE, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 // MOVDIR[I,64B] instructions.
 
 movdiri, 0xf38f9, MOVDIRI, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+movdiri, 0xf9, MOVDIRI&APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 movdir64b, 0x660f38f8, MOVDIR64B, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movdir64b, 0x66f8, MOVDIR64B&APX_F, Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // MOVEDIR instructions end.
 
@@ -3069,7 +3087,9 @@ vcvtneps2bf16<Vxy>, 0xf372, AVX_NE_CONVERT, Modrm|<Vxy:vex>|Space0F38|VexW0|NoSu
 // ENQCMD instructions.
 
 enqcmd, 0xf20f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmd, 0xf2f8, ENQCMD&(ENQCMD|APX_F), Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 enqcmds, 0xf30f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmds, 0xf3f8, ENQCMD&(ENQCMD|APX_F), Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // ENQCMD instructions end.
 
@@ -3130,8 +3150,8 @@ xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf, {}
 
 // AMX instructions.
 
-ldtilecfg, 0x49/0, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
-sttilecfg, 0x6649/0, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+ldtilecfg, 0x49/0, AMX_TILE&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+sttilecfg, 0x6649/0, AMX_TILE&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
 
 tcmmimfp16ps, 0x666c, AMX_COMPLEX, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tcmmrlfp16ps, 0x6c, AMX_COMPLEX, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
@@ -3143,9 +3163,9 @@ tdpbuud, 0x5e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf,
 tdpbusd, 0x665e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tdpbsud, 0xf35e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 
-tileloadd, 0xf24b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tileloaddt1, 0x664b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tilestored, 0xf34b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
+tileloadd, 0xf24b, AMX_TILE&(AMX_TILE|APX_F), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 0x664b, AMX_TILE&(AMX_TILE|APX_F), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 0xf34b, AMX_TILE&(AMX_TILE|APX_F), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
 
 tilerelease, 0x49c0, AMX_TILE, Vex128|Space0F38|VexW0|NoSuf, {}
 
@@ -3157,15 +3177,25 @@ tilezero, 0xf249, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { RegTMM }
 
 loadiwkey, 0xf30f38dc, KL, Load|Modrm|NoSuf, { RegXMM, RegXMM }
 encodekey128, 0xf30f38fa, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey128, 0xf3da, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Reg32, Reg32 }
 encodekey256, 0xf30f38fb, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey256, 0xf3db, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Reg32, Reg32 }
 aesenc128kl, 0xf30f38dc, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc128kl, 0xf3dc, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec128kl, 0xf30f38dd, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec128kl, 0xf3dd, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesenc256kl, 0xf30f38de, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc256kl, 0xf3de, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec256kl, 0xf30f38df, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec256kl, 0xf3df, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesencwide128kl, 0xf30f38d8/0, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide128kl, 0xf3d8/0, WideKL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide128kl, 0xf30f38d8/1, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide128kl, 0xf3d8/1, WideKL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesencwide256kl, 0xf30f38d8/2, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide256kl, 0xf3d8/2, WideKL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide256kl, 0xf30f38d8/3, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide256kl, 0xf3d8/3, WideKL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 
 // KEYLOCKER instructions end.
 
@@ -3313,7 +3343,7 @@ prefetchit1, 0xf18/6, PREFETCHI, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 
 // CMPCCXADD instructions.
 
-cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD, Modrm|Vex|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD&(CMPCCXADD|APX_F), Modrm|Vex|EVex128|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // CMPCCXADD instructions end.
 
@@ -3333,9 +3363,13 @@ wrmsrlist, 0xf30f01c6, MSRLIST, NoSuf, {}
 // RAO-INT instructions.
 
 aadd, 0xf38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aadd, 0xfc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aand, 0x660f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aand, 0x66fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aor, 0xf20f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aor, 0xf2fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 axor, 0xf30f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+axor, 0xf3fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // RAO-INT instructions end.
 
-- 
2.25.1



* [PATCH v3 5/9] Add tests for APX GPR32 with extend evex prefix
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
                   ` (2 preceding siblings ...)
  2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
@ 2023-11-24  7:02 ` Cui, Lili
  2023-12-07 14:05   ` Jan Beulich
  2023-11-24  7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-11-24  7:02 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu

gas/ChangeLog:

2023-11-21  Lingling Kong <lingling.kong@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>
	    Lili Cui <lili.cui@intel.com>
	    Lin Hu   <lin1.hu@intel.com>

	* testsuite/gas/i386/x86-64-apx-egpr-inval.l: Add some insns that
	don't support GPR32.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-inval-movbe.l: Add .noapx_f for movbe
	reg to reg.
	* testsuite/gas/i386/x86-64-inval-movbe.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Add new test.
	* testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l: New test.
	* testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s: New test.
	* testsuite/gas/i386/x86-64-apx-evex-egpr.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-egpr.s: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted.d: New test.
	* testsuite/gas/i386/x86-64-apx-evex-promoted.s: New test.
---
 .../gas/i386/x86-64-apx-egpr-inval.l          | 188 +++++++++++
 .../gas/i386/x86-64-apx-egpr-inval.s          | 194 ++++++++++-
 .../gas/i386/x86-64-apx-egpr-promote-inval.l  |  20 ++
 .../gas/i386/x86-64-apx-egpr-promote-inval.s  |  29 ++
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d |  20 ++
 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s |  21 ++
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |  30 ++
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |  28 ++
 .../gas/i386/x86-64-apx-evex-promoted-intel.d | 318 ++++++++++++++++++
 .../gas/i386/x86-64-apx-evex-promoted.d       | 318 ++++++++++++++++++
 .../gas/i386/x86-64-apx-evex-promoted.s       | 314 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |   5 +
 12 files changed, 1484 insertions(+), 1 deletion(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s

diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
index 0aa079ca29c..292d6d63880 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
@@ -12,4 +12,192 @@
 .*:16: Error: unsupported extended GPR for addressing for `xsaveopt64'
 .*:17: Error: unsupported extended GPR for addressing for `xsavec'
 .*:18: Error: unsupported extended GPR for addressing for `xsavec64'
+.*:20: Error: unsupported extended GPR for addressing for `blendpd'
+.*:21: Error: unsupported extended GPR for addressing for `blendps'
+.*:22: Error: unsupported extended GPR for addressing for `blendvpd'
+.*:23: Error: unsupported extended GPR for addressing for `blendvpd'
+.*:24: Error: unsupported extended GPR for addressing for `blendvps'
+.*:25: Error: unsupported extended GPR for addressing for `blendvps'
+.*:26: Error: unsupported extended GPR for addressing for `dppd'
+.*:27: Error: unsupported extended GPR for addressing for `dpps'
+.*:28: Error: register type mismatch for `extractps'
+.*:29: Error: unsupported extended GPR for addressing for `extractps'
+.*:30: Error: unsupported extended GPR for addressing for `insertps'
+.*:31: Error: unsupported extended GPR for addressing for `movntdqa'
+.*:32: Error: unsupported extended GPR for addressing for `mpsadbw'
+.*:33: Error: unsupported extended GPR for addressing for `pabsb'
+.*:34: Error: unsupported extended GPR for addressing for `pabsd'
+.*:35: Error: unsupported extended GPR for addressing for `pabsw'
+.*:36: Error: unsupported extended GPR for addressing for `packusdw'
+.*:37: Error: unsupported extended GPR for addressing for `palignr'
+.*:38: Error: unsupported extended GPR for addressing for `pblendvb'
+.*:39: Error: unsupported extended GPR for addressing for `pblendvb'
+.*:40: Error: unsupported extended GPR for addressing for `pblendw'
+.*:41: Error: unsupported extended GPR for addressing for `pcmpeqq'
+.*:42: Error: unsupported extended GPR for addressing for `pcmpestri'
+.*:43: Error: unsupported extended GPR for addressing for `pcmpestrm'
+.*:44: Error: unsupported extended GPR for addressing for `pcmpgtq'
+.*:45: Error: unsupported extended GPR for addressing for `pcmpistri'
+.*:46: Error: unsupported extended GPR for addressing for `pcmpistrm'
+.*:47: Error: register type mismatch for `pextrb'
+.*:48: Error: unsupported extended GPR for addressing for `pextrb'
+.*:49: Error: unsupported extended GPR for addressing for `pextrd'
+.*:50: Error: unsupported extended GPR for addressing for `pextrq'
+.*:51: Error: unsupported extended GPR for addressing for `pextrw'
+.*:52: Error: unsupported extended GPR for addressing for `phaddd'
+.*:53: Error: unsupported extended GPR for addressing for `phaddsw'
+.*:54: Error: unsupported extended GPR for addressing for `phaddw'
+.*:55: Error: unsupported extended GPR for addressing for `phminposuw'
+.*:56: Error: unsupported extended GPR for addressing for `phsubw'
+.*:57: Error: register type mismatch for `pinsrb'
+.*:58: Error: unsupported extended GPR for addressing for `pinsrb'
+.*:59: Error: register type mismatch for `pinsrd'
+.*:60: Error: unsupported extended GPR for addressing for `pinsrd'
+.*:61: Error: register type mismatch for `pinsrq'
+.*:62: Error: unsupported extended GPR for addressing for `pinsrq'
+.*:63: Error: unsupported extended GPR for addressing for `pmaddubsw'
+.*:64: Error: unsupported extended GPR for addressing for `pmaxsb'
+.*:65: Error: unsupported extended GPR for addressing for `pmaxsd'
+.*:66: Error: unsupported extended GPR for addressing for `pmaxud'
+.*:67: Error: unsupported extended GPR for addressing for `pmaxuw'
+.*:68: Error: unsupported extended GPR for addressing for `pminsb'
+.*:69: Error: unsupported extended GPR for addressing for `pminsd'
+.*:70: Error: unsupported extended GPR for addressing for `pminud'
+.*:71: Error: unsupported extended GPR for addressing for `pminuw'
+.*:72: Error: unsupported extended GPR for addressing for `pmovsxbd'
+.*:73: Error: unsupported extended GPR for addressing for `pmovsxbq'
+.*:74: Error: unsupported extended GPR for addressing for `pmovsxbw'
+.*:75: Error: unsupported extended GPR for addressing for `pmovsxbw'
+.*:76: Error: unsupported extended GPR for addressing for `pmovsxdq'
+.*:77: Error: unsupported extended GPR for addressing for `pmovsxwd'
+.*:78: Error: unsupported extended GPR for addressing for `pmovsxwq'
+.*:79: Error: unsupported extended GPR for addressing for `pmovzxbd'
+.*:80: Error: unsupported extended GPR for addressing for `pmovzxbq'
+.*:81: Error: unsupported extended GPR for addressing for `pmovzxdq'
+.*:82: Error: unsupported extended GPR for addressing for `pmovzxwd'
+.*:83: Error: unsupported extended GPR for addressing for `pmovzxwq'
+.*:84: Error: unsupported extended GPR for addressing for `pmuldq'
+.*:85: Error: unsupported extended GPR for addressing for `pmulhrsw'
+.*:86: Error: unsupported extended GPR for addressing for `pmulld'
+.*:87: Error: unsupported extended GPR for addressing for `pshufb'
+.*:88: Error: unsupported extended GPR for addressing for `psignb'
+.*:89: Error: unsupported extended GPR for addressing for `psignd'
+.*:90: Error: unsupported extended GPR for addressing for `psignw'
+.*:91: Error: unsupported extended GPR for addressing for `roundpd'
+.*:92: Error: unsupported extended GPR for addressing for `roundps'
+.*:93: Error: unsupported extended GPR for addressing for `roundsd'
+.*:94: Error: unsupported extended GPR for addressing for `roundss'
+.*:96: Error: unsupported extended GPR for addressing for `aesdec'
+.*:97: Error: unsupported extended GPR for addressing for `aesdeclast'
+.*:98: Error: unsupported extended GPR for addressing for `aesenc'
+.*:99: Error: unsupported extended GPR for addressing for `aesenclast'
+.*:100: Error: unsupported extended GPR for addressing for `aesimc'
+.*:101: Error: unsupported extended GPR for addressing for `aeskeygenassist'
+.*:102: Error: unsupported extended GPR for addressing for `pclmulhqhqdq'
+.*:103: Error: unsupported extended GPR for addressing for `pclmulhqlqdq'
+.*:104: Error: unsupported extended GPR for addressing for `pclmullqhqdq'
+.*:105: Error: unsupported extended GPR for addressing for `pclmullqlqdq'
+.*:106: Error: unsupported extended GPR for addressing for `pclmulqdq'
+.*:108: Error: unsupported extended GPR for addressing for `gf2p8affineinvqb'
+.*:109: Error: unsupported extended GPR for addressing for `gf2p8affineqb'
+.*:110: Error: unsupported extended GPR for addressing for `gf2p8mulb'
+.*:112: Error: unsupported extended GPR for addressing for `vaesimc'
+.*:113: Error: unsupported extended GPR for addressing for `vaeskeygenassist'
+.*:114: Error: unsupported extended GPR for addressing for `vblendpd'
+.*:115: Error: unsupported extended GPR for addressing for `vblendpd'
+.*:116: Error: unsupported extended GPR for addressing for `vblendps'
+.*:117: Error: unsupported extended GPR for addressing for `vblendps'
+.*:118: Error: unsupported extended GPR for addressing for `vblendvpd'
+.*:119: Error: unsupported extended GPR for addressing for `vblendvpd'
+.*:120: Error: unsupported extended GPR for addressing for `vblendvps'
+.*:121: Error: unsupported extended GPR for addressing for `vblendvps'
+.*:122: Error: unsupported extended GPR for addressing for `vdppd'
+.*:123: Error: unsupported extended GPR for addressing for `vdpps'
+.*:124: Error: unsupported extended GPR for addressing for `vdpps'
+.*:125: Error: unsupported extended GPR for addressing for `vhaddpd'
+.*:126: Error: unsupported extended GPR for addressing for `vhaddpd'
+.*:127: Error: unsupported extended GPR for addressing for `vhsubps'
+.*:128: Error: unsupported extended GPR for addressing for `vhsubps'
+.*:129: Error: unsupported extended GPR for addressing for `vlddqu'
+.*:130: Error: unsupported extended GPR for addressing for `vlddqu'
+.*:131: Error: unsupported extended GPR for addressing for `vldmxcsr'
+.*:132: Error: unsupported extended GPR for addressing for `vmaskmovpd'
+.*:133: Error: unsupported extended GPR for addressing for `vmaskmovpd'
+.*:134: Error: unsupported extended GPR for addressing for `vmaskmovpd'
+.*:135: Error: unsupported extended GPR for addressing for `vmaskmovpd'
+.*:136: Error: unsupported extended GPR for addressing for `vmaskmovps'
+.*:137: Error: unsupported extended GPR for addressing for `vmaskmovps'
+.*:138: Error: unsupported extended GPR for addressing for `vmaskmovps'
+.*:139: Error: unsupported extended GPR for addressing for `vmaskmovps'
+.*:140: Error: register type mismatch for `vmovmskpd'
+.*:141: Error: register type mismatch for `vmovmskpd'
+.*:142: Error: register type mismatch for `vmovmskps'
+.*:143: Error: register type mismatch for `vmovmskps'
+.*:144: Error: unsupported extended GPR for addressing for `vpblendd'
+.*:145: Error: unsupported extended GPR for addressing for `vpblendd'
+.*:146: Error: unsupported extended GPR for addressing for `vpblendvb'
+.*:147: Error: unsupported extended GPR for addressing for `vpblendvb'
+.*:148: Error: unsupported extended GPR for addressing for `vpblendw'
+.*:149: Error: unsupported extended GPR for addressing for `vpblendw'
+.*:150: Error: unsupported extended GPR for addressing for `vpcmpeqb'
+.*:151: Error: unsupported extended GPR for addressing for `vpcmpeqd'
+.*:152: Error: unsupported extended GPR for addressing for `vpcmpeqq'
+.*:153: Error: unsupported extended GPR for addressing for `vpcmpeqw'
+.*:154: Error: unsupported extended GPR for addressing for `vpcmpestri'
+.*:155: Error: unsupported extended GPR for addressing for `vpcmpestrm'
+.*:156: Error: unsupported extended GPR for addressing for `vpcmpgtb'
+.*:157: Error: unsupported extended GPR for addressing for `vpcmpgtd'
+.*:158: Error: unsupported extended GPR for addressing for `vpcmpgtq'
+.*:159: Error: unsupported extended GPR for addressing for `vpcmpgtw'
+.*:160: Error: unsupported extended GPR for addressing for `vpcmpistri'
+.*:161: Error: unsupported extended GPR for addressing for `vpcmpistrm'
+.*:162: Error: unsupported extended GPR for addressing for `vperm2f128'
+.*:163: Error: unsupported extended GPR for addressing for `vperm2i128'
+.*:164: Error: unsupported extended GPR for addressing for `vphaddd'
+.*:165: Error: unsupported extended GPR for addressing for `vphaddd'
+.*:166: Error: unsupported extended GPR for addressing for `vphaddsw'
+.*:167: Error: unsupported extended GPR for addressing for `vphaddsw'
+.*:168: Error: unsupported extended GPR for addressing for `vphaddw'
+.*:169: Error: unsupported extended GPR for addressing for `vphaddw'
+.*:170: Error: unsupported extended GPR for addressing for `vphminposuw'
+.*:171: Error: unsupported extended GPR for addressing for `vphsubd'
+.*:172: Error: unsupported extended GPR for addressing for `vphsubd'
+.*:173: Error: unsupported extended GPR for addressing for `vphsubsw'
+.*:174: Error: unsupported extended GPR for addressing for `vphsubsw'
+.*:175: Error: unsupported extended GPR for addressing for `vphsubw'
+.*:176: Error: unsupported extended GPR for addressing for `vphsubw'
+.*:177: Error: unsupported extended GPR for addressing for `vpmaskmovd'
+.*:178: Error: unsupported extended GPR for addressing for `vpmaskmovd'
+.*:179: Error: unsupported extended GPR for addressing for `vpmaskmovd'
+.*:180: Error: unsupported extended GPR for addressing for `vpmaskmovd'
+.*:181: Error: unsupported extended GPR for addressing for `vpmaskmovq'
+.*:182: Error: unsupported extended GPR for addressing for `vpmaskmovq'
+.*:183: Error: unsupported extended GPR for addressing for `vpmaskmovq'
+.*:184: Error: unsupported extended GPR for addressing for `vpmaskmovq'
+.*:185: Error: register type mismatch for `vpmovmskb'
+.*:186: Error: register type mismatch for `vpmovmskb'
+.*:187: Error: unsupported extended GPR for addressing for `vpsignb'
+.*:188: Error: unsupported extended GPR for addressing for `vpsignb'
+.*:189: Error: unsupported extended GPR for addressing for `vpsignd'
+.*:190: Error: unsupported extended GPR for addressing for `vpsignd'
+.*:191: Error: unsupported extended GPR for addressing for `vpsignw'
+.*:192: Error: unsupported extended GPR for addressing for `vpsignw'
+.*:193: Error: unsupported extended GPR for addressing for `vptest'
+.*:194: Error: unsupported extended GPR for addressing for `vrcpps'
+.*:195: Error: unsupported extended GPR for addressing for `vrcpps'
+.*:196: Error: unsupported extended GPR for addressing for `vrcpss'
+.*:197: Error: unsupported extended GPR for addressing for `vroundpd'
+.*:198: Error: unsupported extended GPR for addressing for `vroundps'
+.*:199: Error: unsupported extended GPR for addressing for `vroundsd'
+.*:200: Error: unsupported extended GPR for addressing for `vroundss'
+.*:201: Error: unsupported extended GPR for addressing for `vrsqrtps'
+.*:202: Error: unsupported extended GPR for addressing for `vrsqrtps'
+.*:203: Error: unsupported extended GPR for addressing for `vrsqrtss'
+.*:204: Error: unsupported extended GPR for addressing for `vstmxcsr'
+.*:205: Error: unsupported extended GPR for addressing for `vtestpd'
+.*:206: Error: unsupported extended GPR for addressing for `vtestpd'
+.*:207: Error: unsupported extended GPR for addressing for `vtestps'
+.*:208: Error: unsupported extended GPR for addressing for `vtestps'
+.*:209: Error: unsupported extended GPR for addressing for `vtestps'
+.*:210: Error: unsupported extended GPR for addressing for `vptest'
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
index c4d2308a604..f9ef2ddbac7 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
@@ -1,4 +1,4 @@
-# Check Illegal 64bit APX_F instructions
+# Check illegal 64bit APX_F instructions
 	.text
 	.arch .noapx_f
 	test    $0x7, %r17d
@@ -16,3 +16,195 @@
 	xsaveopt64 (%r16, %r31)
 	xsavec (%r16, %rbx)
 	xsavec64 (%r16, %r31)
+#SSE
+	blendpd $100,(%r18),%xmm6
+	blendps $100,(%r18),%xmm6
+	blendvpd %xmm0,(%r19),%xmm6
+	blendvpd (%r19),%xmm6
+	blendvps %xmm0,(%r19),%xmm6
+	blendvps (%r19),%xmm6
+	dppd $100,(%r20),%xmm6
+	dpps $100,(%r20),%xmm6
+	extractps $100,%xmm4,%r21
+	extractps $100,%xmm4,(%r21)
+	insertps $100,(%r21),%xmm6
+	movntdqa (%r21),%xmm4
+	mpsadbw $100,(%r21),%xmm6
+	pabsb (%r17),%xmm0
+	pabsd (%r17),%xmm0
+	pabsw (%r17),%xmm0
+	packusdw (%r21),%xmm6
+	palignr $100,(%r17),%xmm6
+	pblendvb %xmm0,(%r22),%xmm6
+	pblendvb (%r22),%xmm6
+	pblendw $100,(%r22),%xmm6
+	pcmpeqq (%r22),%xmm6
+	pcmpestri $100,(%r25),%xmm6
+	pcmpestrm $100,(%r25),%xmm6
+	pcmpgtq (%r25),%xmm4
+	pcmpistri $100,(%r25),%xmm6
+	pcmpistrm $100,(%r25),%xmm6
+	pextrb $100,%xmm4,%r22
+	pextrb $100,%xmm4,(%r22)
+	pextrd $100,%xmm4,(%r22)
+	pextrq $100,%xmm4,(%r22)
+	pextrw $100,%xmm4,(%r22)
+	phaddd  (%r17),%xmm0
+	phaddsw (%r17),%xmm0
+	phaddw  (%r17),%xmm0
+	phminposuw (%r23),%xmm4
+	phsubw (%r17),%xmm0
+	pinsrb $100,%r23,%xmm4
+	pinsrb $100,(%r23),%xmm4
+	pinsrd $100, %r23d, %xmm4
+	pinsrd $100,(%r23),%xmm4
+	pinsrq $100, %r24, %xmm4
+	pinsrq $100,(%r24),%xmm4
+	pmaddubsw (%r17),%xmm0
+	pmaxsb (%r24),%xmm6
+	pmaxsd (%r24),%xmm6
+	pmaxud (%r24),%xmm6
+	pmaxuw (%r24),%xmm6
+	pminsb (%r24),%xmm6
+	pminsd (%r24),%xmm6
+	pminud (%r24),%xmm6
+	pminuw (%r24),%xmm6
+	pmovsxbd (%r24),%xmm4
+	pmovsxbq (%r24),%xmm4
+	pmovsxbw (%r24),%xmm4
+	pmovsxbw (%r24),%xmm4
+	pmovsxdq (%r24),%xmm4
+	pmovsxwd (%r24),%xmm4
+	pmovsxwq (%r24),%xmm4
+	pmovzxbd (%r24),%xmm4
+	pmovzxbq (%r24),%xmm4
+	pmovzxdq (%r24),%xmm4
+	pmovzxwd (%r24),%xmm4
+	pmovzxwq (%r24),%xmm4
+	pmuldq (%r24),%xmm4
+	pmulhrsw (%r17),%xmm0
+	pmulld (%r24),%xmm4
+	pshufb (%r17),%xmm0
+	psignb (%r17),%xmm0
+	psignd (%r17),%xmm0
+	psignw (%r17),%xmm0
+	roundpd $100,(%r24),%xmm6
+	roundps $100,(%r24),%xmm6
+	roundsd $100,(%r24),%xmm6
+	roundss $100,(%r24),%xmm6
+#AES
+	aesdec (%r26),%xmm6
+	aesdeclast (%r26),%xmm6
+	aesenc (%r26),%xmm6
+	aesenclast (%r26),%xmm6
+	aesimc (%r26),%xmm6
+	aeskeygenassist $100,(%r26),%xmm6
+	pclmulhqhqdq (%r26),%xmm6
+	pclmulhqlqdq (%r26),%xmm6
+	pclmullqhqdq (%r26),%xmm6
+	pclmullqlqdq (%r26),%xmm6
+	pclmulqdq $100,(%r26),%xmm6
+#GFNI
+	gf2p8affineinvqb $100,(%r26),%xmm6
+	gf2p8affineqb $100,(%r26),%xmm6
+	gf2p8mulb (%r26),%xmm6
+#VEX without evex
+	vaesimc (%r27), %xmm3
+	vaeskeygenassist $7,(%r27),%xmm3
+	vblendpd $7,(%r27),%xmm6,%xmm2
+	vblendpd $7,(%r27),%ymm6,%ymm2
+	vblendps $7,(%r27),%xmm6,%xmm2
+	vblendps $7,(%r27),%ymm6,%ymm2
+	vblendvpd %xmm4,(%r27),%xmm2,%xmm7
+	vblendvpd %ymm4,(%r27),%ymm2,%ymm7
+	vblendvps %xmm4,(%r27),%xmm2,%xmm7
+	vblendvps %ymm4,(%r27),%ymm2,%ymm7
+	vdppd $7,(%r27),%xmm6,%xmm2
+	vdpps $7,(%r27),%xmm6,%xmm2
+	vdpps $7,(%r27),%ymm6,%ymm2
+	vhaddpd (%r27),%xmm6,%xmm5
+	vhaddpd (%r27),%ymm6,%ymm5
+	vhsubps (%r27),%xmm6,%xmm5
+	vhsubps (%r27),%ymm6,%ymm5
+	vlddqu (%r27),%xmm4
+	vlddqu (%r27),%ymm4
+	vldmxcsr (%r27)
+	vmaskmovpd %xmm4,%xmm6,(%r27)
+	vmaskmovpd %ymm4,%ymm6,(%r27)
+	vmaskmovpd (%r27),%xmm4,%xmm6
+	vmaskmovpd (%r27),%ymm4,%ymm6
+	vmaskmovps %xmm4,%xmm6,(%r27)
+	vmaskmovps %ymm4,%ymm6,(%r27)
+	vmaskmovps (%r27),%xmm4,%xmm6
+	vmaskmovps (%r27),%ymm4,%ymm6
+	vmovmskpd %xmm4,%r27d
+	vmovmskpd %xmm8,%r27d
+	vmovmskps %xmm4,%r27d
+	vmovmskps %ymm8,%r27d
+	vpblendd $7,(%r27),%xmm6,%xmm2
+	vpblendd $7,(%r27),%ymm6,%ymm2
+	vpblendvb %xmm4,(%r27),%xmm2,%xmm7
+	vpblendvb %ymm4,(%r27),%ymm2,%ymm7
+	vpblendw $7,(%r27),%xmm6,%xmm2
+	vpblendw $7,(%r27),%ymm6,%ymm2
+	vpcmpeqb (%r26),%ymm6,%ymm2
+	vpcmpeqd (%r26),%ymm6,%ymm2
+	vpcmpeqq (%r16),%ymm6,%ymm2
+	vpcmpeqw (%r16),%ymm6,%ymm2
+	vpcmpestri $7,(%r27),%xmm6
+	vpcmpestrm $7,(%r27),%xmm6
+	vpcmpgtb (%r26),%ymm6,%ymm2
+	vpcmpgtd (%r26),%ymm6,%ymm2
+	vpcmpgtq (%r16),%ymm6,%ymm2
+	vpcmpgtw (%r16),%ymm6,%ymm2
+	vpcmpistri $100,(%r25),%xmm6
+	vpcmpistrm $100,(%r25),%xmm6
+	vperm2f128 $7,(%r27),%ymm6,%ymm2
+	vperm2i128 $7,(%r27),%ymm6,%ymm2
+	vphaddd (%r27),%xmm6,%xmm7
+	vphaddd (%r27),%ymm6,%ymm7
+	vphaddsw (%r27),%xmm6,%xmm7
+	vphaddsw (%r27),%ymm6,%ymm7
+	vphaddw (%r27),%xmm6,%xmm7
+	vphaddw (%r27),%ymm6,%ymm7
+	vphminposuw (%r27),%xmm6
+	vphsubd (%r27),%xmm6,%xmm7
+	vphsubd (%r27),%ymm6,%ymm7
+	vphsubsw (%r27),%xmm6,%xmm7
+	vphsubsw (%r27),%ymm6,%ymm7
+	vphsubw (%r27),%xmm6,%xmm7
+	vphsubw (%r27),%ymm6,%ymm7
+	vpmaskmovd %xmm4,%xmm6,(%r27)
+	vpmaskmovd %ymm4,%ymm6,(%r27)
+	vpmaskmovd (%r27),%xmm4,%xmm6
+	vpmaskmovd (%r27),%ymm4,%ymm6
+	vpmaskmovq %xmm4,%xmm6,(%r27)
+	vpmaskmovq %ymm4,%ymm6,(%r27)
+	vpmaskmovq (%r27),%xmm4,%xmm6
+	vpmaskmovq (%r27),%ymm4,%ymm6
+	vpmovmskb %xmm4,%r27
+	vpmovmskb %ymm4,%r27d
+	vpsignb (%r27),%xmm6,%xmm7
+	vpsignb (%r27),%xmm6,%xmm7
+	vpsignd (%r27),%xmm6,%xmm7
+	vpsignd (%r27),%xmm6,%xmm7
+	vpsignw (%r27),%xmm6,%xmm7
+	vpsignw (%r27),%xmm6,%xmm7
+	vptest (%r27),%ymm6
+	vrcpps (%r27),%xmm6
+	vrcpps (%r27),%ymm6
+	vrcpss (%r27),%xmm6,%xmm6
+	vroundpd $1,(%r24),%xmm6
+	vroundps $2,(%r24),%xmm6
+	vroundsd $3,(%r24),%xmm6,%xmm3
+	vroundss $4,(%r24),%xmm6,%xmm3
+	vrsqrtps (%r27),%xmm6
+	vrsqrtps (%r27),%ymm6
+	vrsqrtss (%r27),%xmm6,%xmm6
+	vstmxcsr (%r27)
+	vtestpd (%r27),%xmm6
+	vtestpd (%r27),%ymm6
+	vtestps (%r27),%xmm6
+	vtestps (%r27),%ymm6
+	vtestps (%r27),%ymm6
+	vptest (%r27),%xmm6
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
new file mode 100644
index 00000000000..f8701d7ec22
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l
@@ -0,0 +1,20 @@
+.*: Assembler messages:
+.*:4: Error: `movbe' is not supported on `x86_64.nomovbe'
+.*:5: Error: `movbe' is not supported on `x86_64.nomovbe'
+.*:8: Error: `invept' is not supported on `x86_64.noept'
+.*:9: Error: `invept' is not supported on `x86_64.noept'
+.*:12: Error: `kmovq' is not supported on `x86_64.noavx512bw'
+.*:13: Error: `kmovq' is not supported on `x86_64.noavx512bw'
+.*:16: Error: `kmovb' is not supported on `x86_64.noavx512dq'
+.*:17: Error: `kmovb' is not supported on `x86_64.noavx512dq'
+.*:20: Error: `kmovw' is not supported on `x86_64.noavx512f'
+.*:21: Error: `kmovw' is not supported on `x86_64.noavx512f'
+.*:24: Error: `andn' is not supported on `x86_64.nobmi'
+.*:25: Error: `andn' is not supported on `x86_64.nobmi'
+.*:28: Error: `bzhi' is not supported on `x86_64.nobmi2'
+.*:29: Error: `bzhi' is not supported on `x86_64.nobmi2'
+GAS LISTING .*
+#...
+[ 	]*1[ 	]+\# Check illegal 64bit APX EVEX promoted instructions
+[ 	]*2[ 	]+\.text
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
new file mode 100644
index 00000000000..2ea47419b4d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s
@@ -0,0 +1,29 @@
+# Check illegal 64bit APX EVEX promoted instructions
+	.text
+	.arch .nomovbe
+	movbe (%r16), %r17
+	movbe (%rax), %rcx
+	.arch default
+	.arch .noept
+	invept (%r16), %r17
+	invept (%rax), %rcx
+	.arch default
+	.arch .noavx512bw
+	kmovq %k1, (%r16)
+	kmovq %k1, (%r8)
+	.arch default
+	.arch .noavx512dq
+	kmovb %k1, %r16d
+	kmovb %k1, %r8d
+	.arch default
+	.arch .noavx512f
+	kmovw %k1, %r16d
+	kmovw %k1, %r8d
+	.arch default
+	.arch .nobmi
+	andn %r16,%r15,%r11
+	andn %r15,%r15,%r11
+	.arch default
+	.arch .nobmi2
+	bzhi %r16,%r15,%r11
+	bzhi %r15,%r15,%r11
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
new file mode 100644
index 00000000000..c3c578675c0
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.d
@@ -0,0 +1,20 @@
+#as:
+#objdump: -dw
+#name: x86-64 APX old evex insn use gpr32 with extend-evex prefix
+#source: x86-64-apx-evex-egpr.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 fb 79 48 19 04 08 01[	 ]+vextractf32x4 \$0x1,%zmm0,\(%r16,%r17,1\)
+\s*[a-f0-9]+:\s*62 fa 79 48 5a 04 1a[	 ]+vbroadcasti32x4 \(%r18,%r19,1\),%zmm0
+\s*[a-f0-9]+:\s*62 eb 7d 08 17 c4 01[	 ]+vextractps \$0x1,%xmm16,%r20d
+\s*[a-f0-9]+:\s*62 69 97 00 2a f5[	 ]+vcvtsi2sd %r21,%xmm29,%xmm30
+\s*[a-f0-9]+:\s*67 62 fe 55 58 96 36[	 ]+vfmaddsub132ph \(%r22d\)\{1to32\},%zmm5,%zmm6
+\s*[a-f0-9]+:\s*62 81 fe 18 78 fe[	 ]+vcvttss2usi \{sae\},%xmm30,%r23
+\s*[a-f0-9]+:\s*62 25 10 47 58 b4 c5 00 00 00 10[	 ]+vaddph 0x10000000\(%rbp,%r24,8\),%zmm29,%zmm30\{%k7\}
+\s*[a-f0-9]+:\s*62 4d 7c 08 2f 71 7f[	 ]+vcomish 0xfe\(%r25\),%xmm30
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
new file mode 100644
index 00000000000..7d1c5de2b6d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-egpr.s
@@ -0,0 +1,21 @@
+# Check 64bit old evex instructions use gpr32 with evex prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+## DestMem
+	 vextractf32x4	$1, %zmm0, (%r16,%r17)
+## SrcMem
+	 vbroadcasti32x4	(%r18,%r19), %zmm0
+## DestReg
+	 vextractps	$1, %xmm16, %r20d
+## SrcReg
+	 vcvtsi2sdq      %r21, %xmm29, %xmm30
+## Broadcast
+	 vfmaddsub132ph  (%r22d){1to32}, %zmm5, %zmm6
+## SAE
+	 vcvttss2usi     {sae}, %xmm30, %r23
+## Masking
+	 vaddph  0x10000000(%rbp, %r24, 8), %zmm29, %zmm30{%k7}
+## Disp8memshift
+	 vcomish 254(%r25), %xmm30
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
new file mode 100644
index 00000000000..07760240793
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
@@ -0,0 +1,30 @@
+#objdump: -dw
+#name: x86-64 EVEX-promoted bad
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7e 08 60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c7[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7f 08 60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c7[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+62 e2 f9 41 91 84[ 	]+vpgatherqq \(bad\),%zmm16\{%k1\}
+[ 	]*[a-f0-9]+:[ 	]+cd 7b[ 	]+int    \$0x7b
+[ 	]*[a-f0-9]+:[ 	]+00 00[ 	]+add    %al,\(%rax\)
+[ 	]*[a-f0-9]+:[ 	]+00 ff[ 	]+add    %bh,%bh
+[ 	]*[a-f0-9]+:[ 	]+62 fd 7d 08 60[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c7[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7d[ 	]+\(bad\)  \{%k1\}
+[ 	]*[a-f0-9]+:[ 	]+09 60 c7[ 	]+or     %esp,-0x39\(%rax\)
+[ 	]*[a-f0-9]+:[ 	]+62 fc 7d[ 	]+\(bad\).*
+[ 	]*[a-f0-9]+:[ 	]+38 60 c7[ 	]+cmp    %ah,-0x39\(%rax\)
+[ 	]*[a-f0-9]+:[ 	]+62 f2 64 09 f5[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c8 ff ff ff[ 	]+enter  \$0xffff,\$0xff
+[ 	]*[a-f0-9]+:[ 	]+62 f2 64 38 f5[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+c8 ff ff ff[ 	]+enter  \$0xffff,\$0xff
+[ 	]*[a-f0-9]+:[ 	]+67 62 f2 7c 18 f5[ 	]+addr32 \(bad\)
+[ 	]*[a-f0-9]+:[ 	]+0b ff[ 	]+or     %edi,%edi
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
new file mode 100644
index 00000000000..bfec0652d13
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
@@ -0,0 +1,28 @@
+# Check illegal prefix for 64bit EVEX-promoted instructions
+
+        .allow_index_reg
+        .text
+_start:
+	#movbe %r23w,%ax set EVEX.pp = f3 (illegal value).
+	.insn EVEX.L0.f3.M12.W0 0x60, %di, %ax
+	#movbe %r23w,%ax set EVEX.pp = f2 (illegal value).
+	.insn EVEX.L0.f2.M12.W0 0x60, %di, %ax
+	#VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
+	#(illegal value).
+	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
+	.byte 0xff
+	#EVEX_MAP4 movbe %r23w,%ax set EVEX.mm == b01 (illegal value).
+	.insn EVEX.L0.66.M13.W0 0x60, %di, %ax
+	#EVEX_MAP4 movbe %r23w,%ax set EVEX.aa(P[17:16]) == b01 (illegal value).
+	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax{%k1}
+	#EVEX_MAP4 movbe %r18w,%ax set EVEX.zL'L == 0b11 (illegal value).
+	.insn EVEX.L0.66.M12.W0 0x60, %di, {rd-sae}, %ax
+	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[17:16](EVEX.aa) == 1 (illegal value).
+	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax, %ebx, %ecx{%k1}
+	.byte 0xff, 0xff, 0xff
+	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[22:21](EVEX.L'L) == 1 (illegal value).
+	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax, {rd-sae}, %ebx, %ecx
+	.byte 0xff, 0xff, 0xff
+	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[20](EVEX.b) == 1 (illegal value).
+	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax ,(%ebx){1to8}, %ecx
+	.byte 0xff
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
new file mode 100644
index 00000000000..02e811de88d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d
@@ -0,0 +1,318 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 APX_F EVEX-Promoted insns (Intel disassembly)
+#source: x86-64-apx-evex-promoted.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32[	 ]+r22,r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32[	 ]+r22,QWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32[	 ]+r17,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32[	 ]+r21d,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32[	 ]+ebx,BYTE PTR \[r19\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32[	 ]+r23d,r31d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32[	 ]+r23d,DWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32[	 ]+r21d,r31w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32[	 ]+r21d,WORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32[	 ]+r18,rax
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+BYTE PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+k5,BYTE PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+k5,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+r31,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+k5,r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+k5,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+k5,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+ax,r18w
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r16\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+DWORD PTR \[r16\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r16\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+r31,QWORD PTR \[r16\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+r18w,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4 xmm22,xmm23,0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\],0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2 xmm12,XMMWORD PTR \[r31\+rax\*4\+0x123\],xmm0
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1 tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+\[r31\+rax\*4\+0x123\],tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+\[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+\[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl xmm22,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32[	 ]+r22,r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32[	 ]+r22,QWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32[	 ]+r17,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32[	 ]+r21d,r19b
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32[	 ]+ebx,BYTE PTR \[r19\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32[	 ]+r23d,r31d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32[	 ]+r23d,DWORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32[	 ]+r21d,r31w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32[	 ]+r21d,WORD PTR \[r31\]
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32[	 ]+r18,rax
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+r31,OWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+BYTE PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+k5,BYTE PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+k5,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+r31,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+k5,r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+k5,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+r25d,k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+k5,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+k5,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+ax,r18w
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r16\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+WORD PTR \[r31\+rax\*4\+0x123\],r18w
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+DWORD PTR \[r16\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r16\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+r31,QWORD PTR \[r16\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+r18w,WORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+r25d,\[r31d\+eax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+r31,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+edx,r25d,DWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+r15,r31,QWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4 xmm22,xmm23,0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\],0x7b
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2 xmm22,xmm23
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2 xmm22,XMMWORD PTR \[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2 xmm12,XMMWORD PTR \[r31\+rax\*4\+0x123\],xmm0
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+r10d,edx,r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+edx,DWORD PTR \[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+r11,r15,r31
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+r15,QWORD PTR \[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1 tmm6,\[r31\+rax\*4\+0x123\]
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+\[r31\+rax\*4\+0x123\],tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+\[r31\+rax\*4\+0x123\],r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+\[r31\+rax\*4\+0x123\],r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+\[r31\+rax\*4\+0x123\],r31
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
new file mode 100644
index 00000000000..3a7dffc013b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.d
@@ -0,0 +1,318 @@
+#as:
+#objdump: -dw
+#name: x86_64 APX_F EVEX-Promoted insns
+#source: x86-64-apx-evex-promoted.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32  %r31,%r22
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32q \(%r31\),%r22
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32  %r19b,%r17
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32  %r19b,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32b \(%r19\),%ebx
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32  %r31d,%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32l \(%r31\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32  %r31w,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32w \(%r31\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32  %rax,%r18
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+%k5,%r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+%r31,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+%r18w,%ax
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+%r25d,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r16,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r18w
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4[	 ]+\$0x7b,%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4[	 ]+\$0x7b,0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2[	 ]+%xmm0,0x123\(%r31,%rax,4\),%xmm12
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 fc 8c 87 23 01 00 00[	 ]+aadd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 fc bc 87 23 01 00 00[	 ]+aadd[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 fc 8c 87 23 01 00 00[	 ]+aand[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 fc bc 87 23 01 00 00[	 ]+aand[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dd b4 87 23 01 00 00[	 ]+aesdec128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 df b4 87 23 01 00 00[	 ]+aesdec256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 8c 87 23 01 00 00[	 ]+aesdecwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 9c 87 23 01 00 00[	 ]+aesdecwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 dc b4 87 23 01 00 00[	 ]+aesenc128kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7e 08 de b4 87 23 01 00 00[	 ]+aesenc256kl[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 84 87 23 01 00 00[	 ]+aesencwide128kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 d8 94 87 23 01 00 00[	 ]+aesencwide256kl[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 fc 8c 87 23 01 00 00[	 ]+aor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c ff 08 fc bc 87 23 01 00 00[	 ]+aor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 fc 8c 87 23 01 00 00[	 ]+axor[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 fc bc 87 23 01 00 00[	 ]+axor[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f7 d2[	 ]+bextr[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f7 94 87 23 01 00 00[	 ]+bextr[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f7 df[	 ]+bextr[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f7 bc 87 23 01 00 00[	 ]+bextr[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d9[	 ]+blsi[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 df[	 ]+blsi[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 9c 87 23 01 00 00[	 ]+blsi[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 d1[	 ]+blsmsk[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 d7[	 ]+blsmsk[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 94 87 23 01 00 00[	 ]+blsmsk[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 da 6c 08 f3 c9[	 ]+blsr[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 08 f3 cf[	 ]+blsr[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 da 84 00 f3 8c 87 23 01 00 00[	 ]+blsr[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 72 34 00 f5 d2[	 ]+bzhi[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 34 00 f5 94 87 23 01 00 00[	 ]+bzhi[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 84 00 f5 df[	 ]+bzhi[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 84 00 f5 bc 87 23 01 00 00[	 ]+bzhi[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e6 94 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e6 bc 87 23 01 00 00[	 ]+cmpbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e2 94 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e2 bc 87 23 01 00 00[	 ]+cmpbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ec 94 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ec bc 87 23 01 00 00[	 ]+cmplxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e7 94 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e7 bc 87 23 01 00 00[	 ]+cmpnbexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e3 94 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e3 bc 87 23 01 00 00[	 ]+cmpnbxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ef 94 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ef bc 87 23 01 00 00[	 ]+cmpnlexadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ed 94 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ed bc 87 23 01 00 00[	 ]+cmpnlxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e1 94 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e1 bc 87 23 01 00 00[	 ]+cmpnoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 eb 94 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 eb bc 87 23 01 00 00[	 ]+cmpnpxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e9 94 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e9 bc 87 23 01 00 00[	 ]+cmpnsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e5 94 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e5 bc 87 23 01 00 00[	 ]+cmpnzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e0 94 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e0 bc 87 23 01 00 00[	 ]+cmpoxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 ea 94 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 ea bc 87 23 01 00 00[	 ]+cmppxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e8 94 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e8 bc 87 23 01 00 00[	 ]+cmpsxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 e4 94 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r25d,%edx,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 e4 bc 87 23 01 00 00[	 ]+cmpzxadd[	 ]+%r31,%r15,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 f7[	 ]+crc32  %r31,%r22
+[	 ]*[a-f0-9]+:[	 ]*62 cc fc 08 f1 37[	 ]+crc32q \(%r31\),%r22
+[	 ]*[a-f0-9]+:[	 ]*62 ec fc 08 f0 cb[	 ]+crc32  %r19b,%r17
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7c 08 f0 eb[	 ]+crc32  %r19b,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7c 08 f0 1b[	 ]+crc32b \(%r19\),%ebx
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 ff[	 ]+crc32  %r31d,%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 f1 3f[	 ]+crc32l \(%r31\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 ef[	 ]+crc32  %r31w,%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 f1 2f[	 ]+crc32w \(%r31\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*62 e4 fc 08 f1 d0[	 ]+crc32  %rax,%r18
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 da d1[	 ]+encodekey128[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7e 08 db d1[	 ]+encodekey256[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7f 08 f8 8c 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7f 08 f8 bc 87 23 01 00 00[	 ]+enqcmd[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7e 08 f8 8c 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7e 08 f8 bc 87 23 01 00 00[	 ]+enqcmds[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f0 bc 87 23 01 00 00[	 ]+invept[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f2 bc 87 23 01 00 00[	 ]+invpcid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c fe 08 f1 bc 87 23 01 00 00[	 ]+invvpid[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 61 7d 08 93 cd[	 ]+kmovb[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 91 ac 87 23 01 00 00[	 ]+kmovb[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 92 e9[	 ]+kmovb[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7d 08 90 ac 87 23 01 00 00[	 ]+kmovb[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7f 08 93 cd[	 ]+kmovd[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 91 ac 87 23 01 00 00[	 ]+kmovd[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7f 08 92 e9[	 ]+kmovd[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fd 08 90 ac 87 23 01 00 00[	 ]+kmovd[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 ff 08 93 fd[	 ]+kmovq[	 ]+%k5,%r31
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 91 ac 87 23 01 00 00[	 ]+kmovq[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 ff 08 92 ef[	 ]+kmovq[	 ]+%r31,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 fc 08 90 ac 87 23 01 00 00[	 ]+kmovq[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 61 7c 08 93 cd[	 ]+kmovw[	 ]+%k5,%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 91 ac 87 23 01 00 00[	 ]+kmovw[	 ]+%k5,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 92 e9[	 ]+kmovw[	 ]+%r25d,%k5
+[	 ]*[a-f0-9]+:[	 ]*62 d9 7c 08 90 ac 87 23 01 00 00[	 ]+kmovw[	 ]+0x123\(%r31,%rax,4\),%k5
+[	 ]*[a-f0-9]+:[	 ]*62 da 7c 08 49 84 87 23 01 00 00[	 ]+ldtilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 fc 7d 08 60 c2[	 ]+movbe[	 ]+%r18w,%ax
+[	 ]*[a-f0-9]+:[	 ]*62 ec 7d 08 61 94 80 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 61 94 87 23 01 00 00[	 ]+movbe[	 ]+%r18w,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 dc 7c 08 60 d1[	 ]+movbe[	 ]+%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 6c 7c 08 61 8c 80 23 01 00 00[	 ]+movbe[	 ]+%r25d,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5c fc 08 60 ff[	 ]+movbe[	 ]+%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 61 bc 80 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r16,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 61 bc 87 23 01 00 00[	 ]+movbe[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 6c fc 08 60 bc 80 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r16,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7d 08 60 94 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r18w
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 60 8c 87 23 01 00 00[	 ]+movbe[	 ]+0x123\(%r31,%rax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*67 62 4c 7d 08 f8 8c 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31d,%eax,4\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 f8 bc 87 23 01 00 00[	 ]+movdir64b[	 ]+0x123\(%r31,%rax,4\),%r31
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 f9 8c 87 23 01 00 00[	 ]+movdiri[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 f9 bc 87 23 01 00 00[	 ]+movdiri[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6f 08 f5 d1[	 ]+pdep[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 08 f5 df[	 ]+pdep[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f5 94 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f5 bc 87 23 01 00 00[	 ]+pdep[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 5a 6e 08 f5 d1[	 ]+pext[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 08 f5 df[	 ]+pext[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 da 36 00 f5 94 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r25d,%edx
+[	 ]*[a-f0-9]+:[	 ]*62 5a 86 00 f5 bc 87 23 01 00 00[	 ]+pext[	 ]+0x123\(%r31,%rax,4\),%r31,%r15
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d9 f7[	 ]+sha1msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d9 b4 87 23 01 00 00[	 ]+sha1msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 da f7[	 ]+sha1msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 da b4 87 23 01 00 00[	 ]+sha1msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d8 f7[	 ]+sha1nexte[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d8 b4 87 23 01 00 00[	 ]+sha1nexte[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 d4 f7 7b[	 ]+sha1rnds4[	 ]+\$0x7b,%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 d4 b4 87 23 01 00 00 7b[	 ]+sha1rnds4[	 ]+\$0x7b,0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dc f7[	 ]+sha256msg1[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dc b4 87 23 01 00 00[	 ]+sha256msg1[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 a4 7c 08 dd f7[	 ]+sha256msg2[	 ]+%xmm23,%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 cc 7c 08 dd b4 87 23 01 00 00[	 ]+sha256msg2[	 ]+0x123\(%r31,%rax,4\),%xmm22
+[	 ]*[a-f0-9]+:[	 ]*62 5c 7c 08 db a4 87 23 01 00 00[	 ]+sha256rnds2[	 ]+%xmm0,0x123\(%r31,%rax,4\),%xmm12
+[	 ]*[a-f0-9]+:[	 ]*62 72 35 00 f7 d2[	 ]+shlx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 35 00 f7 94 87 23 01 00 00[	 ]+shlx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 85 00 f7 df[	 ]+shlx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 85 00 f7 bc 87 23 01 00 00[	 ]+shlx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 72 37 00 f7 d2[	 ]+shrx[	 ]+%r25d,%edx,%r10d
+[	 ]*[a-f0-9]+:[	 ]*62 da 37 00 f7 94 87 23 01 00 00[	 ]+shrx[	 ]+%r25d,0x123\(%r31,%rax,4\),%edx
+[	 ]*[a-f0-9]+:[	 ]*62 52 87 00 f7 df[	 ]+shrx[	 ]+%r31,%r15,%r11
+[	 ]*[a-f0-9]+:[	 ]*62 5a 87 00 f7 bc 87 23 01 00 00[	 ]+shrx[	 ]+%r31,0x123\(%r31,%rax,4\),%r15
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 49 84 87 23 01 00 00[	 ]+sttilecfg[	 ]+0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 da 7f 08 4b b4 87 23 01 00 00[	 ]+tileloadd[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7d 08 4b b4 87 23 01 00 00[	 ]+tileloaddt1[	 ]+0x123\(%r31,%rax,4\),%tmm6
+[	 ]*[a-f0-9]+:[	 ]*62 da 7e 08 4b b4 87 23 01 00 00[	 ]+tilestored[	 ]+%tmm6,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7c 08 66 8c 87 23 01 00 00[	 ]+wrssd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fc 08 66 bc 87 23 01 00 00[	 ]+wrssq[	 ]+%r31,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c 7d 08 65 8c 87 23 01 00 00[	 ]+wrussd[	 ]+%r25d,0x123\(%r31,%rax,4\)
+[	 ]*[a-f0-9]+:[	 ]*62 4c fd 08 65 bc 87 23 01 00 00[	 ]+wrussq[	 ]+%r31,0x123\(%r31,%rax,4\)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
new file mode 100644
index 00000000000..39752c27432
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted.s
@@ -0,0 +1,314 @@
+# Check 64bit APX_F EVEX-Promoted instructions.
+
+	.text
+_start:
+	aadd	%r25d,0x123(%r31,%rax,4)
+	aadd	%r31,0x123(%r31,%rax,4)
+	aand	%r25d,0x123(%r31,%rax,4)
+	aand	%r31,0x123(%r31,%rax,4)
+	aesdec128kl	0x123(%r31,%rax,4),%xmm22
+	aesdec256kl	0x123(%r31,%rax,4),%xmm22
+	aesdecwide128kl	0x123(%r31,%rax,4)
+	aesdecwide256kl	0x123(%r31,%rax,4)
+	aesenc128kl	0x123(%r31,%rax,4),%xmm22
+	aesenc256kl	0x123(%r31,%rax,4),%xmm22
+	aesencwide128kl	0x123(%r31,%rax,4)
+	aesencwide256kl	0x123(%r31,%rax,4)
+	aor	%r25d,0x123(%r31,%rax,4)
+	aor	%r31,0x123(%r31,%rax,4)
+	axor	%r25d,0x123(%r31,%rax,4)
+	axor	%r31,0x123(%r31,%rax,4)
+	bextr	%r25d,%edx,%r10d
+	bextr	%r25d,0x123(%r31,%rax,4),%edx
+	bextr	%r31,%r15,%r11
+	bextr	%r31,0x123(%r31,%rax,4),%r15
+	blsi	%r25d,%edx
+	blsi	%r31,%r15
+	blsi	0x123(%r31,%rax,4),%r25d
+	blsi	0x123(%r31,%rax,4),%r31
+	blsmsk	%r25d,%edx
+	blsmsk	%r31,%r15
+	blsmsk	0x123(%r31,%rax,4),%r25d
+	blsmsk	0x123(%r31,%rax,4),%r31
+	blsr	%r25d,%edx
+	blsr	%r31,%r15
+	blsr	0x123(%r31,%rax,4),%r25d
+	blsr	0x123(%r31,%rax,4),%r31
+	bzhi	%r25d,%edx,%r10d
+	bzhi	%r25d,0x123(%r31,%rax,4),%edx
+	bzhi	%r31,%r15,%r11
+	bzhi	%r31,0x123(%r31,%rax,4),%r15
+	cmpbexadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpbexadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpbxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpbxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmplxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmplxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnbexadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnbexadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnbxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnbxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnlexadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnlexadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnlxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnlxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnoxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnoxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnpxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnpxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnsxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnsxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpnzxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpnzxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpoxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpoxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmppxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmppxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpsxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpsxadd	%r31,%r15,0x123(%r31,%rax,4)
+	cmpzxadd	%r25d,%edx,0x123(%r31,%rax,4)
+	cmpzxadd	%r31,%r15,0x123(%r31,%rax,4)
+	crc32q	%r31, %r22
+	crc32q	(%r31), %r22
+	crc32b	%r19b, %r17
+	crc32b	%r19b, %r21d
+	crc32b	(%r19),%ebx
+	crc32l	%r31d, %r23d
+	crc32l	(%r31), %r23d
+	crc32w	%r31w, %r21d
+	crc32w	(%r31),%r21d
+	crc32	%rax, %r18
+	encodekey128	%r25d,%edx
+	encodekey256	%r25d,%edx
+	enqcmd	0x123(%r31d,%eax,4),%r25d
+	enqcmd	0x123(%r31,%rax,4),%r31
+	enqcmds	0x123(%r31d,%eax,4),%r25d
+	enqcmds	0x123(%r31,%rax,4),%r31
+	invept	0x123(%r31,%rax,4),%r31
+	invpcid	0x123(%r31,%rax,4),%r31
+	invvpid	0x123(%r31,%rax,4),%r31
+	kmovb	%k5,%r25d
+	kmovb	%k5,0x123(%r31,%rax,4)
+	kmovb	%r25d,%k5
+	kmovb	0x123(%r31,%rax,4),%k5
+	kmovd	%k5,%r25d
+	kmovd	%k5,0x123(%r31,%rax,4)
+	kmovd	%r25d,%k5
+	kmovd	0x123(%r31,%rax,4),%k5
+	kmovq	%k5,%r31
+	kmovq	%k5,0x123(%r31,%rax,4)
+	kmovq	%r31,%k5
+	kmovq	0x123(%r31,%rax,4),%k5
+	kmovw	%k5,%r25d
+	kmovw	%k5,0x123(%r31,%rax,4)
+	kmovw	%r25d,%k5
+	kmovw	0x123(%r31,%rax,4),%k5
+	ldtilecfg	0x123(%r31,%rax,4)
+	movbe	%r18w,%ax
+	movbe	%r18w,0x123(%r16,%rax,4)
+	movbe	%r18w,0x123(%r31,%rax,4)
+	movbe	%r25d,%edx
+	movbe	%r25d,0x123(%r16,%rax,4)
+	movbe	%r31,%r15
+	movbe	%r31,0x123(%r16,%rax,4)
+	movbe	%r31,0x123(%r31,%rax,4)
+	movbe	0x123(%r16,%rax,4),%r31
+	movbe	0x123(%r31,%rax,4),%r18w
+	movbe	0x123(%r31,%rax,4),%r25d
+	movdir64b	0x123(%r31d,%eax,4),%r25d
+	movdir64b	0x123(%r31,%rax,4),%r31
+	movdiri	%r25d,0x123(%r31,%rax,4)
+	movdiri	%r31,0x123(%r31,%rax,4)
+	pdep	%r25d,%edx,%r10d
+	pdep	%r31,%r15,%r11
+	pdep	0x123(%r31,%rax,4),%r25d,%edx
+	pdep	0x123(%r31,%rax,4),%r31,%r15
+	pext	%r25d,%edx,%r10d
+	pext	%r31,%r15,%r11
+	pext	0x123(%r31,%rax,4),%r25d,%edx
+	pext	0x123(%r31,%rax,4),%r31,%r15
+	sha1msg1	%xmm23,%xmm22
+	sha1msg1	0x123(%r31,%rax,4),%xmm22
+	sha1msg2	%xmm23,%xmm22
+	sha1msg2	0x123(%r31,%rax,4),%xmm22
+	sha1nexte	%xmm23,%xmm22
+	sha1nexte	0x123(%r31,%rax,4),%xmm22
+	sha1rnds4	$0x7b,%xmm23,%xmm22
+	sha1rnds4	$0x7b,0x123(%r31,%rax,4),%xmm22
+	sha256msg1	%xmm23,%xmm22
+	sha256msg1	0x123(%r31,%rax,4),%xmm22
+	sha256msg2	%xmm23,%xmm22
+	sha256msg2	0x123(%r31,%rax,4),%xmm22
+	sha256rnds2	0x123(%r31,%rax,4),%xmm12
+	shlx	%r25d,%edx,%r10d
+	shlx	%r25d,0x123(%r31,%rax,4),%edx
+	shlx	%r31,%r15,%r11
+	shlx	%r31,0x123(%r31,%rax,4),%r15
+	shrx	%r25d,%edx,%r10d
+	shrx	%r25d,0x123(%r31,%rax,4),%edx
+	shrx	%r31,%r15,%r11
+	shrx	%r31,0x123(%r31,%rax,4),%r15
+	sttilecfg	0x123(%r31,%rax,4)
+	tileloadd	0x123(%r31,%rax,4),%tmm6
+	tileloaddt1	0x123(%r31,%rax,4),%tmm6
+	tilestored	%tmm6,0x123(%r31,%rax,4)
+	wrssd	%r25d,0x123(%r31,%rax,4)
+	wrssq	%r31,0x123(%r31,%rax,4)
+	wrussd	%r25d,0x123(%r31,%rax,4)
+	wrussq	%r31,0x123(%r31,%rax,4)
+
+	.intel_syntax noprefix
+	aadd	[r31+rax*4+0x123],r25d
+	aadd	[r31+rax*4+0x123],r31
+	aand	[r31+rax*4+0x123],r25d
+	aand	[r31+rax*4+0x123],r31
+	aesdec128kl	xmm22,[r31+rax*4+0x123]
+	aesdec256kl	xmm22,[r31+rax*4+0x123]
+	aesdecwide128kl	[r31+rax*4+0x123]
+	aesdecwide256kl	[r31+rax*4+0x123]
+	aesenc128kl	xmm22,[r31+rax*4+0x123]
+	aesenc256kl	xmm22,[r31+rax*4+0x123]
+	aesencwide128kl	[r31+rax*4+0x123]
+	aesencwide256kl	[r31+rax*4+0x123]
+	aor	[r31+rax*4+0x123],r25d
+	aor	[r31+rax*4+0x123],r31
+	axor	[r31+rax*4+0x123],r25d
+	axor	[r31+rax*4+0x123],r31
+	bextr	r10d,edx,r25d
+	bextr	edx, [r31+rax*4+0x123],r25d
+	bextr	r11,r15,r31
+	bextr	r15, [r31+rax*4+0x123],r31
+	blsi	edx,r25d
+	blsi	r15,r31
+	blsi	r25d, [r31+rax*4+0x123]
+	blsi	r31,  [r31+rax*4+0x123]
+	blsmsk	edx,r25d
+	blsmsk	r15,r31
+	blsmsk	r25d, [r31+rax*4+0x123]
+	blsmsk	r31,  [r31+rax*4+0x123]
+	blsr	edx,r25d
+	blsr	r15,r31
+	blsr	r25d, [r31+rax*4+0x123]
+	blsr	r31,  [r31+rax*4+0x123]
+	bzhi	r10d,edx,r25d
+	bzhi	edx, [r31+rax*4+0x123],r25d
+	bzhi	r11,r15,r31
+	bzhi	r15, [r31+rax*4+0x123],r31
+	cmpbexadd	 [r31+rax*4+0x123],edx,r25d
+	cmpbexadd	 [r31+rax*4+0x123],r15,r31
+	cmpbxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpbxadd	 [r31+rax*4+0x123],r15,r31
+	cmplxadd	 [r31+rax*4+0x123],edx,r25d
+	cmplxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnbexadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnbexadd	 [r31+rax*4+0x123],r15,r31
+	cmpnbxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnbxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnlexadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnlexadd	 [r31+rax*4+0x123],r15,r31
+	cmpnlxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnlxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnoxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnoxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnpxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnpxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnsxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnsxadd	 [r31+rax*4+0x123],r15,r31
+	cmpnzxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpnzxadd	 [r31+rax*4+0x123],r15,r31
+	cmpoxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpoxadd	 [r31+rax*4+0x123],r15,r31
+	cmppxadd	 [r31+rax*4+0x123],edx,r25d
+	cmppxadd	 [r31+rax*4+0x123],r15,r31
+	cmpsxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpsxadd	 [r31+rax*4+0x123],r15,r31
+	cmpzxadd	 [r31+rax*4+0x123],edx,r25d
+	cmpzxadd	 [r31+rax*4+0x123],r15,r31
+	crc32	r22,r31
+	crc32	r22,QWORD PTR [r31]
+	crc32	r17,r19b
+	crc32	r21d,r19b
+	crc32	ebx,BYTE PTR [r19]
+	crc32	r23d,r31d
+	crc32	r23d,DWORD PTR [r31]
+	crc32	r21d,r31w
+	crc32	r21d,WORD PTR [r31]
+	crc32	r18,rax
+	encodekey128	edx,r25d
+	encodekey256	edx,r25d
+	enqcmd	r25d,[r31d+eax*4+0x123]
+	enqcmd	r31,[r31+rax*4+0x123]
+	enqcmds	r25d,[r31d+eax*4+0x123]
+	enqcmds	r31,[r31+rax*4+0x123]
+	invept	r31,OWORD PTR [r31+rax*4+0x123]
+	invpcid	r31,[r31+rax*4+0x123]
+	invvpid	r31,OWORD PTR [r31+rax*4+0x123]
+	kmovb	r25d,k5
+	kmovb	BYTE PTR [r31+rax*4+0x123],k5
+	kmovb	k5,r25d
+	kmovb	k5,BYTE PTR [r31+rax*4+0x123]
+	kmovd	r25d,k5
+	kmovd	DWORD PTR [r31+rax*4+0x123],k5
+	kmovd	k5,r25d
+	kmovd	k5,DWORD PTR [r31+rax*4+0x123]
+	kmovq	r31,k5
+	kmovq	QWORD PTR [r31+rax*4+0x123],k5
+	kmovq	k5,r31
+	kmovq	k5,QWORD PTR [r31+rax*4+0x123]
+	kmovw	r25d,k5
+	kmovw	WORD PTR [r31+rax*4+0x123],k5
+	kmovw	k5,r25d
+	kmovw	k5,WORD PTR [r31+rax*4+0x123]
+	ldtilecfg	[r31+rax*4+0x123]
+	movbe	ax,r18w
+	movbe	WORD PTR [r16+rax*4+0x123],r18w
+	movbe	WORD PTR [r31+rax*4+0x123],r18w
+	movbe	edx,r25d
+	movbe	DWORD PTR [r16+rax*4+0x123],r25d
+	movbe	r15,r31
+	movbe	QWORD PTR [r16+rax*4+0x123],r31
+	movbe	QWORD PTR [r31+rax*4+0x123],r31
+	movbe	r31,QWORD PTR [r16+rax*4+0x123]
+	movbe	r18w,WORD PTR [r31+rax*4+0x123]
+	movbe	r25d,DWORD PTR [r31+rax*4+0x123]
+	movdir64b	r25d,[r31d+eax*4+0x123]
+	movdir64b	r31,[r31+rax*4+0x123]
+	movdiri	DWORD PTR [r31+rax*4+0x123],r25d
+	movdiri	QWORD PTR [r31+rax*4+0x123],r31
+	pdep	r10d,edx,r25d
+	pdep	r11,r15,r31
+	pdep	edx,r25d,DWORD PTR [r31+rax*4+0x123]
+	pdep	r15,r31,QWORD PTR [r31+rax*4+0x123]
+	pext	r10d,edx,r25d
+	pext	r11,r15,r31
+	pext	edx,r25d,DWORD PTR [r31+rax*4+0x123]
+	pext	r15,r31,QWORD PTR [r31+rax*4+0x123]
+	sha1msg1	xmm22,xmm23
+	sha1msg1	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha1msg2	xmm22,xmm23
+	sha1msg2	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha1nexte	xmm22,xmm23
+	sha1nexte	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha1rnds4	xmm22,xmm23,0x7b
+	sha1rnds4	xmm22,XMMWORD PTR [r31+rax*4+0x123],0x7b
+	sha256msg1	xmm22,xmm23
+	sha256msg1	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha256msg2	xmm22,xmm23
+	sha256msg2	xmm22,XMMWORD PTR [r31+rax*4+0x123]
+	sha256rnds2	xmm12,XMMWORD PTR [r31+rax*4+0x123]
+	shlx	r10d,edx,r25d
+	shlx	edx,DWORD PTR [r31+rax*4+0x123],r25d
+	shlx	r11,r15,r31
+	shlx	r15,QWORD PTR [r31+rax*4+0x123],r31
+	shrx	r10d,edx,r25d
+	shrx	edx,DWORD PTR [r31+rax*4+0x123],r25d
+	shrx	r11,r15,r31
+	shrx	r15,QWORD PTR [r31+rax*4+0x123],r31
+	sttilecfg	[r31+rax*4+0x123]
+	tileloadd	tmm6,[r31+rax*4+0x123]
+	tileloaddt1	tmm6,[r31+rax*4+0x123]
+	tilestored	[r31+rax*4+0x123],tmm6
+	wrssd	DWORD PTR [r31+rax*4+0x123],r25d
+	wrssq	QWORD PTR [r31+rax*4+0x123],r31
+	wrussd	DWORD PTR [r31+rax*4+0x123],r25d
+	wrussq	QWORD PTR [r31+rax*4+0x123],r31
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 4a59a726ecb..f6b6bb2f426 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -364,7 +364,12 @@ run_dump_test "x86-64-avx512f-rcigrne"
 run_dump_test "x86-64-avx512f-rcigru-intel"
 run_dump_test "x86-64-avx512f-rcigru"
 run_list_test "x86-64-apx-egpr-inval"
+run_dump_test "x86-64-apx-evex-promoted-bad"
+run_list_test "x86-64-apx-egpr-promote-inval" "-al"
 run_dump_test "x86-64-apx-rex2"
+run_dump_test "x86-64-apx-evex-promoted"
+run_dump_test "x86-64-apx-evex-promoted-intel"
+run_dump_test "x86-64-apx-evex-egpr"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
-- 
2.25.1



* [PATCH v3 6/9] Support APX NDD
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
                   ` (3 preceding siblings ...)
  2023-11-24  7:02 ` [PATCH v3 5/9] Add tests for " Cui, Lili
@ 2023-11-24  7:02 ` Cui, Lili
  2023-12-08 14:12   ` Jan Beulich
  2023-12-08 14:27   ` Jan Beulich
  2023-11-24  7:02 ` [PATCH v3 7/9] Support APX Push2/Pop2 Cui, Lili
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 69+ messages in thread
From: Cui, Lili @ 2023-11-24  7:02 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, konglin1

From: konglin1 <lingling.kong@intel.com>

opcodes/ChangeLog:

	* opcodes/i386-dis-evex-prefix.h: Add NDD decode for adox/adcx.
	* opcodes/i386-dis-evex-reg.h: Handle REG_EVEX_MAP4_80,
	REG_EVEX_MAP4_81, REG_EVEX_MAP4_83, REG_EVEX_MAP4_F6,
	REG_EVEX_MAP4_F7, REG_EVEX_MAP4_FE, REG_EVEX_MAP4_FF.
	* opcodes/i386-dis-evex.h: Add NDD insn.
	* opcodes/i386-dis.c (VexGb): Add new define.
	(VexGv): Ditto.
	(get_valid_dis386): Change for NDD decode.
	(print_insn): Ditto.
	(print_register): Ditto.
	(intel_operand_size): Ditto.
	(OP_E_memory): Ditto.
	(OP_VEX): Ditto.
	* opcodes/i386-opc.h (VexVVVV_SRC): New.
	(VexVVVV_DST): Ditto.
	* opcodes/i386-opc.tbl: Add APX NDD instructions and adjust VexVVVV.
	* opcodes/i386-tbl.h: Regenerated.

gas/ChangeLog:

	* gas/config/tc-i386.c (is_any_apx_evex_encoding): Add legacy insns
	promoted to SPACE_EVEXMAP4.
	(md_assemble): Adjust for NDD encoding.
	(process_operands): Ditto.
	(build_modrm_byte): Ditto.
	(operand_size_match): Support APX NDD insns with 3 operands.
	(match_template): Support swapping the first two operands for
	APX NDD.
	* testsuite/gas/i386/x86-64.exp: Add x86-64-apx-ndd.
	* testsuite/gas/i386/x86-64-apx-ndd.d: New test.
	* testsuite/gas/i386/x86-64-apx-ndd.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos.d: Add test.
	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.
---
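
Note for reviewers (illustrative only, not part of the patch): the sketch
below shows how the ND bit that build_apx_evex_prefix sets with
"i.vex.bytes[3] |= 0x10" can be read back together with the V':vvvv
register field. It assumes the same indexing as i.vex.bytes[], i.e.
bytes[0] is the 0x62 escape byte. struct apx_ndd_info and apx_decode_ndd
are hypothetical names that do not exist in binutils.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type, not part of binutils: the fields of an APX
   extended-EVEX prefix that matter for NDD.  */
struct apx_ndd_info
{
  bool nd;		/* EVEX.ND: vvvv names a new data destination.  */
  unsigned int vreg;	/* Register number encoded by V':vvvv.  */
};

/* BYTES points at the 4-byte EVEX prefix with bytes[0] == 0x62, which is
   assumed to match the layout of i.vex.bytes[] in the patch.  */
static struct apx_ndd_info
apx_decode_ndd (const uint8_t *bytes)
{
  struct apx_ndd_info info;

  /* Bit 4 of the last prefix byte is the bit the patch sets for insns
     promoted to SPACE_EVEXMAP4 that have a register specifier.  */
  info.nd = (bytes[3] & 0x10) != 0;

  /* vvvv sits in bits 6:3 of bytes[2] and is stored inverted.  */
  info.vreg = ((bytes[2] >> 3) & 0xf) ^ 0xf;

  /* V' is bit 3 of bytes[3], also inverted: a cleared bit selects
     registers 16..31, matching "i.vex.bytes[3] &= ~0x08" for RegRex2.  */
  if (!(bytes[3] & 0x08))
    info.vreg |= 0x10;

  return info;
}
```

Feeding it the prefix bytes "62 f4 0d 10" from the expected output of
"adc $0x1234,%ax,%r30w" in x86-64-apx-ndd.d should yield ND set and
register 30, while "62 14 f9 08" (the two-operand adcx form) should
yield ND clear.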
 gas/config/tc-i386.c                          |  82 ++++++---
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |   2 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |   2 +
 gas/testsuite/gas/i386/x86-64-apx-ndd.d       | 160 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64-apx-ndd.s       | 155 ++++++++++++++++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |  42 +++++
 gas/testsuite/gas/i386/x86-64-pseudos.s       |  43 +++++
 gas/testsuite/gas/i386/x86-64.exp             |   1 +
 opcodes/i386-dis-evex-reg.h                   |  55 ++++++
 opcodes/i386-dis-evex.h                       | 124 ++++++-------
 opcodes/i386-dis.c                            | 169 +++++++++++-------
 opcodes/i386-opc.h                            |   6 +-
 opcodes/i386-opc.tbl                          |  89 +++++++++
 13 files changed, 775 insertions(+), 155 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index ba8001fe1c8..1efda914150 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -2242,8 +2242,10 @@ operand_size_match (const insn_template *t)
       unsigned int given = i.operands - j - 1;
 
       /* For FMA4 and XOP insns VEX.W controls just the first two
-	 register operands.  */
-      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP))
+	 register operands.  APX_F insns just swap the two source operands,
+	 with the 3rd one being the destination.  */
+      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP)
+	  || is_cpu (t, CpuAPX_F))
 	given = j < 2 ? 1 - j : j;
 
       if (t->operand_types[j].bitfield.class == Reg
@@ -4180,6 +4182,11 @@ build_apx_evex_prefix (void)
   if (i.vex.register_specifier
       && i.vex.register_specifier->reg_flags & RegRex2)
     i.vex.bytes[3] &= ~0x08;
+
+  /* Encode the NDD bit of the instruction promoted from the legacy
+     space.  */
+  if (i.vex.register_specifier && i.tm.opcode_space == SPACE_EVEXMAP4)
+    i.vex.bytes[3] |= 0x10;
 }
 
 static void
@@ -7404,18 +7411,22 @@ match_template (char mnem_suffix)
 	     - the store form is requested, and the template is a load form,
 	     - the non-default (swapped) form is requested.  */
 	  overlap1 = operand_type_and (operand_types[0], operand_types[1]);
+
+	  j = i.operands - 1 - (t->opcode_space == SPACE_EVEXMAP4
+				&& t->opcode_modifier.vexvvvv);
+
 	  if (t->opcode_modifier.d && i.reg_operands == i.operands
 	      && !operand_type_all_zero (&overlap1))
 	    switch (i.dir_encoding)
 	      {
 	      case dir_encoding_load:
-		if (operand_type_check (operand_types[i.operands - 1], anymem)
+		if (operand_type_check (operand_types[j], anymem)
 		    || t->opcode_modifier.regmem)
 		  goto check_reverse;
 		break;
 
 	      case dir_encoding_store:
-		if (!operand_type_check (operand_types[i.operands - 1], anymem)
+		if (!operand_type_check (operand_types[j], anymem)
 		    && !t->opcode_modifier.regmem)
 		  goto check_reverse;
 		break;
@@ -7426,6 +7437,7 @@ match_template (char mnem_suffix)
 	      case dir_encoding_default:
 		break;
 	      }
+
 	  /* If we want store form, we skip the current load.  */
 	  if ((i.dir_encoding == dir_encoding_store
 	       || i.dir_encoding == dir_encoding_swap)
@@ -7455,11 +7467,13 @@ match_template (char mnem_suffix)
 		continue;
 	      /* Try reversing direction of operands.  */
 	      j = is_cpu (t, CpuFMA4)
-		  || is_cpu (t, CpuXOP) ? 1 : i.operands - 1;
+		  || is_cpu (t, CpuXOP)
+		  || is_cpu (t, CpuAPX_F) ? 1 : i.operands - 1;
 	      overlap0 = operand_type_and (i.types[0], operand_types[j]);
 	      overlap1 = operand_type_and (i.types[j], operand_types[0]);
 	      overlap2 = operand_type_and (i.types[1], operand_types[1]);
-	      gas_assert (t->operands != 3 || !check_register);
+	      gas_assert (t->operands != 3 || !check_register
+			  || is_cpu (t, CpuAPX_F));
 	      if (!operand_type_match (overlap0, i.types[0])
 		  || !operand_type_match (overlap1, i.types[j])
 		  || (t->operands == 3
@@ -7494,6 +7508,11 @@ match_template (char mnem_suffix)
 		  found_reverse_match = Opcode_VexW;
 		  goto check_operands_345;
 		}
+	      else if (is_cpu (t, CpuAPX_F) && i.operands == 3)
+		{
+		  found_reverse_match = Opcode_D;
+		  goto check_operands_345;
+		}
 	      else if (t->opcode_space != SPACE_BASE
 		       && (t->opcode_space != SPACE_0F
 			   /* MOV to/from CR/DR/TR, as an exception, follow
@@ -7667,6 +7686,9 @@ match_template (char mnem_suffix)
 
       i.tm.base_opcode ^= found_reverse_match;
 
+      if (i.tm.opcode_space == SPACE_EVEXMAP4)
+	goto swap_first_2;
+
       /* Certain SIMD insns have their load forms specified in the opcode
 	 table, and hence we need to _set_ RegMem instead of clearing it.
 	 We need to avoid setting the bit though on insns like KMOVW.  */
@@ -7686,6 +7708,7 @@ match_template (char mnem_suffix)
 	 flipping VEX.W.  */
       i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
 
+    swap_first_2:
       j = i.tm.operand_types[0].bitfield.imm8;
       i.tm.operand_types[j] = operand_types[j + 1];
       i.tm.operand_types[j + 1] = operand_types[j];
@@ -8511,12 +8534,9 @@ process_operands (void)
      unnecessary segment overrides.  */
   const reg_entry *default_seg = NULL;
 
-  /* We only need to check those implicit registers for instructions
-     with 3 operands or less.  */
-  if (i.operands <= 3)
-    for (unsigned int j = 0; j < i.operands; j++)
-      if (i.types[j].bitfield.instance != InstanceNone)
-	i.reg_operands--;
+  for (unsigned int j = 0; j < i.operands; j++)
+    if (i.types[j].bitfield.instance != InstanceNone)
+      i.reg_operands--;
 
   if (i.tm.opcode_modifier.sse2avx)
     {
@@ -8870,25 +8890,33 @@ build_modrm_byte (void)
 				     || i.vec_encoding == vex_encoding_evex));
     }
 
-  for (v = source + 1; v < dest; ++v)
-    if (v != reg_slot)
-      break;
-  if (v >= dest)
-    v = ~0;
-  if (i.tm.extension_opcode != None)
+  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
     {
-      if (dest != source)
-	v = dest;
-      dest = ~0;
+      v = dest;
+      dest--;
     }
-  gas_assert (source < dest);
-  if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
-      && source != op)
+  else
     {
-      unsigned int tmp = source;
+      for (v = source + 1; v < dest; ++v)
+	if (v != reg_slot)
+	  break;
+      if (v >= dest)
+	v = ~0;
+      if (i.tm.extension_opcode != None)
+	{
+	  if (dest != source)
+	    v = dest;
+	  dest = ~0;
+	}
+      gas_assert (source < dest);
+      if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
+	  && source != op)
+	{
+	  unsigned int tmp = source;
 
-      source = v;
-      v = tmp;
+	  source = v;
+	  v = tmp;
+	}
     }
 
   if (v < MAX_OPERANDS)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
index 07760240793..2ae0f1a358f 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
@@ -27,4 +27,6 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:[ 	]+c8 ff ff ff[ 	]+enter  \$0xffff,\$0xff
 [ 	]*[a-f0-9]+:[ 	]+67 62 f2 7c 18 f5[ 	]+addr32 \(bad\)
 [ 	]*[a-f0-9]+:[ 	]+0b ff[ 	]+or     %edi,%edi
+[ 	]*[a-f0-9]+:[ 	]+62 f4 fc 08 ff[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+d8[ 	]+.byte 0xd8
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
index bfec0652d13..c4646dcadb4 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
@@ -26,3 +26,5 @@ _start:
 	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[20](EVEX.b) == 1 (illegal value).
 	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax ,(%ebx){1to8}, %ecx
 	.byte 0xff
+	#{evex} inc %rax %rbx EVEX.vvvv' != 1111 && EVEX.ND = 0.
+	.insn EVEX.L0.NP.M4.W1 0xff, %rax, %rbx
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd.d b/gas/testsuite/gas/i386/x86-64-apx-ndd.d
new file mode 100644
index 00000000000..73410606ce3
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.d
@@ -0,0 +1,160 @@
+#as:
+#objdump: -dw
+#name: x86-64 APX NDD instructions with evex prefix encoding
+#source: x86-64-apx-ndd.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 d0 34 12 	adc    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 10 f9    	adc    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 12 04 07 	adc    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 13 04 07 	adc    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 14 83 11 	adcl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 54 6d 10 66 c7    	adcx   %r15d,%r8d,%r18d
+\s*[a-f0-9]+:\s*62 14 f9 08 66 04 3f 	adcx   \(%r15,%r31,1\),%r8
+\s*[a-f0-9]+:\s*62 14 69 10 66 04 3f 	adcx   \(%r15,%r31,1\),%r8d,%r18d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 c0 34 12 	add    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 d4 fc 10 81 c7 33 44 34 12 	add    \$0x12344433,%r15,%r16
+\s*[a-f0-9]+:\s*62 d4 74 10 80 c5 34 	add    \$0x34,%r13b,%r17b
+\s*[a-f0-9]+:\s*62 f4 bc 18 81 c0 11 22 33 f4 	add    \$0xfffffffff4332211,%rax,%r8
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8    	add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 44 fc 10 01 38    	add    %r31,\(%r8\),%r16
+\s*[a-f0-9]+:\s*62 44 f8 10 01 3c c0 	add    %r31,\(%r8,%r16,8\),%r16
+\s*[a-f0-9]+:\s*62 44 7c 10 00 f8    	add    %r31b,%r8b,%r16b
+\s*[a-f0-9]+:\s*62 44 7c 10 01 f8    	add    %r31d,%r8d,%r16d
+\s*[a-f0-9]+:\s*62 44 7d 10 01 f8    	add    %r31w,%r8w,%r16w
+\s*[a-f0-9]+:\s*62 5c fc 10 03 07    	add    \(%r31\),%r8,%r16
+\s*[a-f0-9]+:\s*62 5c f8 10 03 84 07 90 90 00 00 	add    0x9090\(%r31,%r16,1\),%r8,%r16
+\s*[a-f0-9]+:\s*62 44 7c 10 00 f8    	add    %r31b,%r8b,%r16b
+\s*[a-f0-9]+:\s*62 44 7c 10 01 f8    	add    %r31d,%r8d,%r16d
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 04 83 11 	addl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8    	add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 d4 fc 10 81 04 8f 33 44 34 12 	addq   \$0x12344433,\(%r15,%rcx,4\),%r16
+\s*[a-f0-9]+:\s*62 44 7d 10 01 f8    	add    %r31w,%r8w,%r16w
+\s*[a-f0-9]+:\s*62 54 6e 10 66 c7    	adox   %r15d,%r8d,%r18d
+\s*[a-f0-9]+:\s*62 5c fc 10 03 c7    	add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 44 fc 10 01 f8    	add    %r31,%r8,%r16
+\s*[a-f0-9]+:\s*62 14 fa 08 66 04 3f 	adox   \(%r15,%r31,1\),%r8
+\s*[a-f0-9]+:\s*62 14 6a 10 66 04 3f 	adox   \(%r15,%r31,1\),%r8d,%r18d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 e0 34 12 	and    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 20 f9    	and    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 22 04 07 	and    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 23 04 07 	and    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 24 83 11 	andl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 47 90 90 90 90 90 	cmova  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 43 90 90 90 90 90 	cmovae -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 42 90 90 90 90 90 	cmovb  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 46 90 90 90 90 90 	cmovbe -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 44 90 90 90 90 90 	cmove  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4f 90 90 90 90 90 	cmovg  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4d 90 90 90 90 90 	cmovge -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4c 90 90 90 90 90 	cmovl  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4e 90 90 90 90 90 	cmovle -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 45 90 90 90 90 90 	cmovne -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 41 90 90 90 90 90 	cmovno -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 49 90 90 90 90 90 	cmovns -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 40 90 90 90 90 90 	cmovo  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 48 90 90 90 90 90 	cmovs  -0x6f6f6f70\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*62 f4 f4 10 ff c8    	dec    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 fe 0c 27 	decb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 b4 b0 10 af 94 f8 09 09 00 00 	imul   0x909\(%rax,%r31,8\),%rdx,%r25
+\s*[a-f0-9]+:\s*67 62 f4 3c 18 af 90 09 09 09 00 	imul   0x90909\(%eax\),%edx,%r8d
+\s*[a-f0-9]+:\s*62 dc fc 10 ff c7    	inc    %r31,%r16
+\s*[a-f0-9]+:\s*62 dc bc 18 ff c7    	inc    %r31,%r8
+\s*[a-f0-9]+:\s*62 f4 e4 18 ff c0    	inc    %rax,%rbx
+\s*[a-f0-9]+:\s*62 f4 f4 10 f7 d8    	neg    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 f6 1c 27 	negb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 f4 f4 10 f7 d0    	not    %rax,%r17
+\s*[a-f0-9]+:\s*62 9c 3c 18 f6 14 27 	notb   \(%r31,%r12,1\),%r8b
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 c8 34 12 	or     \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 08 f9    	or     %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 0a 04 07 	or     \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 0b 04 07 	or     \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 0c 83 11 	orl    \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 d4 02 	rcl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 d0    	rcl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 10    	rclb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 10 02 	rcll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 10    	rclw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 14 83 	rclw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 dc 02 	rcr    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 d8    	rcr    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 18    	rcrb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 18 02 	rcrl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 18    	rcrw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 1c 83 	rcrw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 c4 02 	rol    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 c0    	rol    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 00    	rolb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 00 02 	roll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 00    	rolw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 04 83 	rolw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 cc 02 	ror    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 c8    	ror    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 08    	rorb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 08 02 	rorl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 08    	rorw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 0c 83 	rorw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 fc 02 	sar    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 f8    	sar    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 38    	sarb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 38 02 	sarl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 38    	sarw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 3c 83 	sarw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 d8 34 12 	sbb    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 18 f9    	sbb    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 1a 04 07 	sbb    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 1b 04 07 	sbb    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 1c 83 11 	sbbl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 e4 02 	shl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 e4 02 	shl    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e0    	shl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e0    	shl    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 20    	shlb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 20    	shlb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 74 84 10 24 20 01 	shld   \$0x1,%r12,\(%rax\),%r31
+\s*[a-f0-9]+:\s*62 74 04 10 24 38 02 	shld   \$0x2,%r15d,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 54 05 10 24 c4 02 	shld   \$0x2,%r8w,%r12w,%r31w
+\s*[a-f0-9]+:\s*62 7c bc 18 a5 e0    	shld   %cl,%r12,%r16,%r8
+\s*[a-f0-9]+:\s*62 7c 05 10 a5 2c 83 	shld   %cl,%r13w,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 74 05 10 a5 08    	shld   %cl,%r9w,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 20 02 	shll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 20 02 	shll   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 20    	shlw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 20    	shlw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 24 83 	shlw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 24 83 	shlw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 d4 04 10 c0 ec 02 	shr    \$0x2,%r12b,%r31b
+\s*[a-f0-9]+:\s*62 fc 3c 18 d2 e8    	shr    %cl,%r16b,%r8b
+\s*[a-f0-9]+:\s*62 f4 04 10 d0 28    	shrb   \$1,\(%rax\),%r31b
+\s*[a-f0-9]+:\s*62 74 84 10 2c 20 01 	shrd   \$0x1,%r12,\(%rax\),%r31
+\s*[a-f0-9]+:\s*62 74 04 10 2c 38 02 	shrd   \$0x2,%r15d,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 54 05 10 2c c4 02 	shrd   \$0x2,%r8w,%r12w,%r31w
+\s*[a-f0-9]+:\s*62 7c bc 18 ad e0    	shrd   %cl,%r12,%r16,%r8
+\s*[a-f0-9]+:\s*62 7c 05 10 ad 2c 83 	shrd   %cl,%r13w,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 74 05 10 ad 08    	shrd   %cl,%r9w,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 f4 04 10 c1 28 02 	shrl   \$0x2,\(%rax\),%r31d
+\s*[a-f0-9]+:\s*62 f4 05 10 d1 28    	shrw   \$1,\(%rax\),%r31w
+\s*[a-f0-9]+:\s*62 fc 05 10 d3 2c 83 	shrw   %cl,\(%r19,%rax,4\),%r31w
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 e8 34 12 	sub    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 28 f9    	sub    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 2a 04 07 	sub    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 2b 04 07 	sub    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 2c 83 11 	subl   \$0x11,\(%r19,%rax,4\),%r20d
+\s*[a-f0-9]+:\s*62 f4 0d 10 81 f0 34 12 	xor    \$0x1234,%ax,%r30w
+\s*[a-f0-9]+:\s*62 7c 6c 10 30 f9    	xor    %r15b,%r17b,%r18b
+\s*[a-f0-9]+:\s*62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+\s*[a-f0-9]+:\s*62 c4 3c 18 32 04 07 	xor    \(%r15,%rax,1\),%r16b,%r8b
+\s*[a-f0-9]+:\s*62 c4 3d 18 33 04 07 	xor    \(%r15,%rax,1\),%r16w,%r8w
+\s*[a-f0-9]+:\s*62 fc 5c 10 83 34 83 11 	xorl   \$0x11,\(%r19,%rax,4\),%r20d
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd.s b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
new file mode 100644
index 00000000000..c6edaace312
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
@@ -0,0 +1,155 @@
+# Check 64bit APX NDD instructions with evex prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+	adc    $0x1234,%ax,%r30w
+	adc    %r15b,%r17b,%r18b
+	adc    %r15d,(%r8),%r18d
+	adc    (%r15,%rax,1),%r16b,%r8b
+	adc    (%r15,%rax,1),%r16w,%r8w
+	adcl   $0x11,(%r19,%rax,4),%r20d
+	adcx   %r15d,%r8d,%r18d
+	adcx   (%r15,%r31,1),%r8
+	adcx   (%r15,%r31,1),%r8d,%r18d
+	add    $0x1234,%ax,%r30w
+	add    $0x12344433,%r15,%r16
+	add    $0x34,%r13b,%r17b
+	add    $0xfffffffff4332211,%rax,%r8
+	add    %r31,%r8,%r16
+	add    %r31,(%r8),%r16
+	add    %r31,(%r8,%r16,8),%r16
+	add    %r31b,%r8b,%r16b
+	add    %r31d,%r8d,%r16d
+	add    %r31w,%r8w,%r16w
+	add    (%r31),%r8,%r16
+	add    0x9090(%r31,%r16,1),%r8,%r16
+	addb    %r31b,%r8b,%r16b
+	addl    %r31d,%r8d,%r16d
+	addl   $0x11,(%r19,%rax,4),%r20d
+	addq    %r31,%r8,%r16
+	addq   $0x12344433,(%r15,%rcx,4),%r16
+	addw    %r31w,%r8w,%r16w
+	adox   %r15d,%r8d,%r18d
+	{load}  add    %r31,%r8,%r16
+	{store} add    %r31,%r8,%r16
+	adox   (%r15,%r31,1),%r8
+	adox   (%r15,%r31,1),%r8d,%r18d
+	and    $0x1234,%ax,%r30w
+	and    %r15b,%r17b,%r18b
+	and    %r15d,(%r8),%r18d
+	and    (%r15,%rax,1),%r16b,%r8b
+	and    (%r15,%rax,1),%r16w,%r8w
+	andl   $0x11,(%r19,%rax,4),%r20d
+	cmova  0x90909090(%eax),%edx,%r8d
+	cmovae 0x90909090(%eax),%edx,%r8d
+	cmovb  0x90909090(%eax),%edx,%r8d
+	cmovbe 0x90909090(%eax),%edx,%r8d
+	cmove  0x90909090(%eax),%edx,%r8d
+	cmovg  0x90909090(%eax),%edx,%r8d
+	cmovge 0x90909090(%eax),%edx,%r8d
+	cmovl  0x90909090(%eax),%edx,%r8d
+	cmovle 0x90909090(%eax),%edx,%r8d
+	cmovne 0x90909090(%eax),%edx,%r8d
+	cmovno 0x90909090(%eax),%edx,%r8d
+	cmovnp 0x90909090(%eax),%edx,%r8d
+	cmovns 0x90909090(%eax),%edx,%r8d
+	cmovo  0x90909090(%eax),%edx,%r8d
+	cmovp  0x90909090(%eax),%edx,%r8d
+	cmovs  0x90909090(%eax),%edx,%r8d
+	dec    %rax,%r17
+	decb   (%r31,%r12,1),%r8b
+	imul   0x909(%rax,%r31,8),%rdx,%r25
+	imul   0x90909(%eax),%edx,%r8d
+	inc    %r31,%r16
+	inc    %r31,%r8
+	inc    %rax,%rbx
+	neg    %rax,%r17
+	negb   (%r31,%r12,1),%r8b
+	not    %rax,%r17
+	notb   (%r31,%r12,1),%r8b
+	or     $0x1234,%ax,%r30w
+	or     %r15b,%r17b,%r18b
+	or     %r15d,(%r8),%r18d
+	or     (%r15,%rax,1),%r16b,%r8b
+	or     (%r15,%rax,1),%r16w,%r8w
+	orl    $0x11,(%r19,%rax,4),%r20d
+	rcl    $0x2,%r12b,%r31b
+	rcl    %cl,%r16b,%r8b
+	rclb   $0x1, (%rax),%r31b
+	rcll   $0x2,(%rax),%r31d
+	rclw   $0x1, (%rax),%r31w
+	rclw   %cl,(%r19,%rax,4),%r31w
+	rcr    $0x2,%r12b,%r31b
+	rcr    %cl,%r16b,%r8b
+	rcrb   (%rax),%r31b
+	rcrl   $0x2,(%rax),%r31d
+	rcrw   $0x1, (%rax),%r31w
+	rcrw   %cl,(%r19,%rax,4),%r31w
+	rol    $0x2,%r12b,%r31b
+	rol    %cl,%r16b,%r8b
+	rolb   $0x1, (%rax),%r31b
+	roll   $0x2,(%rax),%r31d
+	rolw   $0x1, (%rax),%r31w
+	rolw   %cl,(%r19,%rax,4),%r31w
+	ror    $0x2,%r12b,%r31b
+	ror    %cl,%r16b,%r8b
+	rorb   $0x1, (%rax),%r31b
+	rorl   $0x2,(%rax),%r31d
+	rorw   $0x1, (%rax),%r31w
+	rorw   %cl,(%r19,%rax,4),%r31w
+	sar    $0x2,%r12b,%r31b
+	sar    %cl,%r16b,%r8b
+	sarb   $0x1, (%rax),%r31b
+	sarl   $0x2,(%rax),%r31d
+	sarw   $0x1, (%rax),%r31w
+	sarw   %cl,(%r19,%rax,4),%r31w
+	sbb    $0x1234,%ax,%r30w
+	sbb    %r15b,%r17b,%r18b
+	sbb    %r15d,(%r8),%r18d
+	sbb    (%r15,%rax,1),%r16b,%r8b
+	sbb    (%r15,%rax,1),%r16w,%r8w
+	sbbl   $0x11,(%r19,%rax,4),%r20d
+	shl    $0x2,%r12b,%r31b
+	shl    $0x2,%r12b,%r31b
+	shl    %cl,%r16b,%r8b
+	shl    %cl,%r16b,%r8b
+	shlb   $0x1, (%rax),%r31b
+	shlb   $0x1, (%rax),%r31b
+	shld   $0x1,%r12,(%rax),%r31
+	shld   $0x2,%r15d,(%rax),%r31d
+	shld   $0x2,%r8w,%r12w,%r31w
+	shld   %cl,%r12,%r16,%r8
+	shld   %cl,%r13w,(%r19,%rax,4),%r31w
+	shld   %cl,%r9w,(%rax),%r31w
+	shll   $0x2,(%rax),%r31d
+	shll   $0x2,(%rax),%r31d
+	shlw   $0x1, (%rax),%r31w
+	shlw   $0x1, (%rax),%r31w
+	shlw   %cl,(%r19,%rax,4),%r31w
+	shlw   %cl,(%r19,%rax,4),%r31w
+	shr    $0x2,%r12b,%r31b
+	shr    %cl,%r16b,%r8b
+	shrb   $0x1, (%rax),%r31b
+	shrd   $0x1,%r12,(%rax),%r31
+	shrd   $0x2,%r15d,(%rax),%r31d
+	shrd   $0x2,%r8w,%r12w,%r31w
+	shrd   %cl,%r12,%r16,%r8
+	shrd   %cl,%r13w,(%r19,%rax,4),%r31w
+	shrd   %cl,%r9w,(%rax),%r31w
+	shrl   $0x2,(%rax),%r31d
+	shrw   $0x1, (%rax),%r31w
+	shrw   %cl,(%r19,%rax,4),%r31w
+	sub    $0x1234,%ax,%r30w
+	sub    %r15b,%r17b,%r18b
+	sub    %r15d,(%r8),%r18d
+	sub    (%r15,%rax,1),%r16b,%r8b
+	sub    (%r15,%rax,1),%r16w,%r8w
+	subl   $0x11,(%r19,%rax,4),%r20d
+	xor    $0x1234,%ax,%r30w
+	xor    %r15b,%r17b,%r18b
+	xor    %r15d,(%r8),%r18d
+	xor    (%r15,%rax,1),%r16b,%r8b
+	xor    (%r15,%rax,1),%r16w,%r8w
+	xorl   $0x11,(%r19,%rax,4),%r20d
+
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.d b/gas/testsuite/gas/i386/x86-64-pseudos.d
index 708c22b5899..1d399ffa949 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.d
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.d
@@ -137,6 +137,48 @@ Disassembly of section .text:
  +[a-f0-9]+:	33 07                	xor    \(%rdi\),%eax
  +[a-f0-9]+:	31 07                	xor    %eax,\(%rdi\)
  +[a-f0-9]+:	33 07                	xor    \(%rdi\),%eax
+ +[a-f0-9]+:	62 44 fc 10 01 38    	add    %r31,\(%r8\),%r16
+ +[a-f0-9]+:	62 44 fc 10 03 38    	add    \(%r8\),%r31,%r16
+ +[a-f0-9]+:	62 44 fc 10 01 38    	add    %r31,\(%r8\),%r16
+ +[a-f0-9]+:	62 44 fc 10 03 38    	add    \(%r8\),%r31,%r16
+ +[a-f0-9]+:	62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 2b 38    	sub    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 29 38    	sub    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 2b 38    	sub    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 1b 38    	sbb    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 19 38    	sbb    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 1b 38    	sbb    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 23 38    	and    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 21 38    	and    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 23 38    	and    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 0b 38    	or     \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 09 38    	or     %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 0b 38    	or     \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 33 38    	xor    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 31 38    	xor    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 33 38    	xor    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 13 38    	adc    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 54 6c 10 11 38    	adc    %r15d,\(%r8\),%r18d
+ +[a-f0-9]+:	62 54 6c 10 13 38    	adc    \(%r8\),%r15d,%r18d
+ +[a-f0-9]+:	62 44 fc 10 01 f8    	add    %r31,%r8,%r16
+ +[a-f0-9]+:	62 5c fc 10 03 c7    	add    %r31,%r8,%r16
+ +[a-f0-9]+:	62 7c 6c 10 28 f9    	sub    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 2a cf    	sub    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 18 f9    	sbb    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 1a cf    	sbb    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 20 f9    	and    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 22 cf    	and    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 08 f9    	or     %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 0a cf    	or     %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 30 f9    	xor    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 32 cf    	xor    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 7c 6c 10 10 f9    	adc    %r15b,%r17b,%r18b
+ +[a-f0-9]+:	62 c4 6c 10 12 cf    	adc    %r15b,%r17b,%r18b
  +[a-f0-9]+:	b0 12                	mov    \$0x12,%al
  +[a-f0-9]+:	b8 45 03 00 00       	mov    \$0x345,%eax
  +[a-f0-9]+:	b0 12                	mov    \$0x12,%al
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.s b/gas/testsuite/gas/i386/x86-64-pseudos.s
index 29a0c3368fc..e5b3a0d625d 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
@@ -134,6 +134,49 @@ _start:
 	{load} xor (%rdi), %eax
 	{store} xor %eax, (%rdi)
 	{store} xor (%rdi), %eax
+	{load}  add    %r31,(%r8),%r16
+	{load}	add    (%r8),%r31,%r16
+	{store} add    %r31,(%r8),%r16
+	{store}	add    (%r8),%r31,%r16
+	{load} 	sub    %r15d,(%r8),%r18d
+	{load}	sub    (%r8),%r15d,%r18d
+	{store} sub    %r15d,(%r8),%r18d
+	{store} sub    (%r8),%r15d,%r18d
+	{load} 	sbb    %r15d,(%r8),%r18d
+	{load}	sbb    (%r8),%r15d,%r18d
+	{store} sbb    %r15d,(%r8),%r18d
+	{store} sbb    (%r8),%r15d,%r18d
+	{load} 	and    %r15d,(%r8),%r18d
+	{load}	and    (%r8),%r15d,%r18d
+	{store} and    %r15d,(%r8),%r18d
+	{store} and    (%r8),%r15d,%r18d
+	{load} 	or     %r15d,(%r8),%r18d
+	{load}	or     (%r8),%r15d,%r18d
+	{store} or     %r15d,(%r8),%r18d
+	{store} or     (%r8),%r15d,%r18d
+	{load} 	xor    %r15d,(%r8),%r18d
+	{load}	xor    (%r8),%r15d,%r18d
+	{store} xor    %r15d,(%r8),%r18d
+	{store} xor    (%r8),%r15d,%r18d
+	{load} 	adc    %r15d,(%r8),%r18d
+	{load}	adc    (%r8),%r15d,%r18d
+	{store} adc    %r15d,(%r8),%r18d
+	{store} adc    (%r8),%r15d,%r18d
+
+	{store} add    %r31,%r8,%r16
+	{load}  add    %r31,%r8,%r16
+	{store} sub    %r15b,%r17b,%r18b
+	{load}	sub    %r15b,%r17b,%r18b
+	{store}	sbb    %r15b,%r17b,%r18b
+	{load}	sbb    %r15b,%r17b,%r18b
+	{store}	and    %r15b,%r17b,%r18b
+	{load}	and    %r15b,%r17b,%r18b
+	{store}	or     %r15b,%r17b,%r18b
+	{load}	or     %r15b,%r17b,%r18b
+	{store}	xor    %r15b,%r17b,%r18b
+	{load}	xor    %r15b,%r17b,%r18b
+	{store}	adc    %r15b,%r17b,%r18b
+	{load}	adc    %r15b,%r17b,%r18b
 
 	.irp m, mov, adc, add, and, cmp, or, sbb, sub, test, xor
 	\m	$0x12, %al
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index f6b6bb2f426..c28e4e7e333 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -370,6 +370,7 @@ run_dump_test "x86-64-apx-rex2"
 run_dump_test "x86-64-apx-evex-promoted"
 run_dump_test "x86-64-apx-evex-promoted-intel"
 run_dump_test "x86-64-apx-evex-egpr"
+run_dump_test "x86-64-apx-ndd"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index 8374f0ea93a..b7f87c2fa39 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -56,3 +56,58 @@
     { "blsmskS",	{ VexGdq, Edq }, 0 },
     { "blsiS",	{ VexGdq, Edq }, 0 },
   },
+  /* REG_EVEX_MAP4_80 */
+  {
+    { "addA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "orA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "adcA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "sbbA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "andA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "subA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "xorA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+  },
+  /* REG_EVEX_MAP4_81 */
+  {
+    { "addQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "orQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "adcQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "sbbQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "andQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "subQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+    { "xorQ",	{ VexGv, Ev, Iv }, PREFIX_NP_OR_DATA },
+  },
+  /* REG_EVEX_MAP4_83 */
+  {
+    { "addQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "orQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "adcQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "sbbQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "andQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "subQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+    { "xorQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
+  },
+  /* REG_EVEX_MAP4_F6 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "notA",	{ VexGb, Eb }, NO_PREFIX },
+    { "negA",	{ VexGb, Eb }, NO_PREFIX },
+  },
+  /* REG_EVEX_MAP4_F7 */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { "notQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
+    { "negQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
+  },
+  /* REG_EVEX_MAP4_FE */
+  {
+    { "incA",	{ VexGb, Eb }, NO_PREFIX },
+    { "decA",	{ VexGb, Eb }, NO_PREFIX },
+  },
+  /* REG_EVEX_MAP4_FF */
+  {
+    { "incQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
+    { "decQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
+  },
+
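Reviewer sketch, not part of the patch: the REG_EVEX_MAP4_* groups above are selected by the ModRM.reg field, as in the legacy group tables. The snippet below is a minimal illustration of that lookup for the 0x80/0x81/0x83 groups. The array and function names are hypothetical, and slot 7 is left empty because the table above defines only seven entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the mnemonic order in REG_EVEX_MAP4_80/81/83.
   Slot 7 has no entry in the table above, so it stays NULL here.  */
static const char *const map4_imm_group[8] =
  { "add", "or", "adc", "sbb", "and", "sub", "xor", NULL };

/* Return the mnemonic selected by bits 5:3 (the reg field) of MODRM,
   or NULL for an undefined slot.  */
static const char *
map4_imm_group_mnemonic (unsigned char modrm)
{
  unsigned int reg = (modrm >> 3) & 7;

  assert (reg < 8);
  return map4_imm_group[reg];
}
```

With the bytes from x86-64-apx-ndd.d, ModRM 0x2c after opcode 0x83 selects slot 5 (sub) and ModRM 0x34 selects slot 6 (xor).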
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index ea0a4c0b2a5..a6e1eb3250f 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -875,64 +875,64 @@ static const struct dis386 evex_table[][256] = {
   /* EVEX_MAP4_ */
   {
     /* 00 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "addB",             { VexGb, Eb, Gb }, NO_PREFIX },
+    { "addS",             { VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "addB",             { VexGb, Gb, EbS }, NO_PREFIX },
+    { "addS",             { VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 08 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "orB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "orS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "orB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "orS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 10 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "adcB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "adcS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "adcB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "adcS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 18 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "sbbB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "sbbS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "sbbB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "sbbS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 20 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "andB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "andS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "andB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "andS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
+    { "shldS",		{ VexGv, Ev, Gv, Ib }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 28 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "subB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "subS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "subB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "subS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
+    { "shrdS",		{ VexGv, Ev, Gv, Ib }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* 30 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "xorB",		{ VexGb, Eb, Gb }, NO_PREFIX },
+    { "xorS",		{ VexGv, Ev, Gv }, PREFIX_NP_OR_DATA },
+    { "xorB",		{ VexGb, Gb, EbS }, NO_PREFIX },
+    { "xorS",		{ VexGv, Gv, EvS }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -947,23 +947,23 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 40 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "%CFcmovoS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovnoS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovbS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovaeS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmoveS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovneS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovbeS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovaS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
     /* 48 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "%CFcmovsS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovnsS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovpS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovnpS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovlS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovgeS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovleS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "%CFcmovgS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
     /* 50 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1019,10 +1019,10 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 80 */
+    { REG_TABLE (REG_EVEX_MAP4_80) },
+    { REG_TABLE (REG_EVEX_MAP4_81) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_83) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1060,7 +1060,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { "shldS",	{ VexGv, Ev, Gv, CL }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     /* A8 */
@@ -1069,9 +1069,9 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
+    { "shrdS",	{ VexGv, Ev, Gv, CL }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "imulS",	{ VexGv, Gv, Ev }, PREFIX_NP_OR_DATA },
     /* B0 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1091,8 +1091,8 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* C0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_C0) },
+    { REG_TABLE (REG_C1) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1109,10 +1109,10 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* D0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_D0) },
+    { REG_TABLE (REG_D1) },
+    { REG_TABLE (REG_D2) },
+    { REG_TABLE (REG_D3) },
     { "sha1rnds4",	{ XM, EXxmm, Ib }, NO_PREFIX },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1151,8 +1151,8 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_F6) },
+    { REG_TABLE (REG_EVEX_MAP4_F7) },
     /* F8 */
     { PREFIX_TABLE (PREFIX_EVEX_MAP4_F8) },
     { "movdiri",	{ Mdq, Gdq }, NO_PREFIX },
@@ -1160,8 +1160,8 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { PREFIX_TABLE (PREFIX_0F38FC) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_FE) },
+    { REG_TABLE (REG_EVEX_MAP4_FF) },
   },
   /* EVEX_MAP5_ */
   {
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index b81e75aa786..50b2734108b 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -579,6 +579,8 @@ fetch_error (const instr_info *ins)
 #define VexGatherD { OP_VEX, vex_vsib_d_w_dq_mode }
 #define VexGatherQ { OP_VEX, vex_vsib_q_w_dq_mode }
 #define VexGdq { OP_VEX, dq_mode }
+#define VexGb { OP_VEX, b_mode }
+#define VexGv { OP_VEX, v_mode }
 #define VexTmm { OP_VEX, tmm_mode }
 #define XMVexI4 { OP_REG_VexI4, x_mode }
 #define XMVexScalarI4 { OP_REG_VexI4, scalar_mode }
@@ -894,6 +896,13 @@ enum
   REG_EVEX_0F38C6_L_2,
   REG_EVEX_0F38C7_L_2,
   REG_EVEX_0F38F3_L_0_P_0,
+  REG_EVEX_MAP4_80,
+  REG_EVEX_MAP4_81,
+  REG_EVEX_MAP4_83,
+  REG_EVEX_MAP4_F6,
+  REG_EVEX_MAP4_F7,
+  REG_EVEX_MAP4_FE,
+  REG_EVEX_MAP4_FF,
 };
 
 enum
@@ -2605,25 +2614,25 @@ static const struct dis386 reg_table[][8] = {
   },
   /* REG_C0 */
   {
-    { "rolA",	{ Eb, Ib }, 0 },
-    { "rorA",	{ Eb, Ib }, 0 },
-    { "rclA",	{ Eb, Ib }, 0 },
-    { "rcrA",	{ Eb, Ib }, 0 },
-    { "shlA",	{ Eb, Ib }, 0 },
-    { "shrA",	{ Eb, Ib }, 0 },
-    { "shlA",	{ Eb, Ib }, 0 },
-    { "sarA",	{ Eb, Ib }, 0 },
+    { "rolA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "rorA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "rclA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "rcrA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "shrA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, Ib }, NO_PREFIX },
+    { "sarA",	{ VexGb, Eb, Ib }, NO_PREFIX },
   },
   /* REG_C1 */
   {
-    { "rolQ",	{ Ev, Ib }, 0 },
-    { "rorQ",	{ Ev, Ib }, 0 },
-    { "rclQ",	{ Ev, Ib }, 0 },
-    { "rcrQ",	{ Ev, Ib }, 0 },
-    { "shlQ",	{ Ev, Ib }, 0 },
-    { "shrQ",	{ Ev, Ib }, 0 },
-    { "shlQ",	{ Ev, Ib }, 0 },
-    { "sarQ",	{ Ev, Ib }, 0 },
+    { "rolQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "rorQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "rclQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "rcrQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "shrQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
+    { "sarQ",	{ VexGv, Ev, Ib }, PREFIX_NP_OR_DATA },
   },
   /* REG_C6 */
   {
@@ -2649,47 +2658,47 @@ static const struct dis386 reg_table[][8] = {
   },
   /* REG_D0 */
   {
-    { "rolA",	{ Eb, I1 }, 0 },
-    { "rorA",	{ Eb, I1 }, 0 },
-    { "rclA",	{ Eb, I1 }, 0 },
-    { "rcrA",	{ Eb, I1 }, 0 },
-    { "shlA",	{ Eb, I1 }, 0 },
-    { "shrA",	{ Eb, I1 }, 0 },
-    { "shlA",	{ Eb, I1 }, 0 },
-    { "sarA",	{ Eb, I1 }, 0 },
+    { "rolA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "rorA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "rclA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "rcrA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "shrA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, I1 }, NO_PREFIX },
+    { "sarA",	{ VexGb, Eb, I1 }, NO_PREFIX },
   },
   /* REG_D1 */
   {
-    { "rolQ",	{ Ev, I1 }, 0 },
-    { "rorQ",	{ Ev, I1 }, 0 },
-    { "rclQ",	{ Ev, I1 }, 0 },
-    { "rcrQ",	{ Ev, I1 }, 0 },
-    { "shlQ",	{ Ev, I1 }, 0 },
-    { "shrQ",	{ Ev, I1 }, 0 },
-    { "shlQ",	{ Ev, I1 }, 0 },
-    { "sarQ",	{ Ev, I1 }, 0 },
+    { "rolQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "rorQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "rclQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "rcrQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "shrQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
+    { "sarQ",	{ VexGv, Ev, I1 }, PREFIX_NP_OR_DATA },
   },
   /* REG_D2 */
   {
-    { "rolA",	{ Eb, CL }, 0 },
-    { "rorA",	{ Eb, CL }, 0 },
-    { "rclA",	{ Eb, CL }, 0 },
-    { "rcrA",	{ Eb, CL }, 0 },
-    { "shlA",	{ Eb, CL }, 0 },
-    { "shrA",	{ Eb, CL }, 0 },
-    { "shlA",	{ Eb, CL }, 0 },
-    { "sarA",	{ Eb, CL }, 0 },
+    { "rolA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "rorA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "rclA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "rcrA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "shrA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "shlA",	{ VexGb, Eb, CL }, NO_PREFIX },
+    { "sarA",	{ VexGb, Eb, CL }, NO_PREFIX },
   },
   /* REG_D3 */
   {
-    { "rolQ",	{ Ev, CL }, 0 },
-    { "rorQ",	{ Ev, CL }, 0 },
-    { "rclQ",	{ Ev, CL }, 0 },
-    { "rcrQ",	{ Ev, CL }, 0 },
-    { "shlQ",	{ Ev, CL }, 0 },
-    { "shrQ",	{ Ev, CL }, 0 },
-    { "shlQ",	{ Ev, CL }, 0 },
-    { "sarQ",	{ Ev, CL }, 0 },
+    { "rolQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "rorQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "rclQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "rcrQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "shrQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "shlQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
+    { "sarQ",	{ VexGv, Ev, CL }, PREFIX_NP_OR_DATA },
   },
   /* REG_F6 */
   {
@@ -3639,8 +3648,8 @@ static const struct dis386 prefix_table[][4] = {
   /* PREFIX_0F38F6 */
   {
     { "wrssK",	{ M, Gdq }, 0 },
-    { "adoxS",	{ Gdq, Edq}, 0 },
-    { "adcxS",	{ Gdq, Edq}, 0 },
+    { "adoxS",	{ VexGdq, Gdq, Edq}, 0 },
+    { "adcxS",	{ VexGdq, Gdq, Edq}, 0 },
     { Bad_Opcode },
   },
 
@@ -9114,6 +9123,12 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	  ins->rex2 &= ~REX_R;
 	}
 
+      /* For EVEX from legacy instructions, when the EVEX.ND bit is 0,
+	 all bits of EVEX.vvvv and EVEX.V' must be 1.  */
+      if (ins->evex_type == evex_from_legacy && !ins->vex.b
+	  && (ins->vex.register_specifier || !ins->vex.v))
+	return &bad_opcode;
+
       ins->need_vex = 4;
 
       /* EVEX from legacy instructions require that EVEX.z, EVEX.L’L and the
@@ -9131,8 +9146,10 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       if (!fetch_modrm (ins))
 	return &err_opcode;
 
-      /* Set vector length.  */
-      if (ins->modrm.mod == 3 && ins->vex.b)
+      /* Set vector length.  For EVEX-promoted instructions, evex.ll == 0b00,
+	 which has the same encoding as vex.length == 128, so they can share
+	 the vex.length processing in OP_VEX.  */
+      if (ins->modrm.mod == 3 && ins->vex.b && ins->evex_type != evex_from_legacy)
 	ins->vex.length = 512;
       else
 	{
@@ -9598,8 +9615,8 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 	    }
 
 	  /* Check whether rounding control was enabled for an insn not
-	     supporting it.  */
-	  if (ins.modrm.mod == 3 && ins.vex.b
+	     supporting it, when evex.b is not treated as evex.nd.  */
+	  if (ins.modrm.mod == 3 && ins.vex.b && ins.evex_type == evex_default
 	      && !(ins.evex_used & EVEX_b_used))
 	    {
 	      for (i = 0; i < MAX_OPERANDS; ++i)
@@ -10487,16 +10504,23 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
 	  ins->used_prefixes |= (ins->prefixes & PREFIX_ADDR);
 	  break;
 	case 'F':
-	  if (ins->intel_syntax)
-	    break;
-	  if ((ins->prefixes & PREFIX_ADDR) || (sizeflag & SUFFIX_ALWAYS))
+	  if (l == 0)
 	    {
-	      if (sizeflag & AFLAG)
-		*ins->obufp++ = ins->address_mode == mode_64bit ? 'q' : 'l';
-	      else
-		*ins->obufp++ = ins->address_mode == mode_64bit ? 'l' : 'w';
-	      ins->used_prefixes |= (ins->prefixes & PREFIX_ADDR);
+	      if (ins->intel_syntax)
+		break;
+	      if ((ins->prefixes & PREFIX_ADDR) || (sizeflag & SUFFIX_ALWAYS))
+		{
+		  if (sizeflag & AFLAG)
+		    *ins->obufp++ = ins->address_mode == mode_64bit ? 'q' : 'l';
+		  else
+		    *ins->obufp++ = ins->address_mode == mode_64bit ? 'l' : 'w';
+		  ins->used_prefixes |= (ins->prefixes & PREFIX_ADDR);
+		}
 	    }
+	  else if (l == 1 && last[0] == 'C')
+	    break;
+	  else
+	    abort ();
 	  break;
 	case 'G':
 	  if (ins->intel_syntax || (ins->obufp[-1] != 's'
@@ -11060,7 +11084,8 @@ print_displacement (instr_info *ins, bfd_signed_vma val)
 static void
 intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
 {
-  if (ins->vex.b)
+  /* Check if there is a broadcast, when evex.b is not treated as evex.nd.  */
+  if (ins->vex.b && ins->evex_type == evex_default)
     {
       if (!ins->vex.no_broadcast)
 	switch (bytemode)
@@ -11558,6 +11583,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
   add += (ins->rex2 & REX_B) ? 16 : 0;
 
+  /* Handles EVEX other than APX EVEX-promoted instructions.  */
   if (ins->vex.evex && ins->evex_type == evex_default)
     {
 
@@ -11994,7 +12020,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	  print_operand_value (ins, disp & 0xffff, dis_style_text);
 	}
     }
-  if (ins->vex.b)
+  if (ins->vex.b && ins->evex_type == evex_default)
     {
       ins->evex_used |= EVEX_b_used;
 
@@ -13362,6 +13388,14 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   if (!ins->need_vex)
     return true;
 
+  /* Here vex.b is treated as "EVEX.ND".  */
+  if (ins->evex_type == evex_from_legacy)
+    {
+      ins->evex_used |= EVEX_b_used;
+      if (!ins->vex.b)
+	return true;
+    }
+
   reg = ins->vex.register_specifier;
   ins->vex.register_specifier = 0;
   if (ins->address_mode != mode_64bit)
@@ -13453,12 +13487,19 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  names = att_names_xmm;
 	  ins->evex_used |= EVEX_len_used;
 	  break;
+	case v_mode:
 	case dq_mode:
 	  if (ins->rex & REX_W)
 	    names = att_names64;
+	  else if (bytemode == v_mode
+		   && !(sizeflag & DFLAG))
+	    names = att_names16;
 	  else
 	    names = att_names32;
 	  break;
+	case b_mode:
+	  names = att_names8rex;
+	  break;
 	case mask_bd_mode:
 	case mask_mode:
 	  if (reg > 0x7)
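Reviewer sketch, not part of the patch: the OP_VEX changes above print the NDD destination only when EVEX.ND is set, then pick the register-name table from the byte mode, REX.W and the data-size flag. The standalone function below imitates that selection order under simplifying assumptions. All names (ndd_reg_name, enum ndd_mode, the low* tables) are hypothetical, and data16 stands in for !(sizeflag & DFLAG).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the b_mode / v_mode byte modes.  */
enum ndd_mode { NDD_BYTE, NDD_V };

static const char *const low64[8] =
  { "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi" };
static const char *const low32[8] =
  { "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi" };
static const char *const low16[8] =
  { "ax", "cx", "dx", "bx", "sp", "bp", "si", "di" };
static const char *const low8[8] =
  { "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil" };

/* Write the AT&T name of NDD register REG (0..31) into BUF.
   Returns false and leaves BUF empty when ND is clear, mirroring the
   early return added to OP_VEX for evex_from_legacy.  */
static bool
ndd_reg_name (bool nd, unsigned int reg, enum ndd_mode mode,
              bool rex_w, bool data16, char *buf, size_t len)
{
  const char *const *low;
  const char *suffix;

  assert (reg < 32 && len > 0);
  buf[0] = '\0';
  if (!nd)
    return false;

  /* Same priority as the patched switch: byte mode first, then REX.W,
     then the 16-bit data size, else 32-bit.  */
  if (mode == NDD_BYTE)
    { low = low8; suffix = "b"; }
  else if (rex_w)
    { low = low64; suffix = ""; }
  else if (data16)
    { low = low16; suffix = "w"; }
  else
    { low = low32; suffix = "d"; }

  if (reg < 8)
    snprintf (buf, len, "%%%s", low[reg]);
  else
    snprintf (buf, len, "%%r%u%s", reg, suffix);
  return true;
}
```

For example, register 31 in v_mode with a 16-bit data size yields the %r31w destination seen in the shld test lines.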
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 88717fd7575..256f5a3865e 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -638,8 +638,10 @@ enum
   Vex,
   /* How to encode VEX.vvvv:
      0: VEX.vvvv must be 1111b.
-     1: VEX.vvvv encodes one of the register operands.
+     1: VEX.vvvv encodes one of the src register operands.
+     2: VEX.vvvv encodes the dest register operand.
    */
+#define VexVVVV_DST   2
   VexVVVV,
   /* How the VEX.W bit is used:
      0: Set by the REX.W bit.
@@ -786,7 +788,7 @@ typedef struct i386_opcode_modifier
   unsigned int immext:1;
   unsigned int norex64:1;
   unsigned int vex:2;
-  unsigned int vexvvvv:1;
+  unsigned int vexvvvv:2;
   unsigned int vexw:2;
   unsigned int opcodeprefix:2;
   unsigned int sib:3;
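Reviewer sketch, not part of the patch: the vexvvvv field has to grow from one bit to two because the new VexVVVV_DST value is 2. The snippet below shows what a 1-bit unsigned bit-field would do with that value. The struct and function names are hypothetical, and VEXVVVV_DST_SKETCH simply mirrors the constant added above.

```c
#include <assert.h>

/* Mirrors VexVVVV_DST from the patch; the name is hypothetical.  */
#define VEXVVVV_DST_SKETCH 2

struct narrow_modifier { unsigned int vexvvvv:1; };
struct wide_modifier { unsigned int vexvvvv:2; };

/* Store V in a 1-bit field and read it back.  Unsigned bit-fields
   reduce the value modulo 2, so 2 comes back as 0.  */
static unsigned int
store_narrow (unsigned int v)
{
  struct narrow_modifier m;

  assert (v < 4);
  m.vexvvvv = v;
  return m.vexvvvv;
}

/* Store V in a 2-bit field and read it back unchanged.  */
static unsigned int
store_wide (unsigned int v)
{
  struct wide_modifier m;

  assert (v < 4);
  m.vexvvvv = v;
  return m.vexvvvv;
}
```

With the old width, a DstVVVV template would be indistinguishable from one whose VEX.vvvv must be 1111b.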
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index b27131ef185..5aa00cb93ef 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -139,9 +139,13 @@
 #define Vsz256 Vsz=VSZ256
 #define Vsz512 Vsz=VSZ512
 
+#define DstVVVV VexVVVV=VexVVVV_DST
+
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
+// Also re-use the bit to mark NDD insns where swapping the source operands
+// may allow switching from EVEX encoding to REX2 encoding.
 #define C StaticRounding
 
 #define FP 387|287|8087
@@ -288,26 +292,40 @@ std, 0xfd, 0, NoSuf, {}
 sti, 0xfb, 0, NoSuf, {}
 
 // Arithmetic.
+add, 0x0, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
+inc, 0xfe/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Optimize|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sub, 0x83/5, APX_F, Modrm|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 sub, 0x2c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+sub, 0x80/5, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
+dec, 0xfe/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x83/3, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 sbb, 0x1c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+sbb, 0x80/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sbb, 0x80/3, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 cmp, 0x38, 0, D|W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 cmp, 0x83/7, 0, Modrm|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
@@ -318,31 +336,50 @@ test, 0x84, 0, D|W|C|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, R
 test, 0xa8, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
 test, 0xf6/0, 0, W|Modrm|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+and, 0x20, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 and, 0x20, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+and, 0x83/4, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 and, 0x83/4, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 and, 0x24, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+and, 0x80/4, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 and, 0x80/4, 0, W|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+or, 0x8, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 or, 0x8, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+or, 0x83/1, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 or, 0x83/1, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 or, 0xc, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+or, 0x80/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 or, 0x80/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+xor, 0x30, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+xor, 0x80/6, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 // clr with 1 operand is really xor with 2 operands.
 clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
 
+adc, 0x10, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 adc, 0x10, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x83/2, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 adc, 0x83/2, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 adc, 0x14, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
+adc, 0x80/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+adc, 0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+neg, 0xf6/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+
+not, 0xf6/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+not, 0xf6/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
 aaa, 0x37, No64, NoSuf, {}
 aas, 0x3f, No64, NoSuf, {}
@@ -375,6 +412,7 @@ cqto, 0x99, x64, Size64|NoSuf, {}
 // These multiplies can only be selected with single operand forms.
 mul, 0xf6/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 imul, 0xf6/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+imul, 0xaf, APX_F, C|Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 imul, 0xfaf, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64 }
 imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
@@ -389,52 +427,98 @@ div, 0xf6/6, 0, W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|
 idiv, 0xf6/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 idiv, 0xf6/7, 0, W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
 
+rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+ror, 0xd0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+ror, 0xc0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 ror, 0xc0/1, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+ror, 0xd2/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 ror, 0xd2/1, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+ror, 0xd0/1, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 ror, 0xd0/1, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcl, 0xd0/2, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xc0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcl, 0xc0/2, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xc0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd2/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcl, 0xd2/2, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd2/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcl, 0xd0/2, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcr, 0xd0/3, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xc0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcr, 0xc0/3, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xc0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd2/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcr, 0xd2/3, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd2/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 rcr, 0xd0/3, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+rcr, 0xd0/3, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+sal, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sal, 0xc0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sal, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sal, 0xd2/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sal, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sal, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sal, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+shl, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+shl, 0xc0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shl, 0xc0/4, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+shl, 0xd2/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shl, 0xd2/4, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+shl, 0xd0/4, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shl, 0xd0/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+shr, 0xd0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+shr, 0xc0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shr, 0xc0/5, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+shr, 0xd2/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shr, 0xd2/5, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+shr, 0xd0/5, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 shr, 0xd0/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+sar, 0xd0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sar, 0xc0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sar, 0xc0/7, i186, W|Modrm|No_sSuf, { Imm8, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sar, 0xd2/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sar, 0xd2/7, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
+sar, 0xd0/7, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
 sar, 0xd0/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
 
+shld, 0x24, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shld, 0xfa4, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+shld, 0xa5, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+shld, 0xa5, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shld, 0xfa5, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 
+shrd, 0x2c, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shrd, 0xfac, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+shrd, 0xad, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
+shrd, 0xad, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
 
 // Control transfer instructions.
@@ -940,6 +1024,7 @@ ud2b, 0xfb9, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|U
 // 3rd official undefined instr (older CPUs don't take a ModR/M byte)
 ud0, 0xfff, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
+cmov<cc>, 0x4<cc:opc>, CMOV&APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
 cmov<cc>, 0xf4<cc:opc>, CMOV, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 fcmovb, 0xda/0, i687, Modrm|NoSuf, { FloatReg, FloatAcc }
@@ -2027,8 +2112,12 @@ xcryptofb, 0xf30fa7e8, PadLock, NoSuf|RepPrefixOk, {}
 xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {}
 
 // Multy-precision Add Carry, rdseed instructions.
+adcx, 0x6666, ADX&APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adcx, 0x6666, ADX&APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adox, 0xf366, ADX&APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
 adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+adox, 0xf366, ADX&APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 rdseed, 0xfc7/7, RdSeed, Modrm|NoSuf, { Reg16|Reg32|Reg64 }
 
 // SMAP instructions.
-- 
2.25.1



* [PATCH v3 7/9] Support APX Push2/Pop2
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
                   ` (4 preceding siblings ...)
  2023-11-24  7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
@ 2023-11-24  7:02 ` Cui, Lili
  2023-12-11 11:17   ` Jan Beulich
  2023-11-24  7:02 ` [PATCH v3 8/9] Support APX NDD optimized encoding Cui, Lili
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-11-24  7:02 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, Mo, Zewei

From: "Mo, Zewei" <zewei.mo@intel.com>

PPX functionality for PUSH/POP is not implemented in this patch
and will be implemented separately.

gas/ChangeLog:

2023-11-21  Zewei Mo <zewei.mo@intel.com>
            H.J. Lu  <hongjiu.lu@intel.com>
            Lili Cui <lili.cui@intel.com>

	* config/tc-i386.c (enum i386_error): New
	unsupported_rsp_register and invalid_src_register_set.
	(md_assemble): Add handler for unsupported_rsp_register and
	invalid_src_register_set.
	(check_APX_operands): Add invalid check for push2/pop2.
	(match_template): Handle check_APX_operands.
	* testsuite/gas/i386/i386.exp: Add apx-push2pop2 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2.d: New test.
	* testsuite/gas/i386/x86-64-apx-push2pop2.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-apx-push2pop2-inval.s: Ditto.
	* testsuite/gas/i386/apx-push2pop2-inval.s: Ditto.
	* testsuite/gas/i386/apx-push2pop2-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Add bad
	testcases for POP2.
	* testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-reg.h: Add REG_EVEX_MAP4_8F.
	* i386-dis-evex-w.h: Add EVEX_W_MAP4_8F_R_0 and EVEX_W_MAP4_FF_R_6.
	* i386-dis-evex.h: Add REG_EVEX_MAP4_8F.
	* i386-dis.c (PUSH2_POP2_Fixup): Add special handling for PUSH2/POP2.
	(get_valid_dis386): Add handler for vector length and address_mode for
	APX-Push2/Pop2 insn.
	(nd): Define nd as b for EVEX-promoted instructions.
	(OP_VEX): Add handler of 64-bit vvvv register for APX-Push2/Pop2 insn.
	* i386-gen.c: Add Push2Pop2 bitfield.
	* i386-opc.h: Regenerated.
	* i386-opc.tbl: Regenerated.
---
 gas/config/tc-i386.c                          | 42 +++++++++++++++++++
 gas/testsuite/gas/i386/apx-push2pop2-inval.l  |  5 +++
 gas/testsuite/gas/i386/apx-push2pop2-inval.s  |  9 ++++
 gas/testsuite/gas/i386/i386.exp               |  1 +
 .../gas/i386/x86-64-apx-evex-promoted-bad.d   |  6 ++-
 .../gas/i386/x86-64-apx-evex-promoted-bad.s   |  6 +++
 .../gas/i386/x86-64-apx-push2pop2-intel.d     | 42 +++++++++++++++++++
 .../gas/i386/x86-64-apx-push2pop2-inval.l     | 13 ++++++
 .../gas/i386/x86-64-apx-push2pop2-inval.s     | 17 ++++++++
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d | 42 +++++++++++++++++++
 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s | 39 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |  3 ++
 opcodes/i386-dis-evex-reg.h                   |  9 ++++
 opcodes/i386-dis-evex-w.h                     | 10 +++++
 opcodes/i386-dis-evex.h                       |  2 +-
 opcodes/i386-dis.c                            | 39 +++++++++++++++--
 opcodes/i386-opc.h                            |  1 +
 opcodes/i386-opc.tbl                          |  9 ++++
 18 files changed, 289 insertions(+), 6 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-push2pop2.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 1efda914150..e7e104dba07 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -248,6 +248,7 @@ enum i386_error
     invalid_vector_register_set,
     invalid_tmm_register_set,
     invalid_dest_and_src_register_set,
+    invalid_src_register_set,
     invalid_pseudo_prefix,
     unsupported_vector_index_register,
     unsupported_broadcast,
@@ -256,6 +257,7 @@ enum i386_error
     mask_not_on_destination,
     no_default_mask,
     unsupported_rc_sae,
+    unsupported_rsp_register,
     invalid_register_operand,
     internal_error,
   };
@@ -5398,6 +5400,9 @@ md_assemble (char *line)
 	case invalid_dest_and_src_register_set:
 	  err_msg = _("destination and source registers must be distinct");
 	  break;
+	case invalid_src_register_set:
+	  err_msg = _("two source registers must be distinct");
+	  break;
 	case invalid_pseudo_prefix:
 	  err_msg = _("rex2 pseudo prefix cannot be used here");
 	  break;
@@ -5422,6 +5427,9 @@ md_assemble (char *line)
 	case unsupported_rc_sae:
 	  err_msg = _("unsupported static rounding/sae");
 	  break;
+	case unsupported_rsp_register:
+	  err_msg = _("cannot be used with %rsp register");
+	  break;
 	case invalid_register_operand:
 	  err_msg = _("invalid register operand");
 	  break;
@@ -7113,6 +7121,33 @@ check_EgprOperands (const insn_template *t)
   return 0;
 }
 
+/* Check if APX operands are valid for the instruction.  */
+static int
+check_APX_operands (const insn_template *t)
+{
+  /* Push2* and Pop2* cannot use RSP, and Pop2* cannot pop the same register
+     twice.  */
+  if (t->mnem_off == MN_push2 || t->mnem_off == MN_push2p
+      || t->mnem_off == MN_pop2 || t->mnem_off == MN_pop2p)
+    {
+      unsigned int reg1 = register_number (i.op[0].regs);
+      unsigned int reg2 = register_number (i.op[1].regs);
+
+      if (reg1 == 0x4 || reg2 == 0x4)
+	{
+	  i.error = unsupported_rsp_register;
+	  return 1;
+	}
+      if (t->base_opcode == 0x8f && reg1 == reg2)
+	{
+	  i.error = invalid_src_register_set;
+	  return 1;
+	}
+    }
+
+  return 0;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7606,6 +7641,13 @@ match_template (char mnem_suffix)
 	  continue;
 	}
 
+      /* Check if APX operands are valid.  */
+      if (check_APX_operands (t))
+	{
+	  specific_error = progress (i.error);
+	  continue;
+	}
+
       /* Check whether to use the shorter VEX encoding for certain insns where
 	 the EVEX enconding comes first in the table.  This requires the respective
 	 AVX-* feature to be explicitly enabled.  */
diff --git a/gas/testsuite/gas/i386/apx-push2pop2-inval.l b/gas/testsuite/gas/i386/apx-push2pop2-inval.l
new file mode 100644
index 00000000000..a55a71520c8
--- /dev/null
+++ b/gas/testsuite/gas/i386/apx-push2pop2-inval.l
@@ -0,0 +1,5 @@
+.* Assembler messages:
+.*:6: Error: `push2' is only supported in 64-bit mode
+.*:7: Error: `push2p' is only supported in 64-bit mode
+.*:8: Error: `pop2' is only supported in 64-bit mode
+.*:9: Error: `pop2p' is only supported in 64-bit mode
diff --git a/gas/testsuite/gas/i386/apx-push2pop2-inval.s b/gas/testsuite/gas/i386/apx-push2pop2-inval.s
new file mode 100644
index 00000000000..77166327ed1
--- /dev/null
+++ b/gas/testsuite/gas/i386/apx-push2pop2-inval.s
@@ -0,0 +1,9 @@
+# Check 32bit APX-PUSH2/POP2 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	push2 %rax, %rbx
+	push2p %rax, %rbx
+	pop2 %rax, %rbx
+	pop2p %rax, %rbx
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 6ab197089ee..835cc3ecdff 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -511,6 +511,7 @@ if [gas_32_check] then {
     run_dump_test "sm4-intel"
     run_list_test "pbndkb-inval"
     run_list_test "user_msr-inval"
+    run_list_test "apx-push2pop2-inval"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
index 2ae0f1a358f..b6ba39a5a25 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
@@ -28,5 +28,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:[ 	]+67 62 f2 7c 18 f5[ 	]+addr32 \(bad\)
 [ 	]*[a-f0-9]+:[ 	]+0b ff[ 	]+or     %edi,%edi
 [ 	]*[a-f0-9]+:[ 	]+62 f4 fc 08 ff[ 	]+\(bad\)
-[ 	]*[a-f0-9]+:[ 	]+d8[ 	]+.byte 0xd8
-#pass
+[ 	]*[a-f0-9]+:[ 	]+d8 ff[ 	]+fdivr  %st\(7\),%st
+[ 	]*[a-f0-9]+:[ 	]+62 f4 64[ 	]+\(bad\)
+[ 	]*[a-f0-9]+:[ 	]+08 8f c0 ff ff ff[ 	]+or     %cl,-0x40\(%rdi\)
+[ 	]*[a-f0-9]+:[ 	]+62 f4 7c 18 8f c0[ 	]+pop2   %rax,\(bad\)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
index c4646dcadb4..2339349bd99 100644
--- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
@@ -28,3 +28,9 @@ _start:
 	.byte 0xff
 	#{evex} inc %rax %rbx EVEX.vvvv' != 1111 && EVEX.ND = 0.
 	.insn EVEX.L0.NP.M4.W1 0xff, %rax, %rbx
+	.byte 0xff
+	# pop2 %rax, %rbx set EVEX.ND=0.
+	.byte 0x62,0xf4,0x64,0x08,0x8f,0xc0
+	.byte 0xff, 0xff, 0xff
+	# pop2 %rax set EVEX.vvvv' = 1111.
+	.byte 0x62,0xf4,0x7c,0x18,0x8f,0xc0
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
new file mode 100644
index 00000000000..46b21219582
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-intel.d
@@ -0,0 +1,42 @@
+#as: --64
+#objdump: -dw -Mintel
+#name: i386 APX-push2pop2 insns (Intel disassembly)
+#source: x86-64-apx-push2pop2.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+r31,r24
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+r31,r24
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+rax,rbx
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+r8,r17
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+r31,r9
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+r24,r31
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+r31,r24
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+rbx,rax
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+r17,r8
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+r9,r31
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+r31,r24
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
new file mode 100644
index 00000000000..a23011ea15d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.l
@@ -0,0 +1,13 @@
+.* Assembler messages:
+.*:6: Error: operand size mismatch for `push2'
+.*:7: Error: operand size mismatch for `push2'
+.*:8: Error: cannot be used with %rsp register for `push2'
+.*:9: Error: cannot be used with %rsp register for `push2'
+.*:10: Error: operand size mismatch for `push2p'
+.*:11: Error: cannot be used with %rsp register for `push2p'
+.*:12: Error: operand size mismatch for `pop2'
+.*:13: Error: cannot be used with %rsp register for `pop2'
+.*:14: Error: cannot be used with %rsp register for `pop2'
+.*:15: Error: two source registers must be distinct for `pop2'
+.*:16: Error: cannot be used with %rsp register for `pop2p'
+.*:17: Error: two source registers must be distinct for `pop2p'
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
new file mode 100644
index 00000000000..83cef97d57e
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2-inval.s
@@ -0,0 +1,17 @@
+# Check illegal APX-Push2Pop2 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	push2  %ax, %bx
+	push2  %eax, %ebx
+	push2  %rsp, %r17
+	push2  %r17, %rsp
+	push2p %eax, %ebx
+	push2p %rsp, %r17
+	pop2   %ax, %bx
+	pop2   %rax, %rsp
+	pop2   %rsp, %rax
+	pop2   %r12, %r12
+	pop2p  %rax, %rsp
+	pop2p  %r12, %r12
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2.d b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
new file mode 100644
index 00000000000..54f22a7f94e
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.d
@@ -0,0 +1,42 @@
+#as: --64
+#objdump: -dw
+#name: x86_64 APX-push2pop2 insns
+#source: x86-64-apx-push2pop2.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+%r24,%r31
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+%r24,%r31
+\s*[a-f0-9]+:\s*62 f4 7c 18 ff f3\s+push2\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc 3c 18 ff f1\s+push2\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 04 10 ff f1\s+push2\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc 3c 10 ff f7\s+push2\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 fc 18 ff f3\s+push2p\s+%rbx,%rax
+\s*[a-f0-9]+:\s*62 fc bc 18 ff f1\s+push2p\s+%r17,%r8
+\s*[a-f0-9]+:\s*62 d4 84 10 ff f1\s+push2p\s+%r9,%r31
+\s*[a-f0-9]+:\s*62 dc bc 10 ff f7\s+push2p\s+%r31,%r24
+\s*[a-f0-9]+:\s*62 f4 64 18 8f c0\s+pop2\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 74 10 8f c0\s+pop2\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc 34 18 8f c7\s+pop2\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 04 10 8f c0\s+pop2\s+%r24,%r31
+\s*[a-f0-9]+:\s*62 f4 e4 18 8f c0\s+pop2p\s+%rax,%rbx
+\s*[a-f0-9]+:\s*62 d4 f4 10 8f c0\s+pop2p\s+%r8,%r17
+\s*[a-f0-9]+:\s*62 dc b4 18 8f c7\s+pop2p\s+%r31,%r9
+\s*[a-f0-9]+:\s*62 dc 84 10 8f c0\s+pop2p\s+%r24,%r31
diff --git a/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
new file mode 100644
index 00000000000..4cfc0a2185f
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
@@ -0,0 +1,39 @@
+# Check 64bit APX-Push2Pop2 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	push2 %rbx, %rax
+	push2 %r17, %r8
+	push2 %r9, %r31
+	push2 %r31, %r24
+	push2p %rbx, %rax
+	push2p %r17, %r8
+	push2p %r9, %r31
+	push2p %r31, %r24
+	pop2 %rax, %rbx
+	pop2 %r8, %r17
+	pop2 %r31, %r9
+	pop2 %r24, %r31
+	pop2p %rax, %rbx
+	pop2p %r8, %r17
+	pop2p %r31, %r9
+	pop2p %r24, %r31
+
+.intel_syntax noprefix
+	push2 rax, rbx
+	push2 r8, r17
+	push2 r31, r9
+	push2 r24, r31
+	push2p rax, rbx
+	push2p r8, r17
+	push2p r31, r9
+	push2p r24, r31
+	pop2 rbx, rax
+	pop2 r17, r8
+	pop2 r9, r31
+	pop2 r31, r24
+	pop2p rbx, rax
+	pop2p r17, r8
+	pop2p r9, r31
+	pop2p r31, r24
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index c28e4e7e333..b834379a491 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -345,6 +345,9 @@ run_dump_test "x86-64-avx512dq-rcigrd-intel"
 run_dump_test "x86-64-avx512dq-rcigrd"
 run_dump_test "x86-64-avx512dq-rcigrne-intel"
 run_dump_test "x86-64-avx512dq-rcigrne"
+run_dump_test "x86-64-apx-push2pop2"
+run_dump_test "x86-64-apx-push2pop2-intel"
+run_list_test "x86-64-apx-push2pop2-inval"
 run_dump_test "x86-64-avx512dq-rcigru-intel"
 run_dump_test "x86-64-avx512dq-rcigru"
 run_dump_test "x86-64-avx512dq-rcigrz-intel"
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index b7f87c2fa39..bf33f5425da 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -86,6 +86,10 @@
     { "subQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
     { "xorQ",	{ VexGv, Ev, sIb }, PREFIX_NP_OR_DATA },
   },
+  /* REG_EVEX_MAP4_8F */
+  {
+    { VEX_W_TABLE (EVEX_W_MAP4_8F_R_0) },
+  },
   /* REG_EVEX_MAP4_F6 */
   {
     { Bad_Opcode },
@@ -109,5 +113,10 @@
   {
     { "incQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
     { "decQ",	{ VexGv, Ev }, PREFIX_NP_OR_DATA },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_MAP4_FF_R_6) },
   },
 
diff --git a/opcodes/i386-dis-evex-w.h b/opcodes/i386-dis-evex-w.h
index b828277d413..12ab29544bb 100644
--- a/opcodes/i386-dis-evex-w.h
+++ b/opcodes/i386-dis-evex-w.h
@@ -442,6 +442,16 @@
     { Bad_Opcode },
     { "vpshrdw",   { XM, Vex, EXx, Ib }, 0 },
   },
+  /* EVEX_W_MAP4_8F_R_0 */
+  {
+    { "pop2", { { PUSH2_POP2_Fixup, q_mode}, Eq }, NO_PREFIX },
+    { "pop2p", { { PUSH2_POP2_Fixup, q_mode}, Eq }, NO_PREFIX },
+  },
+  /* EVEX_W_MAP4_FF_R_6 */
+  {
+    { "push2", { { PUSH2_POP2_Fixup, q_mode}, Eq }, 0 },
+    { "push2p", { { PUSH2_POP2_Fixup, q_mode}, Eq }, 0 },
+  },
   /* EVEX_W_MAP5_5B_P_0 */
   {
     { "vcvtdq2ph%XY",	{ XMxmmq, EXx, EXxEVexR }, 0 },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index a6e1eb3250f..e7ee868cf1f 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -1035,7 +1035,7 @@ static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { REG_TABLE (REG_EVEX_MAP4_8F) },
     /* 90 */
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 50b2734108b..0612b0cd4b4 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -105,6 +105,7 @@ static bool FXSAVE_Fixup (instr_info *, int, int);
 static bool MOVSXD_Fixup (instr_info *, int, int);
 static bool DistinctDest_Fixup (instr_info *, int, int);
 static bool PREFETCHI_Fixup (instr_info *, int, int);
+static bool PUSH2_POP2_Fixup (instr_info *, int, int);
 
 static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
 						enum disassembler_style,
@@ -225,6 +226,9 @@ struct instr_info
   }
   vex;
 
+/* For APX EVEX-promoted prefix, EVEX.ND shares the same bit as vex.b.  */
+#define nd b
+
   enum evex_type evex_type;
 
   /* Remember if the current op is a jump instruction.  */
@@ -899,6 +903,7 @@ enum
   REG_EVEX_MAP4_80,
   REG_EVEX_MAP4_81,
   REG_EVEX_MAP4_83,
+  REG_EVEX_MAP4_8F,
   REG_EVEX_MAP4_F6,
   REG_EVEX_MAP4_F7,
   REG_EVEX_MAP4_FE,
@@ -1742,6 +1747,9 @@ enum
   EVEX_W_0F3A70,
   EVEX_W_0F3A72,
 
+  EVEX_W_MAP4_8F_R_0,
+  EVEX_W_MAP4_FF_R_6,
+
   EVEX_W_MAP5_5B_P_0,
   EVEX_W_MAP5_7A_P_3,
 };
@@ -9125,7 +9133,7 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       /* EVEX from legacy instructions, when the EVEX.ND bit is 0,
 	 all bits of EVEX.vvvv and EVEX.V' must be 1.  */
-      if (ins->evex_type == evex_from_legacy && !ins->vex.b
+      if (ins->evex_type == evex_from_legacy && !ins->vex.nd
 	  && (ins->vex.register_specifier || !ins->vex.v))
 	return &bad_opcode;
 
@@ -13388,11 +13396,10 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   if (!ins->need_vex)
     return true;
 
-  /* Here vex.b is treated as "EVEX.ND".  */
   if (ins->evex_type == evex_from_legacy)
     {
       ins->evex_used |= EVEX_b_used;
-      if (!ins->vex.b)
+      if (!ins->vex.nd)
 	return true;
     }
 
@@ -13500,6 +13507,9 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	case b_mode:
 	  names = att_names8rex;
 	  break;
+	case q_mode:
+	  names = att_names64;
+	  break;
 	case mask_bd_mode:
 	case mask_mode:
 	  if (reg > 0x7)
@@ -13884,3 +13894,26 @@ PREFETCHI_Fixup (instr_info *ins, int bytemode, int sizeflag)
 
   return OP_M (ins, bytemode, sizeflag);
 }
+
+static bool
+PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
+{
+  if (ins->modrm.mod != 3 || !ins->vex.b)
+    return true;
+
+  unsigned int vvvv_reg = ins->vex.register_specifier
+    | (!ins->vex.v << 4);
+  unsigned int rm_reg = ins->modrm.rm + (ins->rex & REX_B ? 8 : 0)
+    + (ins->rex2 & REX_B ? 16 : 0);
+
+  /* Push2/Pop2 cannot use RSP and Pop2 needs two distinct registers.  */
+  if (!ins->vex.nd || vvvv_reg == 0x4 || rm_reg == 0x4
+      || (!ins->modrm.reg
+	  && vvvv_reg == rm_reg))
+    {
+      oappend (ins, "(bad)");
+      return true;
+    }
+
+  return OP_VEX (ins, bytemode, sizeflag);
+}
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 256f5a3865e..edd59dd67ea 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -807,6 +807,7 @@ typedef struct i386_opcode_modifier
   unsigned int isa64:2;
   unsigned int noegpr:1;
   unsigned int nf:1;
+  unsigned int push2pop2:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 5aa00cb93ef..642e519fe3a 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3486,3 +3486,12 @@ uwrmsr, 0xf30f38f8, USER_MSR, Modrm|NoSuf|NoRex64, { Reg64, Reg64 }
 uwrmsr, 0xf3f8/0, USER_MSR, Modrm|Vex128|VexMap7|VexW0|NoSuf, { Imm32, Reg64 }
 
 // USER_MSR instructions end.
+
+// APX Push2/Pop2 instructions.
+
+push2, 0xff/6, APX_F, Modrm|VexW0|EVex128|EVexMap4|VexVVVV|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+push2p, 0xff/6, APX_F, Modrm|VexW1|EVex128|EVexMap4|VexVVVV|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+pop2, 0x8f/0, APX_F, Modrm|VexW0|EVex128|EVexMap4|VexVVVV|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+pop2p, 0x8f/0, APX_F, Modrm|VexW1|EVex128|EVexMap4|VexVVVV|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Reg64, Reg64 }
+
+// APX Push2/Pop2 instructions end.
-- 
2.25.1



* [PATCH v3 8/9] Support APX NDD optimized encoding.
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
                   ` (5 preceding siblings ...)
  2023-11-24  7:02 ` [PATCH v3 7/9] Support APX Push2/Pop2 Cui, Lili
@ 2023-11-24  7:02 ` Cui, Lili
  2023-12-11 12:27   ` Jan Beulich
  2023-11-24  7:02 ` [PATCH v3 9/9] Support APX JMPABS for disassembler Cui, Lili
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-11-24  7:02 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, Hu, Lin1

From: "Hu, Lin1" <lin1.hu@intel.com>

This patch optimizes APX NDD insns whose destination register is also
a source register into the shorter legacy encoding, e.g.:

add %r16, %r15, %r15 -> add %r16, %r15
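
As a rough illustration only (not part of the patch), the folding rule
can be modelled as below.  The struct, enum and function names are
hypothetical and operands are compared as plain strings; the real check
in tc-i386.c works on gas' internal operand tables and has further
conditions (no NF, no {evex} pseudo prefix, a legacy template following
the NDD one).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of a three-operand NDD insn in AT&T
   operand order (src2, src1, dest).  This is not a gas structure.  */
struct ndd_insn
{
  const char *src2;
  const char *src1;
  const char *dest;
  bool commutative;	/* True for add, and, or, xor, ...  */
};

enum fold_kind
{
  FOLD_NONE,		/* Keep the NDD (EVEX) form.  */
  FOLD_DIRECT,		/* src1 == dest: just drop the dest operand.  */
  FOLD_SWAPPED		/* src2 == dest: swap the sources, then drop.  */
};

/* Decide whether the insn can be folded into a two-operand form.
   OPTIMIZE_LEVEL stands in for the assembler's optimization level
   (an assumption of this sketch): 0 disables folding, and the
   swapped form additionally requires a level above 1.  */
static enum fold_kind
classify_fold (const struct ndd_insn *insn, int optimize_level)
{
  if (optimize_level < 1)
    return FOLD_NONE;

  if (strcmp (insn->src1, insn->dest) == 0)
    return FOLD_DIRECT;

  if (optimize_level > 1
      && insn->commutative
      && strcmp (insn->src2, insn->dest) == 0)
    return FOLD_SWAPPED;

  return FOLD_NONE;
}
```

With this model, add %r31,(%r8),%r31 is only folded (to
add (%r8),%r31) at levels above 1, while sub %r15,(%r8),%r15 is never
folded since sub is not commutative.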

gas/ChangeLog:

	* config/tc-i386.c (check_RexOperands): New function.
	(can_convert_NDD_to_legacy): Ditto.
	(match_template): Rematch the template if an APX NDD insn can be
	optimized.
	* testsuite/gas/i386/x86-64.exp: Add test.
	* testsuite/gas/i386/x86-64-apx-ndd-optimize.d: New test.
	* testsuite/gas/i386/x86-64-apx-ndd-optimize.s: Ditto.
---
 gas/config/tc-i386.c                          | 107 ++++++++++++++
 .../gas/i386/x86-64-apx-ndd-optimize.d        | 130 ++++++++++++++++++
 .../gas/i386/x86-64-apx-ndd-optimize.s        | 123 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |   1 +
 4 files changed, 361 insertions(+)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index e7e104dba07..aa66f704c48 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -7148,6 +7148,58 @@ check_APX_operands (const insn_template *t)
   return 0;
 }
 
+/* Check if the instruction uses REX registers.  */
+static bool
+check_RexOperands (void)
+{
+  for (unsigned int op = 0; op < i.operands; op++)
+    {
+      if (i.types[op].bitfield.class != Reg)
+	continue;
+
+      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
+	return true;
+    }
+
+  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
+      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
+    return true;
+
+  /* Check whether the {rex} pseudo prefix was specified.  */
+  return i.rex_encoding;
+}
+
+/* Optimize APX NDD insns to legacy insns.  */
+static unsigned int
+can_convert_NDD_to_legacy (const insn_template *t)
+{
+  unsigned int match_dest_op = ~0;
+
+  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
+      && t->opcode_space == SPACE_EVEXMAP4
+      && !i.has_nf
+      && i.reg_operands >= 2)
+    {
+      unsigned int dest = i.operands - 1;
+      unsigned int src1 = i.operands - 2;
+      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+      if (i.types[src1].bitfield.class == Reg
+	  && i.op[src1].regs == i.op[dest].regs)
+	match_dest_op = src1;
+      /* If the first operand is the same as the third operand,
+	 the insn can only be optimized when it is commutative,
+	 i.e. when swapping the first two operands does not change
+	 the semantics.  */
+      else if (i.types[src2].bitfield.class == Reg
+	       && i.op[src2].regs == i.op[dest].regs
+	       && optimize > 1
+	       && t->opcode_modifier.commutative)
+	match_dest_op = src2;
+    }
+  return match_dest_op;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7675,6 +7727,61 @@ match_template (char mnem_suffix)
 	  i.memshift = memshift;
 	}
 
+      /* If we can optimize an NDD insn to a legacy insn, like
+	 add %r16, %r8, %r8 -> add %r16, %r8,
+	 add %r8, %r16, %r8 -> add %r16, %r8, then rematch the template.
+	 Note that the semantics are unchanged.  */
+      if (optimize
+	  && !i.no_optimize
+	  && i.vec_encoding != vex_encoding_evex
+	  && t + 1 < current_templates->end
+	  && !t[1].opcode_modifier.evex
+	  && t[1].opcode_space <= SPACE_0F38
+	  && t->opcode_modifier.vexvvvv == VexVVVV_DST)
+	{
+	  unsigned int match_dest_op = can_convert_NDD_to_legacy (t);
+	  size_match = true;
+
+	  if (match_dest_op != (unsigned int) ~0)
+	    {
+	      /* Ensure that the next template has the same input
+		 operands as the original matching template by checking
+		 the first operand (ATT), thus avoiding errors caused by
+		 a wrong order of insns in i386.tbl.  */
+	      overlap0 = operand_type_and (i.types[0],
+					   t[1].operand_types[0]);
+	      if (t->opcode_modifier.d)
+		overlap1 = operand_type_and (i.types[0],
+					     t[1].operand_types[1]);
+	      if (!operand_type_match (overlap0, i.types[0])
+		  && (!t->opcode_modifier.d
+		      || (t->opcode_modifier.d
+			  && !operand_type_match (overlap1, i.types[0]))))
+		size_match = false;
+
+	      if (size_match
+		  /* Outside legacy maps 0/1, optimize only if no REX/REX2 prefix is needed.  */
+		  && (t[1].opcode_space <= SPACE_0F
+		      || (!check_EgprOperands (t + 1)
+			  && !check_RexOperands ()
+			  && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))
+		{
+		  unsigned int src1 = i.operands - 2;
+		  unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+		  if (match_dest_op == src2)
+		    swap_2_operands (match_dest_op, src1);
+
+		  --i.operands;
+		  --i.reg_operands;
+
+		  specific_error = progress (internal_error);
+		  continue;
+		}
+
+	    }
+	}
+
       /* We've found a match; break out of loop.  */
       break;
     }
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
new file mode 100644
index 00000000000..6f841a807a9
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
@@ -0,0 +1,130 @@
+#as: -Os
+#objdump: -drw
+#name: x86-64 APX NDD optimized encoding
+#source: x86-64-apx-ndd-optimize.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 4d 01 f8          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 45 00 f8          	add    %r31b,%r8b
+\s*[a-f0-9]+:\s*d5 4d 01 f8          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 1d 03 c7          	add    %r31,%r8
+\s*[a-f0-9]+:\s*d5 4d 03 38          	add    \(%r8\),%r31
+\s*[a-f0-9]+:\s*d5 1d 03 07          	add    \(%r31\),%r8
+\s*[a-f0-9]+:\s*49 81 c7 33 44 34 12 	add    \$0x12344433,%r15
+\s*[a-f0-9]+:\s*49 81 c0 11 22 33 f4 	add    \$0xfffffffff4332211,%r8
+\s*[a-f0-9]+:\s*d5 19 ff c7          	inc    %r31
+\s*[a-f0-9]+:\s*d5 11 fe c7          	inc    %r31b
+\s*[a-f0-9]+:\s*d5 1c 29 f9          	sub    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 28 f9          	sub    %r15b,%r17b
+\s*[a-f0-9]+:\s*62 54 84 18 29 38    	sub    %r15,\(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 2b 04 07       	sub    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 ee 34 12 00 00 	sub    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 18 ff c9          	dec    %r17
+\s*[a-f0-9]+:\s*d5 10 fe c9          	dec    %r17b
+\s*[a-f0-9]+:\s*d5 1c 19 f9          	sbb    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 18 f9          	sbb    %r15b,%r17b
+\s*[a-f0-9]+:\s*62 54 84 18 19 38    	sbb    %r15,\(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 1b 04 07       	sbb    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 de 34 12 00 00 	sbb    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 21 f9          	and    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 20 f9          	and    %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 23 38             	and    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 23 04 07       	and    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 11 81 e6 34 12 00 00 	and    \$0x1234,%r30d
+\s*[a-f0-9]+:\s*d5 1c 09 f9          	or     %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 08 f9          	or     %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 0b 38             	or     \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 0b 04 07       	or     \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 ce 34 12 00 00 	or     \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 31 f9          	xor    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 30 f9          	xor    %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 33 38             	xor    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 33 04 07       	xor    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 f6 34 12 00 00 	xor    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 11 f9          	adc    %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 10 f9          	adc    %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 13 38             	adc    \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 13 04 07       	adc    \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 d6 34 12 00 00 	adc    \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 18 f7 d9          	neg    %r17
+\s*[a-f0-9]+:\s*d5 10 f6 d9          	neg    %r17b
+\s*[a-f0-9]+:\s*d5 18 f7 d1          	not    %r17
+\s*[a-f0-9]+:\s*d5 10 f6 d1          	not    %r17b
+\s*[a-f0-9]+:\s*67 0f af 90 09 09 09 00 	imul   0x90909\(%eax\),%edx
+\s*[a-f0-9]+:\s*d5 aa af 94 f8 09 09 00 00 	imul   0x909\(%rax,%r31,8\),%rdx
+\s*[a-f0-9]+:\s*48 0f af d0          	imul   %rax,%rdx
+\s*[a-f0-9]+:\s*d5 19 d1 c7          	rol    \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 c7          	rol    \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 c4 02          	rol    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 c4 02          	rol    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 cf          	ror    \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 cf          	ror    \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 cc 02          	ror    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 cc 02          	ror    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 d7          	rcl    \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 d7          	rcl    \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 d4 02          	rcl    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 d4 02          	rcl    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 df          	rcr    \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 df          	rcr    \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 dc 02          	rcr    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 dc 02          	rcr    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 e7          	shl    \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 e7          	shl    \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 e4 02          	shl    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 e4 02          	shl    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 e7          	shl    \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 e7          	shl    \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 e4 02          	shl    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 e4 02          	shl    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 ef          	shr    \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 ef          	shr    \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 ec 02          	shr    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 ec 02          	shr    \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 ff          	sar    \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 ff          	sar    \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 fc 02          	sar    \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 fc 02          	sar    \$0x2,%r12b
+\s*[a-f0-9]+:\s*62 74 9c 18 24 20 01 	shld   \$0x1,%r12,\(%rax\),%r12
+\s*[a-f0-9]+:\s*4d 0f a4 c4 02       	shld   \$0x2,%r8,%r12
+\s*[a-f0-9]+:\s*62 54 bc 18 24 c4 02 	shld   \$0x2,%r8,%r12,%r8
+\s*[a-f0-9]+:\s*62 74 b4 18 a5 08    	shld   %cl,%r9,\(%rax\),%r9
+\s*[a-f0-9]+:\s*d5 9c a5 e0          	shld   %cl,%r12,%r16
+\s*[a-f0-9]+:\s*62 7c 9c 18 a5 e0    	shld   %cl,%r12,%r16,%r12
+\s*[a-f0-9]+:\s*62 74 9c 18 2c 20 01 	shrd   \$0x1,%r12,\(%rax\),%r12
+\s*[a-f0-9]+:\s*4d 0f ac ec 01       	shrd   \$0x1,%r13,%r12
+\s*[a-f0-9]+:\s*62 54 94 18 2c ec 01 	shrd   \$0x1,%r13,%r12,%r13
+\s*[a-f0-9]+:\s*62 74 b4 18 ad 08    	shrd   %cl,%r9,\(%rax\),%r9
+\s*[a-f0-9]+:\s*d5 9c ad e0          	shrd   %cl,%r12,%r16
+\s*[a-f0-9]+:\s*62 7c 9c 18 ad e0    	shrd   %cl,%r12,%r16,%r12
+\s*[a-f0-9]+:\s*67 0f 40 90 90 90 90 90 	cmovo  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 41 90 90 90 90 90 	cmovno -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 42 90 90 90 90 90 	cmovb  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 43 90 90 90 90 90 	cmovae -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 44 90 90 90 90 90 	cmove  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 45 90 90 90 90 90 	cmovne -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 46 90 90 90 90 90 	cmovbe -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 47 90 90 90 90 90 	cmova  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 48 90 90 90 90 90 	cmovs  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 49 90 90 90 90 90 	cmovns -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4a 90 90 90 90 90 	cmovp  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4b 90 90 90 90 90 	cmovnp -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4c 90 90 90 90 90 	cmovl  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4d 90 90 90 90 90 	cmovge -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4e 90 90 90 90 90 	cmovle -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4f 90 90 90 90 90 	cmovg  -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*66 0f 38 f6 c3       	adcx   %ebx,%eax
+\s*[a-f0-9]+:\s*66 0f 38 f6 c3       	adcx   %ebx,%eax
+\s*[a-f0-9]+:\s*62 f4 fd 18 66 c3    	adcx   %rbx,%rax,%rax
+\s*[a-f0-9]+:\s*62 54 bd 18 66 c7    	adcx   %r15,%r8,%r8
+\s*[a-f0-9]+:\s*67 66 0f 38 f6 04 0a 	adcx   \(%edx,%ecx,1\),%eax
+\s*[a-f0-9]+:\s*f3 0f 38 f6 c3       	adox   %ebx,%eax
+\s*[a-f0-9]+:\s*f3 0f 38 f6 c3       	adox   %ebx,%eax
+\s*[a-f0-9]+:\s*62 f4 fe 18 66 c3    	adox   %rbx,%rax,%rax
+\s*[a-f0-9]+:\s*62 54 be 18 66 c7    	adox   %r15,%r8,%r8
+\s*[a-f0-9]+:\s*67 f3 0f 38 f6 04 0a 	adox   \(%edx,%ecx,1\),%eax
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
new file mode 100644
index 00000000000..4335ee6d7ae
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
@@ -0,0 +1,123 @@
+# Check 64bit APX NDD instructions with optimized encoding
+
+	.text
+_start:
+add    %r31,%r8,%r8
+addb   %r31b,%r8b,%r8b
+{store} add    %r31,%r8,%r8
+{load}  add    %r31,%r8,%r8
+add    %r31,(%r8),%r31
+add    (%r31),%r8,%r8
+add    $0x12344433,%r15,%r15
+add    $0xfffffffff4332211,%r8,%r8
+inc    %r31,%r31
+incb   %r31b,%r31b
+sub    %r15,%r17,%r17
+subb   %r15b,%r17b,%r17b
+sub    %r15,(%r8),%r15
+sub    (%r15,%rax,1),%r16,%r16
+sub    $0x1234,%r30,%r30
+dec    %r17,%r17
+decb   %r17b,%r17b
+sbb    %r15,%r17,%r17
+sbbb   %r15b,%r17b,%r17b
+sbb    %r15,(%r8),%r15
+sbb    (%r15,%rax,1),%r16,%r16
+sbb    $0x1234,%r30,%r30
+and    %r15,%r17,%r17
+andb   %r15b,%r17b,%r17b
+and    %r15,(%r8),%r15
+and    (%r15,%rax,1),%r16,%r16
+and    $0x1234,%r30,%r30
+or     %r15,%r17,%r17
+orb    %r15b,%r17b,%r17b
+or     %r15,(%r8),%r15
+or     (%r15,%rax,1),%r16,%r16
+or     $0x1234,%r30,%r30
+xor    %r15,%r17,%r17
+xorb   %r15b,%r17b,%r17b
+xor    %r15,(%r8),%r15
+xor    (%r15,%rax,1),%r16,%r16
+xor    $0x1234,%r30,%r30
+adc    %r15,%r17,%r17
+adcb   %r15b,%r17b,%r17b
+adc    %r15,(%r8),%r15
+adc    (%r15,%rax,1),%r16,%r16
+adc    $0x1234,%r30,%r30
+neg    %r17,%r17
+negb   %r17b,%r17b
+not    %r17,%r17
+notb   %r17b,%r17b
+imul   0x90909(%eax),%edx,%edx
+imul   0x909(%rax,%r31,8),%rdx,%rdx
+imul   %rdx,%rax,%rdx
+rol    %r31,%r31
+rolb   %r31b,%r31b
+rol    $0x2,%r12,%r12
+rolb   $0x2,%r12b,%r12b
+ror    %r31,%r31
+rorb   %r31b,%r31b
+ror    $0x2,%r12,%r12
+rorb   $0x2,%r12b,%r12b
+rcl    %r31,%r31
+rclb   %r31b,%r31b
+rcl    $0x2,%r12,%r12
+rclb   $0x2,%r12b,%r12b
+rcr    %r31,%r31
+rcrb   %r31b,%r31b
+rcr    $0x2,%r12,%r12
+rcrb   $0x2,%r12b,%r12b
+sal    %r31,%r31
+salb   %r31b,%r31b
+sal    $0x2,%r12,%r12
+salb   $0x2,%r12b,%r12b
+shl    %r31,%r31
+shlb   %r31b,%r31b
+shl    $0x2,%r12,%r12
+shlb   $0x2,%r12b,%r12b
+shr    %r31,%r31
+shrb   %r31b,%r31b
+shr    $0x2,%r12,%r12
+shrb   $0x2,%r12b,%r12b
+sar    %r31,%r31
+sarb   %r31b,%r31b
+sar    $0x2,%r12,%r12
+sarb   $0x2,%r12b,%r12b
+shld   $0x1,%r12,(%rax),%r12
+shld   $0x2,%r8,%r12,%r12
+shld   $0x2,%r8,%r12,%r8
+shld   %cl,%r9,(%rax),%r9
+shld   %cl,%r12,%r16,%r16
+shld   %cl,%r12,%r16,%r12
+shrd   $0x1,%r12,(%rax),%r12
+shrd   $0x1,%r13,%r12,%r12
+shrd   $0x1,%r13,%r12,%r13
+shrd   %cl,%r9,(%rax),%r9
+shrd   %cl,%r12,%r16,%r16
+shrd   %cl,%r12,%r16,%r12
+cmovo  0x90909090(%eax),%edx,%edx
+cmovno 0x90909090(%eax),%edx,%edx
+cmovb  0x90909090(%eax),%edx,%edx
+cmovae 0x90909090(%eax),%edx,%edx
+cmove  0x90909090(%eax),%edx,%edx
+cmovne 0x90909090(%eax),%edx,%edx
+cmovbe 0x90909090(%eax),%edx,%edx
+cmova  0x90909090(%eax),%edx,%edx
+cmovs  0x90909090(%eax),%edx,%edx
+cmovns 0x90909090(%eax),%edx,%edx
+cmovp  0x90909090(%eax),%edx,%edx
+cmovnp 0x90909090(%eax),%edx,%edx
+cmovl  0x90909090(%eax),%edx,%edx
+cmovge 0x90909090(%eax),%edx,%edx
+cmovle 0x90909090(%eax),%edx,%edx
+cmovg  0x90909090(%eax),%edx,%edx
+adcx   %ebx,%eax,%eax
+adcx   %eax,%ebx,%eax
+adcx   %rbx,%rax,%rax
+adcx   %r15,%r8,%r8
+adcx   (%edx,%ecx,1),%eax,%eax
+adox   %ebx,%eax,%eax
+adox   %eax,%ebx,%eax
+adox   %rbx,%rax,%rax
+adox   %r15,%r8,%r8
+adox   (%edx,%ecx,1),%eax,%eax
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index b834379a491..034fc49b180 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -558,6 +558,7 @@ run_dump_test "x86-64-optimize-6"
 run_list_test "x86-64-optimize-7a" "-I${srcdir}/$subdir -march=+noavx -al"
 run_dump_test "x86-64-optimize-7b"
 run_list_test "x86-64-optimize-8" "-I${srcdir}/$subdir -march=+noavx2 -al"
+run_dump_test "x86-64-apx-ndd-optimize"
 run_dump_test "x86-64-align-branch-1a"
 run_dump_test "x86-64-align-branch-1b"
 run_dump_test "x86-64-align-branch-1c"
-- 
2.25.1


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v3 9/9] Support APX JMPABS for disassembler
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
                   ` (6 preceding siblings ...)
  2023-11-24  7:02 ` [PATCH v3 8/9] Support APX NDD optimized encoding Cui, Lili
@ 2023-11-24  7:02 ` Cui, Lili
  2023-11-24  7:09 ` [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Jan Beulich
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 69+ messages in thread
From: Cui, Lili @ 2023-11-24  7:02 UTC (permalink / raw)
  To: binutils; +Cc: jbeulich, hongjiu.lu, Hu, Lin1

From: "Hu, Lin1" <lin1.hu@intel.com>

gas/ChangeLog:

	* testsuite/gas/i386/x86-64.exp: Run APX JMPABS tests.
	* testsuite/gas/i386/x86-64-apx-jmpabs-intel.d: New test.
	* testsuite/gas/i386/x86-64-apx-jmpabs-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs.d: Ditto.
	* testsuite/gas/i386/x86-64-apx-jmpabs.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (JMPABS_Fixup): New Fixup function to disassemble jmpabs.
	(print_insn): Add #UD exception for jmpabs.
	(dis386): Modify the a1 entry to support jmpabs.
	* i386-mnem.h: Regenerated.
	* i386-opc.tbl: New insns.
	* i386-tbl.h: Regenerated.
---
 .../gas/i386/x86-64-apx-jmpabs-intel.d        | 11 +++++
 .../gas/i386/x86-64-apx-jmpabs-inval.d        | 40 ++++++++++++++++++
 .../gas/i386/x86-64-apx-jmpabs-inval.s        | 15 +++++++
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d    | 11 +++++
 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s    |  5 +++
 gas/testsuite/gas/i386/x86-64.exp             |  3 ++
 opcodes/i386-dis.c                            | 41 +++++++++++++++++--
 7 files changed, 123 insertions(+), 3 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-jmpabs.s

diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
new file mode 100644
index 00000000000..8c229315904
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-intel.d
@@ -0,0 +1,11 @@
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 APX_F JMPABS insns (Intel disassembly)
+#source: x86-64-apx-jmpabs.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 00 a1 02 00 00 00 00 00 00 00[	 ]+jmpabs 0x2
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
new file mode 100644
index 00000000000..c3dc0b0ad79
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.d
@@ -0,0 +1,40 @@
+#as: --64
+#objdump: -dw
+#name: illegal decoding of APX_F jmpabs insns
+#source: x86-64-apx-jmpabs-inval.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <.text>:
+\s*[a-f0-9]+:	66 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	67 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	f2 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	f3 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	f0 d5 00 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	d5 08 a1[  	]+\(bad\)
+\s*[a-f0-9]+:	01 00[  	]+add    %eax,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*[a-f0-9]+:	00 00[  	]+add    %al,\(%rax\)
+\s*...
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
new file mode 100644
index 00000000000..de4440a5466
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs-inval.s
@@ -0,0 +1,15 @@
+# Check bytecode of APX_F jmpabs instructions with illegal encode.
+
+	.text
+# With 66 prefix
+	.byte 0x66,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With 67 prefix
+	.byte 0x67,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With F2 prefix
+	.byte 0xf2,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With F3 prefix
+	.byte 0xf3,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# With LOCK prefix
+	.byte 0xf0,0xd5,0x00,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
+# REX2.M0 = 0 REX2.W = 1
+	.byte 0xd5,0x08,0xa1,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs.d b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
new file mode 100644
index 00000000000..f2dbd617527
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.d
@@ -0,0 +1,11 @@
+#as:
+#objdump: -dw
+#name: x86_64 APX_F JMPABS insns
+#source: x86-64-apx-jmpabs.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 00 a1 02 00 00 00 00 00 00 00[	 ]+jmpabs \$0x2
diff --git a/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
new file mode 100644
index 00000000000..69ffb763260
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-jmpabs.s
@@ -0,0 +1,5 @@
+# Check 64bit APX_F JMPABS instructions
+
+	.text
+ _start:
+	.byte 0xd5,0x00,0xa1,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 034fc49b180..8b41f9891a5 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -374,6 +374,9 @@ run_dump_test "x86-64-apx-evex-promoted"
 run_dump_test "x86-64-apx-evex-promoted-intel"
 run_dump_test "x86-64-apx-evex-egpr"
 run_dump_test "x86-64-apx-ndd"
+run_dump_test "x86-64-apx-jmpabs"
+run_dump_test "x86-64-apx-jmpabs-intel"
+run_dump_test "x86-64-apx-jmpabs-inval"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 0612b0cd4b4..b33b44d7c27 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -106,6 +106,7 @@ static bool MOVSXD_Fixup (instr_info *, int, int);
 static bool DistinctDest_Fixup (instr_info *, int, int);
 static bool PREFETCHI_Fixup (instr_info *, int, int);
 static bool PUSH2_POP2_Fixup (instr_info *, int, int);
+static bool JMPABS_Fixup (instr_info *, int, int);
 
 static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
 						enum disassembler_style,
@@ -2021,7 +2022,7 @@ static const struct dis386 dis386[] = {
   { "lahf",		{ XX }, 0 },
   /* a0 */
   { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
-  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ { JMPABS_Fixup, eAX_reg }, { JMPABS_Fixup, v_mode } }, PREFIX_REX2_ILLEGAL },
   { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
   { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
   { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
@@ -9689,7 +9690,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
     }
 
   if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
-      && ins.last_rex2_prefix >= 0)
+      && ins.last_rex2_prefix >= 0 && (ins.rex2 & 16) == 0)
     {
       i386_dis_printf (info, dis_style_text, "(bad)");
       ret = ins.end_codep - priv.the_buffer;
@@ -9774,7 +9775,7 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
     ins.all_prefixes[ins.last_rex_prefix] = 0;
 
   /* Check if the REX2 prefix is used.  */
-  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 7))
+  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 23))
     ins.all_prefixes[ins.last_rex2_prefix] = 0;
 
   /* Check if the SEG prefix is used.  */
@@ -13917,3 +13918,37 @@ PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
 
   return OP_VEX (ins, bytemode, sizeflag);
 }
+
+static bool
+JMPABS_Fixup (instr_info *ins, int bytemode, int sizeflag)
+{
+  if (ins->address_mode == mode_64bit
+      && ins->last_rex2_prefix >= 0
+      && (ins->rex2 & 0x80) == 0x0)
+    {
+      uint64_t op;
+
+      if (bytemode == eAX_reg)
+	return true;
+
+      if (!get64 (ins, &op))
+	return false;
+
+      if ((ins->prefixes & (PREFIX_OPCODE | PREFIX_ADDR | PREFIX_LOCK)) != 0x0
+	  || (ins->rex & REX_W) != 0x0)
+	{
+	  oappend (ins, "(bad)");
+	  return true;
+	}
+
+      ins->mnemonicendp = stpcpy (ins->obuf, "jmpabs");
+      ins->rex2 |= 16;
+      oappend_immediate (ins, op);
+
+      return true;
+    }
+
+  if (bytemode == eAX_reg)
+    return OP_IMREG (ins, bytemode, sizeflag);
+  return OP_OFF64 (ins, bytemode, sizeflag);
+}
-- 
2.25.1


* Re: [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
                   ` (7 preceding siblings ...)
  2023-11-24  7:02 ` [PATCH v3 9/9] Support APX JMPABS for disassembler Cui, Lili
@ 2023-11-24  7:09 ` Jan Beulich
  2023-11-24 11:22   ` Cui, Lili
  2023-12-12  2:57 ` Lu, Hongjiu
  2023-12-12  8:16 ` Cui, Lili
  10 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-11-24  7:09 UTC (permalink / raw)
  To: Cui, Lili, binutils; +Cc: hongjiu.lu

On 24.11.2023 08:02, Cui, Lili wrote:
> Make const_1_mode print $1 in AT&T syntax, otherwise
> there will be correctness issues when it is extended
> to support APX NDD,

Looks fine to me, but I could easily imagine this to face H.J.'s opposition
(and hence my suggestion in this direction wasn't exactly this way). Since
iirc he's going to be back soon, may be best to wait until then. One request
though:

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -12090,6 +12090,8 @@ OP_I (instr_info *ins, int bytemode, int sizeflag)
>      case const_1_mode:
>        if (ins->intel_syntax)
>  	oappend (ins, "1");
> +      else
> +	oappend (ins, "$1");
>        return true;

This was already overlooked when output styling was introduced. Please
switch to oappend_immediate(ins, 1) here (i.e. replacing the entire if/else).
As per above - from my pov okay with this change.

Jan

* RE: [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax
  2023-11-24  7:09 ` [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Jan Beulich
@ 2023-11-24 11:22   ` Cui, Lili
  2023-11-24 12:14     ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-11-24 11:22 UTC (permalink / raw)
  To: Beulich, Jan, binutils; +Cc: Lu, Hongjiu

> On 24.11.2023 08:02, Cui, Lili wrote:
> > Make const_1_mode print $1 in AT&T syntax, otherwise there will be
> > correctness issues when it is extended to support APX NDD,
> 
> Looks fine to me, but I could easily imagine this to face H.J.'s opposition (and
> hence my suggestion in this direction wasn't exactly this way). Since iirc he's
> going to be back soon, may be best to wait until then. One request
> though:
> 

> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> > @@ -12090,6 +12090,8 @@ OP_I (instr_info *ins, int bytemode, int
> sizeflag)
> >      case const_1_mode:
> >        if (ins->intel_syntax)
> >  	oappend (ins, "1");
> > +      else
> > +	oappend (ins, "$1");
> >        return true;
> 
> This was already overlooked when output styling was introduced. Please
> switch to oappend_immediate(ins, 1) here (i.e. replacing the entire if/else).
> As per above - from my pov okay with this change.
> 

If we use oappend_immediate(ins, 1), it will print $0x1 instead of $1, and then Imm1 and Imm8 will be confused.

regexp "^ +[a-f0-9]+:   d1 f0                   shl    \$1,%eax$"
line   " a57:   d1 f0                   shl    $0x1,%eax"
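
The mismatch above can be illustrated with a minimal sketch. Both helpers
below are hypothetical stand-ins, not the real i386-dis.c functions; the only
assumption taken from this thread is that the generic immediate path prints
hexadecimal with a 0x prefix, while the const_1_mode path prints a bare 1.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the const_1_mode path: the implicit
   shift/rotate count is printed as a bare decimal 1, with the AT&T
   '$' marker unless Intel syntax is selected.  */
static int
format_const_1 (char *buf, size_t len, int intel_syntax)
{
  assert (buf != NULL && len > 0);
  return snprintf (buf, len, "%s1", intel_syntax ? "" : "$");
}

/* Hypothetical stand-in for a generic immediate printer that always
   emits hexadecimal, which is what the failing test output suggests.  */
static int
format_generic_imm (char *buf, size_t len, unsigned long long imm,
		    int intel_syntax)
{
  assert (buf != NULL && len > 0);
  return snprintf (buf, len, "%s0x%llx", intel_syntax ? "" : "$", imm);
}
```

With AT&T syntax the first helper yields "$1" and the second "$0x1" for the
same value, so a reader can tell the implicit-count encoding (opcode d1) apart
from an explicit 8-bit immediate only when the two printers stay distinct.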

Lili.

* Re: [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax
  2023-11-24 11:22   ` Cui, Lili
@ 2023-11-24 12:14     ` Jan Beulich
  0 siblings, 0 replies; 69+ messages in thread
From: Jan Beulich @ 2023-11-24 12:14 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 24.11.2023 12:22, Cui, Lili wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> Make const_1_mode print $1 in AT&T syntax, otherwise there will be
>>> correctness issues when it is extended to support APX NDD,
>>
>> Looks fine to me, but I could easily imagine this to face H.J.'s opposition (and
>> hence my suggestion in this direction wasn't exactly this way). Since iirc he's
>> going to be back soon, may be best to wait until then. One request
>> though:
>>
> 
>>> --- a/opcodes/i386-dis.c
>>> +++ b/opcodes/i386-dis.c
>>> @@ -12090,6 +12090,8 @@ OP_I (instr_info *ins, int bytemode, int
>> sizeflag)
>>>      case const_1_mode:
>>>        if (ins->intel_syntax)
>>>  	oappend (ins, "1");
>>> +      else
>>> +	oappend (ins, "$1");
>>>        return true;
>>
>> This was already overlooked when output styling was introduced. Please
>> switch to oappend_immediate(ins, 1) here (i.e. replacing the entire if/else).
>> As per above - from my pov okay with this change.
>>
> 
> If we use oappend_immediate(ins, 1), it will print $0x1 instead of $1, and then Imm1 and Imm8 will be confused.
> 
> regexp "^ +[a-f0-9]+:   d1 f0                   shl    \$1,%eax$"
> line   " a57:   d1 f0                   shl    $0x1,%eax"

Hmm, yes. Albeit I'd like to drop the pointless 0x output from
oappend_immediate() anyway. But for now I'll reduce my request to just
getting the output correct styling-wise.

Jan

* Re: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
  2023-11-24  7:02 ` [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Cui, Lili
@ 2023-12-04 16:30   ` Jan Beulich
  2023-12-05 13:31     ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-04 16:30 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, binutils

On 24.11.2023 08:02, Cui, Lili wrote:
> @@ -3865,6 +3873,12 @@ is_any_vex_encoding (const insn_template *t)
>    return t->opcode_modifier.vex || t->opcode_modifier.evex;
>  }
>  
> +static INLINE bool
> +is_apx_rex2_encoding (void)
> +{
> +  return i.rex2 || i.rex2_encoding;
> +}

This function is used just once. Do we really need it? Or else why
don't you use it near the end of md_assemble()?

> @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
>      i.vex.bytes[3] |= i.mask.reg->reg_num;
>  }
>  
> +/* Build (2 bytes) rex2 prefix.
> +   | D5h |
> +   | m | R4 X4 B4 | W R X B |
> +*/
> +static void
> +build_rex2_prefix (void)
> +{
> +  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the same.  */

How do they handle it the same? (Also I don't think this is useful as
a code comment; it instead belongs in the description imo.)

> +  i.vex.length = 2;
> +  i.vex.bytes[0] = 0xd5;
> +  /* For the W R X B bits, the variables of rex prefix will be reused.  */
> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> +		    | (i.rex2 << 4) | i.rex);
> +}
> +
>  static void
>  process_immext (void)
>  {
> @@ -4385,12 +4414,16 @@ optimize_encoding (void)
>  	  i.suffix = 0;
>  	  /* Convert to byte registers.  */
>  	  if (i.types[1].bitfield.word)
> -	    j = 16;
> -	  else if (i.types[1].bitfield.dword)
> +	    /* There are 40 8-bit registers.  */
>  	    j = 32;
> +	  else if (i.types[1].bitfield.dword)
> +	    /* 32 8-bit registers + 32 16-bit registers.  */
> +	    j = 64;
>  	  else
> -	    j = 48;
> -	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
> +	    /* 32 8-bit registers + 32 16-bit registers
> +	       + 32 32-bit registers.  */
> +	    j = 96;
> +	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
>  	    j += 8;
>  	  i.op[1].regs -= j;
>  	}

I did comment on, in particular, the 8-bit register counts before.
Afaict the comments above are nevertheless unchanged and hence
still not really correct.

> @@ -5576,6 +5615,13 @@ md_assemble (char *line)
>  	  return;
>  	}
>  
> +      /* Check for explicit REX2 prefix.  */
> +      if (i.rex2_encoding)
> +	{
> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> +	  return;
> +	}

Again I'm pretty sure I pointed out before that i.rex2_encoding reflects
use of {rex2}. Which then the error message should correctly refer to.

> @@ -5615,11 +5661,12 @@ md_assemble (char *line)
>  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
>        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
>  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && i.rex != 0))
> +	  && (i.rex != 0 || i.rex2 != 0)))
>      {
>        int x;
>  
> -      i.rex |= REX_OPCODE;
> +      if (!i.rex2)
> +	i.rex |= REX_OPCODE;
>        for (x = 0; x < 2; x++)
>  	{
>  	  /* Look for 8 bit operand that uses old registers.  */
> @@ -5630,7 +5677,7 @@ md_assemble (char *line)
>  	      /* In case it is "hi" register, give up.  */
>  	      if (i.op[x].regs->reg_num > 3)
>  		as_bad (_("can't encode register '%s%s' in an "
> -			  "instruction requiring REX prefix."),
> +			  "instruction requiring REX/REX2 prefix."),
>  			register_prefix, i.op[x].regs->reg_name);
>  
>  	      /* Otherwise it is equivalent to the extended register.
> @@ -5642,11 +5689,11 @@ md_assemble (char *line)
>  	}
>      }
>  
> -  if (i.rex == 0 && i.rex_encoding)
> +  if ((i.rex == 0 && i.rex_encoding) || (i.rex2 == 0 && i.rex2_encoding))

Doesn't this want to be

  if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding || i.rex2_encoding))

?

>      {
>        /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
>  	 that uses legacy register.  If it is "hi" register, don't add
> -	 the REX_OPCODE byte.  */
> +	 rex and rex2 prefix.  */
>        int x;
>        for (x = 0; x < 2; x++)
>  	if (i.types[x].bitfield.class == Reg
> @@ -5656,6 +5703,7 @@ md_assemble (char *line)
>  	  {
>  	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
>  	    i.rex_encoding = false;
> +	    i.rex2_encoding = false;
>  	    break;
>  	  }
>  
> @@ -5663,7 +5711,13 @@ md_assemble (char *line)
>  	i.rex = REX_OPCODE;
>      }
>  
> -  if (i.rex != 0)
> +  if (i.rex2 != 0 || i.rex2_encoding)
> +    {
> +      build_rex2_prefix ();
> +      /* The individual REX.RXBW bits got consumed.  */
> +      i.rex &= REX_OPCODE;
> +    }
> +  else if (i.rex != 0)
>      add_prefix (REX_OPCODE | i.rex);
>  
>    insert_lfence_before ();
> @@ -5834,6 +5888,10 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
>  		  /* {rex} */
>  		  i.rex_encoding = true;
>  		  break;
> +		case Prefix_REX2:
> +		  /* {rex2} */
> +		  i.rex2_encoding = true;
> +		  break;
>  		case Prefix_NoOptimize:
>  		  /* {nooptimize} */
>  		  i.no_optimize = true;
> @@ -6971,6 +7029,45 @@ VEX_check_encoding (const insn_template *t)
>    return 0;
>  }
>  
> +/* Check if Egprs operands are valid for the instruction.  */
> +
> +static int
> +check_EgprOperands (const insn_template *t)
> +{
> +  if (!t->opcode_modifier.noegpr)
> +    return 0;
> +
> +  for (unsigned int op = 0; op < i.operands; op++)
> +    {
> +      if (i.types[op].bitfield.class != Reg
> +	  /* Special case for (%dx) while doing input/output op */
> +	  || i.input_output_operand)

Didn't we agree that this extra condition isn't necessary, once the
producer site correctly updates all state (which was supposed to be
done in a small prereq patch)?

> @@ -7107,7 +7204,9 @@ match_template (char mnem_suffix)
>        /* Do not verify operands when there are none.  */
>        if (!t->operands)
>  	{
> -	  if (VEX_check_encoding (t))
> +	  /* When there are no operands, we still need to use the
> +	     check_EgprOperands function to check whether {rex2} is valid.  */
> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))

As before imo either the function name wants changing (so it becomes
reasonable to use here, without the need for a comment explaining the
oddity), or you simply open-code the sole check that is needed here
(afaict: t->opcode_modifier.noegpr && i.rex2_encoding).

> @@ -7443,6 +7542,13 @@ match_template (char mnem_suffix)
>  	  continue;
>  	}
>  
> +      /* Check if EGRPS operands(r16-r31) are valid.  */

EGPR?

> --- a/gas/doc/c-i386.texi
> +++ b/gas/doc/c-i386.texi
> @@ -217,6 +217,7 @@ accept various extension mnemonics.  For example,
>  @code{avx10.1/256},
>  @code{avx10.1/128},
>  @code{user_msr},
> +@code{apx_f},
>  @code{amx_int8},
>  @code{amx_bf16},
>  @code{amx_fp16},
> @@ -983,6 +984,9 @@ Different encoding options can be specified via pseudo prefixes:
>  instructions (x86-64 only).  Note that this differs from the @samp{rex}
>  prefix which generates REX prefix unconditionally.
>  
> +@item
> +@samp{@{rex2@}} -- encode with REX2 prefix

This isn't in line with what's said for {rex}. Iirc we were in
agreement that we want both to behave consistently. In which case
documentation also needs to describe them consistently.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
> @@ -0,0 +1,86 @@
> +# Check 64bit instructions with rex2 prefix encoding
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +         test	$0x7, %r24b
> +         test	$0x7, %r24d
> +         test	$0x7, %r24
> +         test	$0x7, %r24w
> +## REX2.M bit
> +         imull	%eax, %r15d
> +         imull	%eax, %r16d
> +         punpckldq (%r18), %mm2
> +## REX2.R4 bit
> +         leal	(%rax), %r16d
> +         leal	(%rax), %r17d
> +         leal	(%rax), %r18d
> +         leal	(%rax), %r19d
> +         leal	(%rax), %r20d
> +         leal	(%rax), %r21d
> +         leal	(%rax), %r22d
> +         leal	(%rax), %r23d
> +         leal	(%rax), %r24d
> +         leal	(%rax), %r25d
> +         leal	(%rax), %r26d
> +         leal	(%rax), %r27d
> +         leal	(%rax), %r28d
> +         leal	(%rax), %r29d
> +         leal	(%rax), %r30d
> +         leal	(%rax), %r31d
> +## REX2.X4 bit
> +         leal	(,%r16), %eax
> +         leal	(,%r17), %eax
> +         leal	(,%r18), %eax
> +         leal	(,%r19), %eax
> +         leal	(,%r20), %eax
> +         leal	(,%r21), %eax
> +         leal	(,%r22), %eax
> +         leal	(,%r23), %eax
> +         leal	(,%r24), %eax
> +         leal	(,%r25), %eax
> +         leal	(,%r26), %eax
> +         leal	(,%r27), %eax
> +         leal	(,%r28), %eax
> +         leal	(,%r29), %eax
> +         leal	(,%r30), %eax
> +         leal	(,%r31), %eax
> +## REX.B4 bit

Further up you properly say REX2. Here and below it's only REX?

> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> @@ -5,3 +5,61 @@ pseudos:
>  	{rex} vmovaps %xmm7,%xmm2
>  	{rex} vmovaps %xmm17,%xmm2
>  	{rex} rorx $7,%eax,%ebx
> +	{rex2} vmovaps %xmm7,%xmm2
> +	{rex2} xsave (%rax)
> +	{rex2} xsaves (%ecx)
> +	{rex2} xsaves64 (%ecx)
> +	{rex2} xsavec (%ecx)
> +	{rex2} xrstors (%ecx)
> +	{rex2} xrstors64 (%ecx)
> +
> +	#All opcodes in the row 0xa* prefixed REX2 are illegal.
> +	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
> +	{rex2} mov    0x90909090,%al
> +	{rex2} movabs 0x1,%al
> +	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
> +	{rex2} lodsb
> +	{rex2} lods   %ds:(%esi),%al
> +	{rex2} lodsb   (%esi)
> +	{rex2} movs
> +	{rex2} movs   (%esi), (%edi)
> +	{rex2} scasl
> +	{rex2} scas   %es:(%edi),%eax
> +	{rex2} scasb   (%edi)
> +	{rex2} stosb
> +	{rex2} stosb   (%edi)
> +	{rex2} stos   %eax,%es:(%edi)
> +
> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.

This also covers map 1 row 8, doesn't it?

> +	{rex2} jo     .+2-0x70
> +	{rex2} jno    .+2-0x70
> +	{rex2} jb     .+2-0x70
> +	{rex2} jae    .+2-0x70
> +	{rex2} je     .+2-0x70
> +	{rex2} jne    .+2-0x70
> +	{rex2} jbe    .+2-0x70
> +	{rex2} ja     .+2-0x70
> +	{rex2} js     .+2-0x70
> +	{rex2} jns    .+2-0x70
> +	{rex2} jp     .+2-0x70
> +	{rex2} jnp    .+2-0x70
> +	{rex2} jl     .+2-0x70
> +	{rex2} jge    .+2-0x70
> +	{rex2} jle    .+2-0x70
> +	{rex2} jg     .+2-0x70
> +
> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> +	{rex2} in $0x90,%al
> +	{rex2} in $0x90
> +	{rex2} out $0x90,%al
> +	{rex2} out $0x90
> +	{rex2} jmp  *%eax
> +	{rex2} loop foo

Isn't this row 0xE?

> +	#All opcodes in the row 0xf3* prefixed REX2 are illegal.

This comment continues to be confusing: 0xf3 is a REP prefix. Perhaps
best to either say "map 1" and omit the "f" or at least write 0x0f3*
or slightly better 0x0f 0x3*.

> --- a/gas/testsuite/gas/i386/x86-64-pseudos.s
> +++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
> @@ -360,6 +360,19 @@ _start:
>  	{rex} movaps (%r8),%xmm2
>  	{rex} phaddw (%rcx),%mm0
>  	{rex} phaddw (%r8),%mm0
> +	{rex2} mov %al,%ah
> +	{rex2} shl %cl, %eax
> +	{rex2} cmp %cl, %dl
> +	{rex2} mov $1, %bl
> +	{rex2} movl %eax,%ebx
> +	{rex2} movl %eax,%r14d
> +	{rex2} movl %eax,(%r8)
> +	{rex2} movaps %xmm7,%xmm2
> +	{rex2} movaps %xmm7,%xmm12
> +	{rex2} movaps (%rcx),%xmm2
> +	{rex2} movaps (%r8),%xmm2
> +	{rex2} pmullw %mm0,%mm6
> +
>  
>  	movb (%rbp),%al
>  	{disp8} movb (%rbp),%al

No double blank lines please.

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c

Disassembler comments (if any) in a separate (later) mail again.

> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -275,6 +275,8 @@ static const dependency isa_dependencies[] =
>      "64" },
>    { "USER_MSR",
>      "64" },
> +  { "APX_F",
> +    "XSAVE|64" },
>  };
>  
>  /* This array is populated as process_i386_initializers() walks cpu_flags[].  */
> @@ -397,6 +399,7 @@ static bitfield cpu_flags[] =
>    BITFIELD (FRED),
>    BITFIELD (LKGS),
>    BITFIELD (USER_MSR),
> +  BITFIELD (APX_F),
>    BITFIELD (MWAITX),
>    BITFIELD (CLZERO),
>    BITFIELD (OSPKE),
> @@ -486,6 +489,7 @@ static bitfield opcode_modifiers[] =
>    BITFIELD (ATTSyntax),
>    BITFIELD (IntelSyntax),
>    BITFIELD (ISA64),
> +  BITFIELD (NoEgpr),
>  };
>  
>  #define CLASS(n) #n, n
> @@ -1072,10 +1076,48 @@ get_element_size (char **opnd, int lineno)
>    return elem_size;
>  }
>  
> +static bool
> +rex2_disallowed (const unsigned long long opcode, unsigned int space,
> +			       const char *cpu_flags)
> +{
> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> +      || strcmp (cpu_flags, "Xsave") >= 0
> +      || strcmp (cpu_flags, "Xsaveopt") >= 0)
> +    return true;

Wasn't this intended to be dropped, being redundant with the opcode table
attributes?

> +  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
> +     are reserved under REX2 and triggers #UD when prefixed with REX2 */
> +  if (space == 0)
> +    switch (opcode >> 4)

Both here and ...

> +      {
> +      case 0x4:
> +      case 0x7:
> +      case 0xA:
> +      case 0xE:
> +	return true;
> +      default:
> +	return false;
> +    }
> +
> +  if (space == SPACE_0F)
> +    switch (opcode >> 4)

... here, don't you also need to mask off further bits? There are
quite a few opcodes which have a kind-of ModR/M byte encoded directly
in the opcode, for example.
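
A minimal sketch of the masking meant here, under the assumption that such
opcodes are packed with the first opcode byte most significant (e.g. 0F 01 D0
held as 0x01d0 in map 1). The helper name and the length parameter are
hypothetical and not part of i386-gen.c.

```c
#include <assert.h>

/* Return the high nibble ("row") of the first opcode byte.  LENGTH is
   the number of opcode bytes packed into OPCODE, most significant
   first; both the parameter and this packing are assumptions made for
   illustration only.  */
static unsigned int
opcode_row (unsigned long long opcode, unsigned int length)
{
  assert (length >= 1 && length <= 8);
  return (unsigned int) ((opcode >> (8 * (length - 1))) & 0xff) >> 4;
}
```

For a one-byte opcode such as 0x80 this matches the plain "opcode >> 4", but
for 0x01d0 the plain shift yields 0x1d whereas the row of the first byte is 0.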

> +      {
> +      case 0x3:
> +      case 0x8:
> +	return true;
> +      default:
> +	return false;
> +      }
> +
> +  return false;
> +}
> +
>  static void
>  process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
>  			      unsigned int prefix, const char *extension_opcode,
> -			      char **opnd, int lineno)
> +			      char **opnd, int lineno, bool rex2_disallowed)
>  {
>    char *str, *next, *last;
>    bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
> @@ -1202,6 +1244,12 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
>  	  || modifiers[SAE].value))
>      modifiers[EVex].value = EVEXDYN;
>  
> +  /* Vex, legacy map2 and map3 and rex2_disallowed do not support EGPR.
> +     For template supports both Vex and EVex allowing EGPR.  */

"Templates supporting both Vex and EVex allow EGPR."

> +  if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
> +      && !modifiers[EVex].value)
> +    modifiers[NoEgpr].value = 1;
> +
>    output_opcode_modifier (table, modifiers, ARRAY_SIZE (modifiers));
>  }
>  
> @@ -1425,8 +1473,11 @@ output_i386_opcode (FILE *table, const char *name, char *str,
>  	   ident, 2 * (int)length, opcode, end, i);
>    free (ident);
>  
> +  /* Add some specilal handle for current entry.  */
> +  bool  has_special_handle = rex2_disallowed (opcode, space, cpu_flags);

The local variable (if one is needed in the first place) wants naming as
usefully as the function now is named. Similarly the comment would want
improving along those lines.

>    process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
> -				extension_opcode, operand_types, lineno);
> +				extension_opcode, operand_types, lineno,
> +				has_special_handle);
>  
>    process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
>  
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -138,6 +138,7 @@
>  #define Vsz256 Vsz=VSZ256
>  #define Vsz512 Vsz=VSZ512
>  
> +
>  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
>  // the bit to mark commutative VEX encodings where swapping the source
>  // operands may allow to switch from 3-byte to 2-byte VEX encoding.

Stray change (in general please avoid introducing double blank lines, as
those make patch context less useful).

Jan


* RE: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
  2023-12-04 16:30   ` Jan Beulich
@ 2023-12-05 13:31     ` Cui, Lili
  2023-12-06  7:52       ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-05 13:31 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 24.11.2023 08:02, Cui, Lili wrote:
> > @@ -3865,6 +3873,12 @@ is_any_vex_encoding (const insn_template *t)
> >    return t->opcode_modifier.vex || t->opcode_modifier.evex;  }
> >
> > +static INLINE bool
> > +is_apx_rex2_encoding (void)
> > +{
> > +  return i.rex2 || i.rex2_encoding;
> > +}
> 
> This function is used just once. Do we really need it? Or else why don't you
> use it near the end of md_assemble()?
> 

Yes, I also noticed this issue and now use this function in place of "(i.rex2 != 0 || i.rex2_encoding)" at the end of md_assemble().

> > @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
> >      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >
> > +/* Build (2 bytes) rex2 prefix.
> > +   | D5h |
> > +   | m | R4 X4 B4 | W R X B |
> > +*/
> > +static void
> > +build_rex2_prefix (void)
> > +{
> > +  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the
> > +same.  */
> 
> How do they handle it the same? (Also I don't think this is useful as a code
> comment; it instead belongs in the description imo.)
> 

Moved the comment to the function's description.

/* Build (2 bytes) rex2 prefix.
   | D5h |
   | m | R4 X4 B4 | W R X B |

   Rex2 reuses i.vex as they handle i.tm.opcode_space the same way.  */
static void
build_rex2_prefix (void)


In function "output_insn", the handling looks like this:

      if (!i.vex.length)
        switch (i.tm.opcode_space)
          {
          case SPACE_BASE:
            break;
          case SPACE_0F:
            ++j;
            break;
          case SPACE_0F38:
          case SPACE_0F3A:
            j += 2;
            break;
          default:
            abort ();
          }
.....
         if (!i.vex.length
              && i.tm.opcode_space != SPACE_BASE)
            {
              *p++ = 0x0f;
              if (i.tm.opcode_space != SPACE_0F)
                *p++ = i.tm.opcode_space == SPACE_0F38
                       ? 0x38 : 0x3a;
            }

> > +  i.vex.length = 2;
> > +  i.vex.bytes[0] = 0xd5;
> > +  /* For the W R X B bits, the variables of rex prefix will be
> > +reused.  */
> > +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> > +		    | (i.rex2 << 4) | i.rex);
> > +}
> > +
> >  static void
> >  process_immext (void)
> >  {
> > @@ -4385,12 +4414,16 @@ optimize_encoding (void)
> >  	  i.suffix = 0;
> >  	  /* Convert to byte registers.  */
> >  	  if (i.types[1].bitfield.word)
> > -	    j = 16;
> > -	  else if (i.types[1].bitfield.dword)
> > +	    /* There are 40 8-bit registers.  */
> >  	    j = 32;
> > +	  else if (i.types[1].bitfield.dword)
> > +	    /* 32 8-bit registers + 32 16-bit registers.  */
> > +	    j = 64;
> >  	  else
> > -	    j = 48;
> > -	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
> > +	    /* 32 8-bit registers + 32 16-bit registers
> > +	       + 32 32-bit registers.  */
> > +	    j = 96;
> > +	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum
> > +< 4)
> >  	    j += 8;
> >  	  i.op[1].regs -= j;
> >  	}
> 
> I did comment on, in particular, the 8-bit register counts before.
> Afaict the comments above are nevertheless unchanged and hence still not
> really correct.
> 

Changed to :

      if (flag_code == CODE_64BIT || base_regnum < 4)
        {
          i.types[1].bitfield.byte = 1;
          /* Ignore the suffix.  */
          i.suffix = 0;
          /* Convert to byte registers. 8-bit registers are special,
             RegRex64 and non-RegRex64 each have 8 registers.  */
          if (i.types[1].bitfield.word)
            /* 32 (or 40) 8-bit registers.  */
            j = 32;
          else if (i.types[1].bitfield.dword)
            /* 32 (or 40)8-bit registers + 32 16-bit registers.  */
            j = 64;
          else
            /* 32 (or 40) 8-bit registers + 32 16-bit registers
               + 32 32-bit registers.  */
            j = 96;

          if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
            j += 8;
          i.op[1].regs -= j;
        }

> > @@ -5576,6 +5615,13 @@ md_assemble (char *line)
> >  	  return;
> >  	}
> >
> > +      /* Check for explicit REX2 prefix.  */
> > +      if (i.rex2_encoding)
> > +	{
> > +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> > +	  return;
> > +	}
> 
> Again I'm pretty sure I pointed out before that i.rex2_encoding reflects use of
> {rex2}. Which then the error message should correctly refer to.
> 

Changed to 

      /* Check for explicit REX2 prefix.  */
      if (i.rex2_encoding)
        {
          as_bad (_("{rex2} prefix invalid with `%s'"), insn_name (&i.tm));
          return;
        }

> > @@ -5615,11 +5661,12 @@ md_assemble (char *line)
> >  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> >        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> >  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && i.rex != 0))
> > +	  && (i.rex != 0 || i.rex2 != 0)))
> >      {
> >        int x;
> >
> > -      i.rex |= REX_OPCODE;
> > +      if (!i.rex2)
> > +	i.rex |= REX_OPCODE;
> >        for (x = 0; x < 2; x++)
> >  	{
> >  	  /* Look for 8 bit operand that uses old registers.  */ @@ -5630,7
> > +5677,7 @@ md_assemble (char *line)
> >  	      /* In case it is "hi" register, give up.  */
> >  	      if (i.op[x].regs->reg_num > 3)
> >  		as_bad (_("can't encode register '%s%s' in an "
> > -			  "instruction requiring REX prefix."),
> > +			  "instruction requiring REX/REX2 prefix."),
> >  			register_prefix, i.op[x].regs->reg_name);
> >
> >  	      /* Otherwise it is equivalent to the extended register.
> > @@ -5642,11 +5689,11 @@ md_assemble (char *line)
> >  	}
> >      }
> >
> > -  if (i.rex == 0 && i.rex_encoding)
> > +  if ((i.rex == 0 && i.rex_encoding) || (i.rex2 == 0 &&
> > + i.rex2_encoding))
> 
> Doesn't this want to be
> 
>   if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding || i.rex2_encoding))
> 
> ?

Done.
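
For illustration only (this is not part of the patch), the difference between the two conditions can be checked exhaustively. The sketch below models i.rex and i.rex2 as plain zero / non-zero values; the function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as written in the v3 patch.  */
static bool
cond_v3 (unsigned int rex, unsigned int rex2, bool rex_enc, bool rex2_enc)
{
  return (rex == 0 && rex_enc) || (rex2 == 0 && rex2_enc);
}

/* Condition as suggested in review.  */
static bool
cond_suggested (unsigned int rex, unsigned int rex2, bool rex_enc,
		bool rex2_enc)
{
  return rex == 0 && rex2 == 0 && (rex_enc || rex2_enc);
}

/* Count the input combinations on which the two conditions disagree.
   rex and rex2 are modelled as just zero / non-zero.  */
static int
count_differences (void)
{
  int n = 0;

  for (unsigned int m = 0; m < 16; m++)
    {
      unsigned int rex = m & 1, rex2 = (m >> 1) & 1;
      bool rex_enc = (m >> 2) & 1, rex2_enc = (m >> 3) & 1;
      bool a = cond_v3 (rex, rex2, rex_enc, rex2_enc);
      bool b = cond_suggested (rex, rex2, rex_enc, rex2_enc);

      /* The suggested form never fires where the v3 form does not.  */
      assert (!b || a);
      if (a != b)
	n++;
    }
  return n;
}
```

The two forms disagree only when one of rex / rex2 is already non-zero while the pseudo prefix belonging to the other one was given; for example cond_v3 (1, 0, false, true) holds while cond_suggested (1, 0, false, true) does not.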

> 
> >      {
> >        /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
> >  	 that uses legacy register.  If it is "hi" register, don't add
> > -	 the REX_OPCODE byte.  */
> > +	 rex and rex2 prefix.  */
> >        int x;
> >        for (x = 0; x < 2; x++)
> >  	if (i.types[x].bitfield.class == Reg @@ -5656,6 +5703,7 @@
> > md_assemble (char *line)
> >  	  {
> >  	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
> >  	    i.rex_encoding = false;
> > +	    i.rex2_encoding = false;
> >  	    break;
> >  	  }
> >
> > @@ -5663,7 +5711,13 @@ md_assemble (char *line)
> >  	i.rex = REX_OPCODE;
> >      }
> >
> > -  if (i.rex != 0)
> > +  if (i.rex2 != 0 || i.rex2_encoding)
> > +    {
> > +      build_rex2_prefix ();
> > +      /* The individual REX.RXBW bits got consumed.  */
> > +      i.rex &= REX_OPCODE;
> > +    }
> > +  else if (i.rex != 0)
> >      add_prefix (REX_OPCODE | i.rex);
> >
> >    insert_lfence_before ();
> > @@ -5834,6 +5888,10 @@ parse_insn (const char *line, char *mnemonic,
> bool prefix_only)
> >  		  /* {rex} */
> >  		  i.rex_encoding = true;
> >  		  break;
> > +		case Prefix_REX2:
> > +		  /* {rex2} */
> > +		  i.rex2_encoding = true;
> > +		  break;
> >  		case Prefix_NoOptimize:
> >  		  /* {nooptimize} */
> >  		  i.no_optimize = true;
> > @@ -6971,6 +7029,45 @@ VEX_check_encoding (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Check if Egprs operands are valid for the instruction.  */
> > +
> > +static int
> > +check_EgprOperands (const insn_template *t) {
> > +  if (!t->opcode_modifier.noegpr)
> > +    return 0;
> > +
> > +  for (unsigned int op = 0; op < i.operands; op++)
> > +    {
> > +      if (i.types[op].bitfield.class != Reg
> > +	  /* Special case for (%dx) while doing input/output op */
> > +	  || i.input_output_operand)
> 
> Didn't we agree that this extra condition isn't necessary, once the producer
> site correctly updates all state (which was supposed to be done in a small
> prereq patch)?
> 

I tried adding "Unspecified | BaseIndex" to InOutPortReg, but then some related instructions ended up with two memory operands, which caused many failures in the invalid test cases and would have required more ugly code. In the end, I felt that this simple modification might be better.

@@ -13137,6 +13137,7 @@ i386_att_operand (char *operand_string)
          && !operand_type_check (i.types[this_operand], disp))
        {
          i.types[this_operand] = i.base_reg->reg_type;
+         i.types[this_operand].bitfield.class = 0;
          i.input_output_operand = true;
          return 1;

> > @@ -7107,7 +7204,9 @@ match_template (char mnem_suffix)
> >        /* Do not verify operands when there are none.  */
> >        if (!t->operands)
> >  	{
> > -	  if (VEX_check_encoding (t))
> > +	  /* When there are no operands, we still need to use the
> > +	     check_EgprOperands function to check whether {rex2} is valid.  */
> > +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
> 
> As before imo either the function name wants changing (so it becomes
> reasonable to use here, without the need for a comment explaining the
> oddity), or you simply open-code the sole check that is needed here
> (afaict: t->opcode_modifier.noegpr && i.rex2_encoding).
> 

Changed to 

          if (VEX_check_encoding (t))
            {
              specific_error = progress (i.error);
              continue;
            }

          /* Check if pseudo prefix {rex2} is valid.  */
          if (t->opcode_modifier.noegpr && i.rex2_encoding)
            {
              i.error = invalid_pseudo_prefix;
              specific_error = progress (i.error);
              continue;
            }

> > @@ -7443,6 +7542,13 @@ match_template (char mnem_suffix)
> >  	  continue;
> >  	}
> >
> > +      /* Check if EGRPS operands(r16-r31) are valid.  */
> 
> EGPR?
> 

Done.

> > --- a/gas/doc/c-i386.texi
> > +++ b/gas/doc/c-i386.texi
> > @@ -217,6 +217,7 @@ accept various extension mnemonics.  For example,
> > @code{avx10.1/256},  @code{avx10.1/128},  @code{user_msr},
> > +@code{apx_f},
> >  @code{amx_int8},
> >  @code{amx_bf16},
> >  @code{amx_fp16},
> > @@ -983,6 +984,9 @@ Different encoding options can be specified via
> pseudo prefixes:
> >  instructions (x86-64 only).  Note that this differs from the
> > @samp{rex}  prefix which generates REX prefix unconditionally.
> >
> > +@item
> > +@samp{@{rex2@}} -- encode with REX2 prefix
> 
> This isn't in line with what's said for {rex}. Iirc we were in agreement that we
> want both to behave consistently. In which case documentation also needs to
> describe them consistently.
> 

Changed to 

@item
@samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
instructions (APX_F only).  Note that this differs from the @samp{rex2}
prefix which generates REX2 prefix unconditionally.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
> > @@ -0,0 +1,86 @@
> > +# Check 64bit instructions with rex2 prefix encoding
> > +         leal	(,%r16), %eax
> > +         leal	(,%r17), %eax
> > +         leal	(,%r18), %eax
> > +         leal	(,%r19), %eax
> > +         leal	(,%r20), %eax
> > +         leal	(,%r21), %eax
> > +         leal	(,%r22), %eax
> > +         leal	(,%r23), %eax
> > +         leal	(,%r24), %eax
> > +         leal	(,%r25), %eax
> > +         leal	(,%r26), %eax
> > +         leal	(,%r27), %eax
> > +         leal	(,%r28), %eax
> > +         leal	(,%r29), %eax
> > +         leal	(,%r30), %eax
> > +         leal	(,%r31), %eax
> > +## REX.B4 bit
> 
> Further up you properly say REX2. Here and below it's only REX?
> 

Done.

> > --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> > +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> > @@ -5,3 +5,61 @@ pseudos:
> >  	{rex} vmovaps %xmm7,%xmm2
> >  	{rex} vmovaps %xmm17,%xmm2
> >  	{rex} rorx $7,%eax,%ebx
> > +	{rex2} vmovaps %xmm7,%xmm2
> > +	{rex2} xsave (%rax)
> > +	{rex2} xsaves (%ecx)
> > +	{rex2} xsaves64 (%ecx)
> > +	{rex2} xsavec (%ecx)
> > +	{rex2} xrstors (%ecx)
> > +	{rex2} xrstors64 (%ecx)
> > +
> > +	#All opcodes in the row 0xa* prefixed REX2 are illegal.
> > +	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
> > +	{rex2} mov    0x90909090,%al
> > +	{rex2} movabs 0x1,%al
> > +	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
> > +	{rex2} lodsb
> > +	{rex2} lods   %ds:(%esi),%al
> > +	{rex2} lodsb   (%esi)
> > +	{rex2} movs
> > +	{rex2} movs   (%esi), (%edi)
> > +	{rex2} scasl
> > +	{rex2} scas   %es:(%edi),%eax
> > +	{rex2} scasb   (%edi)
> > +	{rex2} stosb
> > +	{rex2} stosb   (%edi)
> > +	{rex2} stos   %eax,%es:(%edi)
> > +
> > +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> 
> This also covers map 1 row 8, doesn't it?
> 

No, I didn't find 0xf8* in opcode table.

> > +	{rex2} jo     .+2-0x70
> > +	{rex2} jno    .+2-0x70
> > +	{rex2} jb     .+2-0x70
> > +	{rex2} jae    .+2-0x70
> > +	{rex2} je     .+2-0x70
> > +	{rex2} jne    .+2-0x70
> > +	{rex2} jbe    .+2-0x70
> > +	{rex2} ja     .+2-0x70
> > +	{rex2} js     .+2-0x70
> > +	{rex2} jns    .+2-0x70
> > +	{rex2} jp     .+2-0x70
> > +	{rex2} jnp    .+2-0x70
> > +	{rex2} jl     .+2-0x70
> > +	{rex2} jge    .+2-0x70
> > +	{rex2} jle    .+2-0x70
> > +	{rex2} jg     .+2-0x70
> > +
> > +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> > +	{rex2} in $0x90,%al
> > +	{rex2} in $0x90
> > +	{rex2} out $0x90,%al
> > +	{rex2} out $0x90
> > +	{rex2} jmp  *%eax
> > +	{rex2} loop foo
> > +	#All opcodes in the row 0xf3* prefixed REX2 are illegal.
> 
> This comment continues to be confusing: 0xf3 is a REP prefix. Perhaps best to
> either say "map 1" and omit the "f" or at least write 0x0f3* or slightly better
> 0x0f 0x3*.
> 

Done.

> > --- a/gas/testsuite/gas/i386/x86-64-pseudos.s
> > +++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
> > @@ -360,6 +360,19 @@ _start:
> >  	{rex} movaps (%r8),%xmm2
> >  	{rex} phaddw (%rcx),%mm0
> >  	{rex} phaddw (%r8),%mm0
> > +	{rex2} mov %al,%ah
> > +	{rex2} shl %cl, %eax
> > +	{rex2} cmp %cl, %dl
> > +	{rex2} mov $1, %bl
> > +	{rex2} movl %eax,%ebx
> > +	{rex2} movl %eax,%r14d
> > +	{rex2} movl %eax,(%r8)
> > +	{rex2} movaps %xmm7,%xmm2
> > +	{rex2} movaps %xmm7,%xmm12
> > +	{rex2} movaps (%rcx),%xmm2
> > +	{rex2} movaps (%r8),%xmm2
> > +	{rex2} pmullw %mm0,%mm6
> > +
> >
> >  	movb (%rbp),%al
> >  	{disp8} movb (%rbp),%al
> 
> No double blank lines please.
> 

Done.

> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> 
> Disassembler comments (if any) in a separate (later) mail again.
> 

OK.

> > --- a/opcodes/i386-gen.c
> > +++ b/opcodes/i386-gen.c
> > @@ -275,6 +275,8 @@ static const dependency isa_dependencies[] =
> >      "64" },
> >    { "USER_MSR",
> >      "64" },
> > +  { "APX_F",
> > +    "XSAVE|64" },
> >  };
> >
> >  /* This array is populated as process_i386_initializers() walks
> > cpu_flags[].  */ @@ -397,6 +399,7 @@ static bitfield cpu_flags[] =
> >    BITFIELD (FRED),
> >    BITFIELD (LKGS),
> >    BITFIELD (USER_MSR),
> > +  BITFIELD (APX_F),
> >    BITFIELD (MWAITX),
> >    BITFIELD (CLZERO),
> >    BITFIELD (OSPKE),
> > @@ -486,6 +489,7 @@ static bitfield opcode_modifiers[] =
> >    BITFIELD (ATTSyntax),
> >    BITFIELD (IntelSyntax),
> >    BITFIELD (ISA64),
> > +  BITFIELD (NoEgpr),
> >  };
> >
> >  #define CLASS(n) #n, n
> > @@ -1072,10 +1076,48 @@ get_element_size (char **opnd, int lineno)
> >    return elem_size;
> >  }
> >
> > +static bool
> > +rex2_disallowed (const unsigned long long opcode, unsigned int space,
> > +			       const char *cpu_flags)
> > +{
> > +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> > +#UD.  */
> > +  if (strcmp (cpu_flags, "XSAVES") >= 0
> > +      || strcmp (cpu_flags, "XSAVEC") >= 0
> > +      || strcmp (cpu_flags, "Xsave") >= 0
> > +      || strcmp (cpu_flags, "Xsaveopt") >= 0)
> > +    return true;
> 
> Wasn't this intended to be dropped, being redundant with the opcode table
> attributes?
>

Yes, dropped.
 
> > +  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
> > +     are reserved under REX2 and triggers #UD when prefixed with REX2
> > + */  if (space == 0)
> > +    switch (opcode >> 4)
> 
> Both here and ...
>
> > +      {
> > +      case 0x4:
> > +      case 0x7:
> > +      case 0xA:
> > +      case 0xE:
> > +	return true;
> > +      default:
> > +	return false;
> > +    }
> > +
> > +  if (space == SPACE_0F)
> > +    switch (opcode >> 4)
> 
> ... here, don't you also need to mask off further bits? There are quite a few
> opcodes which have a kind-of ModR/M byte encoded directly in the opcode,
> for example.
> 

Thanks for the reminder. Added code like this:

/* Some opcodes encode a ModR/M byte directly in the opcode.  */
  unsigned long long
  base_opcode = (length > 1) ? opcode >> (8 * length - 8) : opcode;

/* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
     are reserved under REX2 and triggers #UD when prefixed with REX2 */
  if (space == 0)
    switch (base_opcode >> 4)
      {
      case 0x4:
      case 0x7:
      case 0xA:
      case 0xE:
        return true;
      default:
        return false;
    }

  if (space == SPACE_0F)
    switch (base_opcode >> 4)

> > +      {
> > +      case 0x3:
> > +      case 0x8:
> > +	return true;
> > +      default:
> > +	return false;
> > +      }
> > +
> > +  return false;
> > +}
> > +
> >  static void
> >  process_i386_opcode_modifier (FILE *table, char *mod, unsigned int
> space,
> >  			      unsigned int prefix, const char
> *extension_opcode,
> > -			      char **opnd, int lineno)
> > +			      char **opnd, int lineno, bool rex2_disallowed)
> >  {
> >    char *str, *next, *last;
> >    bitfield modifiers [ARRAY_SIZE (opcode_modifiers)]; @@ -1202,6
> > +1244,12 @@ process_i386_opcode_modifier (FILE *table, char *mod,
> unsigned int space,
> >  	  || modifiers[SAE].value))
> >      modifiers[EVex].value = EVEXDYN;
> >
> > +  /* Vex, legacy map2 and map3 and rex2_disallowed do not support
> EGPR.
> > +     For template supports both Vex and EVex allowing EGPR.  */
> 
> "Templates supporting both Vex and EVex allow EGPR."
> 

Done.

> > +  if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
> > +      && !modifiers[EVex].value)
> > +    modifiers[NoEgpr].value = 1;
> > +
> >    output_opcode_modifier (table, modifiers, ARRAY_SIZE (modifiers));
> > }
> >
> > @@ -1425,8 +1473,11 @@ output_i386_opcode (FILE *table, const char
> *name, char *str,
> >  	   ident, 2 * (int)length, opcode, end, i);
> >    free (ident);
> >
> > +  /* Add some specilal handle for current entry.  */  bool
> > + has_special_handle = rex2_disallowed (opcode, space, cpu_flags);
> 
> The local variable (if one is needed in the first place) wants naming as usefully
> as the function now is named. Similarly the comment would want improving
> along those lines.
> 

Dropped the local variable. Changed to

  process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
                                extension_opcode, operand_types, lineno,
                                rex2_disallowed (opcode, length, space,
                                                 cpu_flags));

> >    process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
> > -				extension_opcode, operand_types, lineno);
> > +				extension_opcode, operand_types, lineno,
> > +				has_special_handle);
> >
> >    process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
> >
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -138,6 +138,7 @@
> >  #define Vsz256 Vsz=VSZ256
> >  #define Vsz512 Vsz=VSZ512
> >
> > +
> >  // The EVEX purpose of StaticRounding appears only together with SAE.
> > Re-use  // the bit to mark commutative VEX encodings where swapping
> > the source  // operands may allow to switch from 3-byte to 2-byte VEX
> encoding.
> 
> Stray change (in general please avoid introducing double blank lines, as those
> make patch context less useful).
> 
Done.

Thanks,
Lili.




* Re: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
  2023-12-05 13:31     ` Cui, Lili
@ 2023-12-06  7:52       ` Jan Beulich
  2023-12-06 12:43         ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-06  7:52 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 05.12.2023 14:31, Cui, Lili wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
>>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
>>>
>>> +/* Build (2 bytes) rex2 prefix.
>>> +   | D5h |
>>> +   | m | R4 X4 B4 | W R X B |
>>> +*/
>>> +static void
>>> +build_rex2_prefix (void)
>>> +{
>>> +  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the
>>> +same.  */
>>
>> How do they handle it the same? (Also I don't think this is useful as a code
>> comment; it instead belongs in the description imo.)
>>
> 
> Moved the comment to the function's description.
> 
> /* Build (2 bytes) rex2 prefix.
>    | D5h |
>    | m | R4 X4 B4 | W R X B |
> 
>    Rex2 reuses i.vex as they handle i.tm.opcode_space the same way.  */
> static void
> build_rex2_prefix (void)
> 
> 
> In function "output_insn", the handling looks like this:
> 
>       if (!i.vex.length)
>         switch (i.tm.opcode_space)
>           {
>           case SPACE_BASE:
>             break;
>           case SPACE_0F:
>             ++j;
>             break;
>           case SPACE_0F38:
>           case SPACE_0F3A:
>             j += 2;
>             break;
>           default:
>             abort ();
>           }
> .....
>          if (!i.vex.length
>               && i.tm.opcode_space != SPACE_BASE)
>             {
>               *p++ = 0x0f;
>               if (i.tm.opcode_space != SPACE_0F)
>                 *p++ = i.tm.opcode_space == SPACE_0F38
>                        ? 0x38 : 0x3a;
>             }

Oh, I see. That's pretty remote. How about replacing "the same way"? Perhaps
"Rex2 reuses i.vex as they both encode i.tm.opcode_space in their prefixes"?

While in that form it's fine to remain in a code comment, just a general
clarification: When I say something wants saying in the "description", it's
(almost) always that I mean the patch description, not anything else.
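
To make the layout from the function comment concrete, here is a stand-alone sketch of how the two prefix bytes are put together. It mirrors the expression in the quoted build_rex2_prefix(), but the struct and function names are hypothetical, and the argument ranges are assumptions taken from the "| m | R4 X4 B4 | W R X B |" diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two REX2 prefix bytes.  */
struct rex2_prefix
{
  uint8_t bytes[2];
};

/* MAP is the opcode map (0 = base, 1 = 0F), REX2_BITS holds R4 X4 B4,
   REX_BITS holds W R X B, each in the usual REX bit order.  */
static struct rex2_prefix
make_rex2 (unsigned int map, unsigned int rex2_bits, unsigned int rex_bits)
{
  struct rex2_prefix p;

  assert (map <= 1);
  assert (rex2_bits <= 0x7);
  assert (rex_bits <= 0xf);

  p.bytes[0] = 0xd5;
  p.bytes[1] = (uint8_t) ((map << 7) | (rex2_bits << 4) | rex_bits);
  return p;
}
```

With these assumptions, make_rex2 (0, 2, 0) yields a payload byte of 0x20 (only X4 set), and make_rex2 (1, 0, 8) yields 0x88 (map 1 with W set).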

>>> @@ -4385,12 +4414,16 @@ optimize_encoding (void)
>>>  	  i.suffix = 0;
>>>  	  /* Convert to byte registers.  */
>>>  	  if (i.types[1].bitfield.word)
>>> -	    j = 16;
>>> -	  else if (i.types[1].bitfield.dword)
>>> +	    /* There are 40 8-bit registers.  */
>>>  	    j = 32;
>>> +	  else if (i.types[1].bitfield.dword)
>>> +	    /* 32 8-bit registers + 32 16-bit registers.  */
>>> +	    j = 64;
>>>  	  else
>>> -	    j = 48;
>>> -	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
>>> +	    /* 32 8-bit registers + 32 16-bit registers
>>> +	       + 32 32-bit registers.  */
>>> +	    j = 96;
>>> +	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum
>>> +< 4)
>>>  	    j += 8;
>>>  	  i.op[1].regs -= j;
>>>  	}
>>
>> I did comment on, in particular, the 8-bit register counts before.
>> Afaict the comments above are nevertheless unchanged and hence still not
>> really correct.
>>
> 
> Changed to :
> 
>       if (flag_code == CODE_64BIT || base_regnum < 4)
>         {
>           i.types[1].bitfield.byte = 1;
>           /* Ignore the suffix.  */
>           i.suffix = 0;
>           /* Convert to byte registers. 8-bit registers are special,
>              RegRex64 and non-RegRex64 each have 8 registers.  */
>           if (i.types[1].bitfield.word)
>             /* 32 (or 40) 8-bit registers.  */
>             j = 32;
>           else if (i.types[1].bitfield.dword)
>             /* 32 (or 40)8-bit registers + 32 16-bit registers.  */

Nit: Missing blank.

>             j = 64;
>           else
>             /* 32 (or 40) 8-bit registers + 32 16-bit registers
>                + 32 32-bit registers.  */
>             j = 96;
> 
>           if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
>             j += 8;
>           i.op[1].regs -= j;
>         }

I won't insist on further changes, but imo as you're adding comments,
also adding a comment to this last if() (which finally takes care of
the 8-bit reg special case) would be advisable.
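
As a sketch of what such a comment would need to convey, the index arithmetic can be modelled on its own. The layout assumed below (8 legacy byte registers first, then banks of 32 registers for the REX-style 8-bit, 16-bit, 32-bit and 64-bit names) is my reading of the comments in the quoted code, and all names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed table layout: al..bh (8 entries), then 32 uniformly numbered
   8-bit registers, then 32 16-bit, 32 32-bit and 32 64-bit ones.  */
#define LEGACY_BYTE_REGS 8
#define REGS_PER_BANK 32

enum width { W16 = 0, W32 = 1, W64 = 2 };

/* Index of the first register of the given width in the assumed table.  */
static unsigned int
bank_start (enum width w)
{
  return LEGACY_BYTE_REGS + REGS_PER_BANK * (1 + (unsigned int) w);
}

/* Map the table index IDX of a W-wide register to the index of its
   8-bit counterpart.  EXTENDED models RegRex | RegRex2 being set.  */
static unsigned int
byte_reg_index (unsigned int idx, enum width w, bool extended,
		unsigned int base_regnum)
{
  /* Step back over whole banks: 32, 64 or 96 entries.  */
  unsigned int j = REGS_PER_BANK * (1 + (unsigned int) w);

  /* The 8-bit special case: registers 0..3 without any REX-style flag
     use the legacy al/cl/dl/bl entries at the very start.  */
  if (!extended && base_regnum < 4)
    j += LEGACY_BYTE_REGS;

  assert (idx >= j);
  return idx - j;
}
```

Under this model the entry for ax (index 40) maps to index 0 (al), the entry for r8w (index 48) maps to index 16 (r8b), and the entry for esi (index 78) maps to index 14 (sil).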

>>> @@ -5663,7 +5711,13 @@ md_assemble (char *line)
>>>  	i.rex = REX_OPCODE;
>>>      }
>>>
>>> -  if (i.rex != 0)
>>> +  if (i.rex2 != 0 || i.rex2_encoding)
>>> +    {
>>> +      build_rex2_prefix ();
>>> +      /* The individual REX.RXBW bits got consumed.  */
>>> +      i.rex &= REX_OPCODE;
>>> +    }
>>> +  else if (i.rex != 0)
>>>      add_prefix (REX_OPCODE | i.rex);
>>>
>>>    insert_lfence_before ();
>>> @@ -5834,6 +5888,10 @@ parse_insn (const char *line, char *mnemonic,
>> bool prefix_only)
>>>  		  /* {rex} */
>>>  		  i.rex_encoding = true;
>>>  		  break;
>>> +		case Prefix_REX2:
>>> +		  /* {rex2} */
>>> +		  i.rex2_encoding = true;
>>> +		  break;
>>>  		case Prefix_NoOptimize:
>>>  		  /* {nooptimize} */
>>>  		  i.no_optimize = true;
>>> @@ -6971,6 +7029,45 @@ VEX_check_encoding (const insn_template *t)
>>>    return 0;
>>>  }
>>>
>>> +/* Check if Egprs operands are valid for the instruction.  */
>>> +
>>> +static int
>>> +check_EgprOperands (const insn_template *t) {
>>> +  if (!t->opcode_modifier.noegpr)
>>> +    return 0;
>>> +
>>> +  for (unsigned int op = 0; op < i.operands; op++)
>>> +    {
>>> +      if (i.types[op].bitfield.class != Reg
>>> +	  /* Special case for (%dx) while doing input/output op */
>>> +	  || i.input_output_operand)
>>
>> Didn't we agree that this extra condition isn't necessary, once the producer
>> site correctly updates all state (which was supposed to be done in a small
>> prereq patch)?
>>
> 
> I tried adding "Unspecified | BaseIndex" to InOutPortReg, but then some related instructions ended up with two memory operands, which caused many failures in the invalid test cases and would have required more ugly code. In the end, I felt that this simple modification might be better.

Changing InOutPortReg of course isn't going to be easy. But that also wasn't
what we had discussed. Instead (I thought) we agreed on ...

> @@ -13137,6 +13137,7 @@ i386_att_operand (char *operand_string)
>           && !operand_type_check (i.types[this_operand], disp))
>         {
>           i.types[this_operand] = i.base_reg->reg_type;
> +         i.types[this_operand].bitfield.class = 0;
>           i.input_output_operand = true;
>           return 1;

amending this code to also correctly set i.op[].regs. Perhaps it would also
be best to actually clear i.base_reg (for there not being any memory operand).
(FTAOD: All of this in a separate prereq patch, not here. The code creating
inconsistent state has been a [latent] bug for a long time.)

>>> --- a/gas/doc/c-i386.texi
>>> +++ b/gas/doc/c-i386.texi
>>> @@ -217,6 +217,7 @@ accept various extension mnemonics.  For example,
>>> @code{avx10.1/256},  @code{avx10.1/128},  @code{user_msr},
>>> +@code{apx_f},
>>>  @code{amx_int8},
>>>  @code{amx_bf16},
>>>  @code{amx_fp16},
>>> @@ -983,6 +984,9 @@ Different encoding options can be specified via
>> pseudo prefixes:
>>>  instructions (x86-64 only).  Note that this differs from the
>>> @samp{rex}  prefix which generates REX prefix unconditionally.
>>>
>>> +@item
>>> +@samp{@{rex2@}} -- encode with REX2 prefix
>>
>> This isn't in line with what's said for {rex}. Iirc we were in agreement that we
>> want both to behave consistently. In which case documentation also needs to
>> describe them consistently.
>>
> 
> Changed to 
> 
> @item
> @samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
> instructions (APX_F only).  Note that this differs from the @samp{rex2}
> prefix which generates REX2 prefix unconditionally.

Except there's no "rex2" prefix according to the present implementation.

>>> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
>>> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
>>> @@ -5,3 +5,61 @@ pseudos:
>>>  	{rex} vmovaps %xmm7,%xmm2
>>>  	{rex} vmovaps %xmm17,%xmm2
>>>  	{rex} rorx $7,%eax,%ebx
>>> +	{rex2} vmovaps %xmm7,%xmm2
>>> +	{rex2} xsave (%rax)
>>> +	{rex2} xsaves (%ecx)
>>> +	{rex2} xsaves64 (%ecx)
>>> +	{rex2} xsavec (%ecx)
>>> +	{rex2} xrstors (%ecx)
>>> +	{rex2} xrstors64 (%ecx)
>>> +
>>> +	#All opcodes in the row 0xa* prefixed REX2 are illegal.
>>> +	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
>>> +	{rex2} mov    0x90909090,%al
>>> +	{rex2} movabs 0x1,%al
>>> +	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
>>> +	{rex2} lodsb
>>> +	{rex2} lods   %ds:(%esi),%al
>>> +	{rex2} lodsb   (%esi)
>>> +	{rex2} movs
>>> +	{rex2} movs   (%esi), (%edi)
>>> +	{rex2} scasl
>>> +	{rex2} scas   %es:(%edi),%eax
>>> +	{rex2} scasb   (%edi)
>>> +	{rex2} stosb
>>> +	{rex2} stosb   (%edi)
>>> +	{rex2} stos   %eax,%es:(%edi)
>>> +
>>> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
>>
>> This also covers map 1 row 8, doesn't it?
>>
> 
> No, I didn't find 0xf8* in opcode table.

Assuming (again) you mean 0x0f 0x8*, how did you not find it? Or
wait, depends on what "opcode table" here means: The manual's or
opcodes/i386-opc.tbl? The latter of course doesn't have them, as
they're ...

>>> +	{rex2} jo     .+2-0x70
>>> +	{rex2} jno    .+2-0x70
>>> +	{rex2} jb     .+2-0x70
>>> +	{rex2} jae    .+2-0x70
>>> +	{rex2} je     .+2-0x70
>>> +	{rex2} jne    .+2-0x70
>>> +	{rex2} jbe    .+2-0x70
>>> +	{rex2} ja     .+2-0x70
>>> +	{rex2} js     .+2-0x70
>>> +	{rex2} jns    .+2-0x70
>>> +	{rex2} jp     .+2-0x70
>>> +	{rex2} jnp    .+2-0x70
>>> +	{rex2} jl     .+2-0x70
>>> +	{rex2} jge    .+2-0x70
>>> +	{rex2} jle    .+2-0x70
>>> +	{rex2} jg     .+2-0x70

... the disp32/disp16 forms of these branches, which are created only
during relaxation.

>>> +  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
>>> +     are reserved under REX2 and triggers #UD when prefixed with REX2
>>> + */  if (space == 0)
>>> +    switch (opcode >> 4)
>>
>> Both here and ...
>>
>>> +      {
>>> +      case 0x4:
>>> +      case 0x7:
>>> +      case 0xA:
>>> +      case 0xE:
>>> +	return true;
>>> +      default:
>>> +	return false;
>>> +    }
>>> +
>>> +  if (space == SPACE_0F)
>>> +    switch (opcode >> 4)
>>
>> ... here, don't you also need to mask off further bits? There are quite a few
>> opcodes which have a kind-of ModR/M byte encoded directly in the opcode,
>> for example.
>>
> 
> Thanks for the reminder. Added code like this:
> 
> /* Some opcodes encode a ModR/M byte directly in the opcode.  */
>   unsigned long long
>   base_opcode = (length > 1) ? opcode >> (8 * length - 8) : opcode;

Can length be 0? I didn't think so, and then

   base_opcode = opcode >> (8 * length - 8);

would be all you need.

Also in the comment, I think it would be slightly better to say "ModR/M-like
byte".
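
A minimal stand-alone sketch of the simplified extraction plus the row check, assuming length is always at least 1 and the opcode value holds exactly that many bytes. The helper names are hypothetical, and maps are passed as plain numbers (0 for the base map, 1 for 0F) rather than the generator's own constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the first (most significant) byte of a LENGTH-byte opcode,
   dropping any trailing ModR/M-like byte(s).  */
static unsigned int
first_opcode_byte (unsigned long long opcode, unsigned int length)
{
  assert (length >= 1 && length <= 8);
  return (unsigned int) (opcode >> (8 * length - 8)) & 0xff;
}

/* Rows listed in the patch comment as reserved under REX2:
   map 0 rows 4, 7, a, e and map 1 rows 3, 8.  */
static bool
row_reserved_under_rex2 (unsigned int map, unsigned long long opcode,
			 unsigned int length)
{
  unsigned int row = first_opcode_byte (opcode, length) >> 4;

  switch (map)
    {
    case 0:
      return row == 0x4 || row == 0x7 || row == 0xa || row == 0xe;
    case 1:
      return row == 0x3 || row == 0x8;
    default:
      return false;
    }
}
```

For a one-byte opcode the shift is by zero, so row_reserved_under_rex2 (0, 0xa8, 1) is true; for a made-up two-byte value such as 0x70c0 in map 0 the trailing byte is dropped first and the row is still recognised as 7.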

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
  2023-12-06  7:52       ` Jan Beulich
@ 2023-12-06 12:43         ` Cui, Lili
  2023-12-07  9:01           ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-06 12:43 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 05.12.2023 14:31, Cui, Lili wrote:
> >> On 24.11.2023 08:02, Cui, Lili wrote:
> >>> @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
> >>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >>>
> >>> +/* Build (2 bytes) rex2 prefix.
> >>> +   | D5h |
> >>> +   | m | R4 X4 B4 | W R X B |
> >>> +*/
> >>> +static void
> >>> +build_rex2_prefix (void)
> >>> +{
> >>> +  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the
> >>> +same.  */
> >>
> >> How do they handle it the same? (Also I don't think this is useful as
> >> a code comment; it instead belongs in the description imo.)
> >>
> >
> > Moved the comment to the functions description.
> >
> > /* Build (2 bytes) rex2 prefix.
> >    | D5h |
> >    | m | R4 X4 B4 | W R X B |
> >
> >    Rex2 reuses i.vex as they handle i.tm.opcode_space the same way.  */
> > static void
> > build_rex2_prefix (void)
> >
> >
> > In function "output_insn",  some handle like this.
> >
> >       if (!i.vex.length)
> >         switch (i.tm.opcode_space)
> >           {
> >           case SPACE_BASE:
> >             break;
> >           case SPACE_0F:
> >             ++j;
> >             break;
> >           case SPACE_0F38:
> >           case SPACE_0F3A:
> >             j += 2;
> >             break;
> >           default:
> >             abort ();
> >           }
> > .....
> >          if (!i.vex.length
> >               && i.tm.opcode_space != SPACE_BASE)
> >             {
> >               *p++ = 0x0f;
> >               if (i.tm.opcode_space != SPACE_0F)
> >                 *p++ = i.tm.opcode_space == SPACE_0F38
> >                        ? 0x38 : 0x3a;
> >             }
> 
> Oh, I see. That's pretty remote. How about replacing "the same way"?
> Perhaps
> "Rex2 reuses i.vex as they both encode i.tm.opcode_space in their prefixes"?
> 

Done.
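
For reference, the layout in the comment above can be sketched as below. This is only an illustration, not the code in the patch: the struct and function names are made up, and it assumes the payload byte is laid out as m, R4, X4, B4, W, R, X, B from bit 7 down to bit 0, as the comment shows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field container; the assembler keeps this state in
   its own structures.  */
struct rex2_fields
{
  bool m0;		/* Map select: 0 = map 0, 1 = map 1 (0x0f).  */
  bool r4, x4, b4;	/* Upper register-number bits.  */
  bool w;		/* 64-bit operand size.  */
  bool r3, x3, b3;	/* The bits a classic REX prefix carries.  */
};

/* Emit the two REX2 bytes: 0xd5, then the payload byte.  */
static void
encode_rex2 (const struct rex2_fields *f, uint8_t out[2])
{
  out[0] = 0xd5;
  out[1] = (uint8_t) ((f->m0 << 7) | (f->r4 << 6) | (f->x4 << 5)
		      | (f->b4 << 4) | (f->w << 3) | (f->r3 << 2)
		      | (f->x3 << 1) | f->b3);
}
```

The map-select bit is where the opcode space ends up in the prefix, which is what the reworded comment is getting at.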

> While in that form it's fine to remain in a code comment, just a general
> clarification: When I say something wants saying in the "description", it's
> (almost) always that I mean the patch description, not anything else.
> 

I see.

> >> I did comment on, in particular, the 8-bit register counts before.
> >> Afaict the comments above are nevertheless unchanged and hence still
> >> not really correct.
> >>
> >
> > Changed to :
> >
> >       if (flag_code == CODE_64BIT || base_regnum < 4)
> >         {
> >           i.types[1].bitfield.byte = 1;
> >           /* Ignore the suffix.  */
> >           i.suffix = 0;
> >           /* Convert to byte registers. 8-bit registers are special,
> >              RegRex64 and non-RegRex64 each have 8 registers.  */
> >           if (i.types[1].bitfield.word)
> >             /* 32 (or 40) 8-bit registers.  */
> >             j = 32;
> >           else if (i.types[1].bitfield.dword)
> >             /* 32 (or 40)8-bit registers + 32 16-bit registers.  */
> 
> Nit: Missing blank.
> 

Done.

> >             j = 64;
> >           else
> >             /* 32 (or 40) 8-bit registers + 32 16-bit registers
> >                + 32 32-bit registers.  */
> >             j = 96;
> >
> >           if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
> >             j += 8;
> >           i.op[1].regs -= j;
> >         }
> 
> I won't insist on further changes, but imo as you're adding comments, also
> adding a comment to this last if() (which finally takes care of the 8-bit reg
> special case) would be advisable.
> 

Added.

          /* In 64-bit mode, the byte registers AH, BH, CH and DH cannot be
             accessed when a REX or REX2 prefix is present.  */
          if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
            j += 8;
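
As background for that comment, a small sketch of the architectural rule (not of the register table arithmetic above): byte-register encodings 4..7 name AH/CH/DH/BH without a REX-style prefix and SPL/BPL/SIL/DIL with one. The function name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: name the byte register selected by a 3-bit
   encoding, depending on whether a REX or REX2 prefix is present.
   Only encodings 4..7 differ between the two cases.  */
static const char *
byte_reg_name (unsigned enc, bool has_rex)
{
  static const char *const legacy[8] =
    { "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh" };
  static const char *const with_rex[8] =
    { "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil" };

  enc &= 7;
  return has_rex ? with_rex[enc] : legacy[enc];
}
```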

> >>> +/* Check if Egprs operands are valid for the instruction.  */
> >>> +
> >>> +static int
> >>> +check_EgprOperands (const insn_template *t) {
> >>> +  if (!t->opcode_modifier.noegpr)
> >>> +    return 0;
> >>> +
> >>> +  for (unsigned int op = 0; op < i.operands; op++)
> >>> +    {
> >>> +      if (i.types[op].bitfield.class != Reg
> >>> +	  /* Special case for (%dx) while doing input/output op */
> >>> +	  || i.input_output_operand)
> >>
> >> Didn't we agree that this extra condition isn't necessary, once the
> >> producer site correctly updates all state (which was supposed to be
> >> done in a small prereq patch)?
> >>
> >
> > I tried adding "Unspecified | BaseIndex" to the InOutPortReg, then some
> > related instructions had two memory operands, so it raised a lot of invalid
> > test case fail, and more ugly code needed to be added. In the end, I felt that
> > this simple modification might be better.
> 
> Changing InOutPortReg of course isn't going to be easy. But that also wasn't
> what we had discussed. Instead (I thought) we agreed on ...
> 
> > @@ -13137,6 +13137,7 @@ i386_att_operand (char *operand_string)
> >           && !operand_type_check (i.types[this_operand], disp))
> >         {
> >           i.types[this_operand] = i.base_reg->reg_type;
> > +         i.types[this_operand].bitfield.class = 0;
> >           i.input_output_operand = true;
> >           return 1;
> 
> amending this code to also correctly set i.op[].regs. Perhaps it would also be
> best to actually clear i.base_reg (for there not being any memory operand).
> (FTAOD: All of this in a separate prereq patch, not here. The code creating
> inconsistent state has been a [latent] bug for a long time.)
> 

Added i.base_reg = NULL. Just discussing it here, I'll create a new patch for it.

@@ -13016,6 +13016,8 @@ i386_att_operand (char *operand_string)
          && !operand_type_check (i.types[this_operand], disp))
        {
          i.types[this_operand] = i.base_reg->reg_type;
+         i.types[this_operand].bitfield.class = 0;
+         i.base_reg = NULL;
          i.input_output_operand = true;
          return 1;
        }

> >>> --- a/gas/doc/c-i386.texi
> >>> +++ b/gas/doc/c-i386.texi
> >>> @@ -217,6 +217,7 @@ accept various extension mnemonics.  For
> >>> example, @code{avx10.1/256},  @code{avx10.1/128},  @code{user_msr},
> >>> +@code{apx_f},
> >>>  @code{amx_int8},
> >>>  @code{amx_bf16},
> >>>  @code{amx_fp16},
> >>> @@ -983,6 +984,9 @@ Different encoding options can be specified via pseudo prefixes:
> >>>  instructions (x86-64 only).  Note that this differs from the
> >>> @samp{rex}  prefix which generates REX prefix unconditionally.
> >>>
> >>> +@item
> >>> +@samp{@{rex2@}} -- encode with REX2 prefix
> >>
> >> This isn't in line with what's said for {rex}. Iirc we were in
> >> agreement that we want both to behave consistently. In which case
> >> documentation also needs to describe them consistently.
> >>
> >
> > Changed to
> >
> > @item
> > @samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
> > instructions (APX_F only).  Note that this differs from the
> > @samp{rex2} prefix which generates REX2 prefix unconditionally.
> 
> Except there's no "rex2" prefix according to the present implementation.
>
 
Removed that sentence for the current implementation.

@item
@samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
instructions (APX_F only).

> >>> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> >>> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> >>> @@ -5,3 +5,61 @@ pseudos:
> >>>  	{rex} vmovaps %xmm7,%xmm2
> >>>  	{rex} vmovaps %xmm17,%xmm2
> >>>  	{rex} rorx $7,%eax,%ebx
> >>> +	{rex2} vmovaps %xmm7,%xmm2
> >>> +	{rex2} xsave (%rax)
> >>> +	{rex2} xsaves (%ecx)
> >>> +	{rex2} xsaves64 (%ecx)
> >>> +	{rex2} xsavec (%ecx)
> >>> +	{rex2} xrstors (%ecx)
> >>> +	{rex2} xrstors64 (%ecx)
> >>> +
> >>> +	#All opcodes in the row 0xa* prefixed REX2 are illegal.
> >>> +	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
> >>> +	{rex2} mov    0x90909090,%al
> >>> +	{rex2} movabs 0x1,%al
> >>> +	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
> >>> +	{rex2} lodsb
> >>> +	{rex2} lods   %ds:(%esi),%al
> >>> +	{rex2} lodsb   (%esi)
> >>> +	{rex2} movs
> >>> +	{rex2} movs   (%esi), (%edi)
> >>> +	{rex2} scasl
> >>> +	{rex2} scas   %es:(%edi),%eax
> >>> +	{rex2} scasb   (%edi)
> >>> +	{rex2} stosb
> >>> +	{rex2} stosb   (%edi)
> >>> +	{rex2} stos   %eax,%es:(%edi)
> >>> +
> >>> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> >>
> >> This also covers map 1 row 8, doesn't it?
> >>
> >
> > No, I didn't find 0xf8* in opcode table.
> 
> Assuming (again) you mean 0x0f 0x8*, how did you not find it? Or wait,
> depends on what "opcode table" here means: The manual's or opcodes/i386-
> opc.tbl? The latter of course doesn't have them, as they're ...
> 
> >>> +	{rex2} jo     .+2-0x70
> >>> +	{rex2} jno    .+2-0x70
> >>> +	{rex2} jb     .+2-0x70
> >>> +	{rex2} jae    .+2-0x70
> >>> +	{rex2} je     .+2-0x70
> >>> +	{rex2} jne    .+2-0x70
> >>> +	{rex2} jbe    .+2-0x70
> >>> +	{rex2} ja     .+2-0x70
> >>> +	{rex2} js     .+2-0x70
> >>> +	{rex2} jns    .+2-0x70
> >>> +	{rex2} jp     .+2-0x70
> >>> +	{rex2} jnp    .+2-0x70
> >>> +	{rex2} jl     .+2-0x70
> >>> +	{rex2} jge    .+2-0x70
> >>> +	{rex2} jle    .+2-0x70
> >>> +	{rex2} jg     .+2-0x70
> 
> ... the disp32/disp16 forms of these branches, which are created only during
> relaxation.
>

Oh, I see. I found them in the SDM and added test cases for them.

        #All opcodes in the row 0x8* (map1) prefixed REX2 are illegal.
        {rex2} jo     .+6+0x90909090
        {rex2} jno    .+6+0x90909090
        {rex2} jb     .+6+0x90909090
        {rex2} jae    .+6+0x90909090
        {rex2} je     .+6+0x90909090
        {rex2} jne    .+6+0x90909090
        {rex2} jbe    .+6+0x90909090
        {rex2} ja     .+6+0x90909090
        {rex2} js     .+6+0x90909090
        {rex2} jns    .+6+0x90909090
        {rex2} jp     .+6+0x90909090
        {rex2} jnp    .+6+0x90909090
        {rex2} jl     .+6+0x90909090
        {rex2} jge    .+6+0x90909090
        {rex2} jle    .+6+0x90909090
        {rex2} jg     .+6+0x90909090
 
> >>> +  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
> >>> +     are reserved under REX2 and triggers #UD when prefixed with REX2 */
> >>> +  if (space == 0)
> >>> +    switch (opcode >> 4)
> >>
> >> Both here and ...
> >>
> >>> +      {
> >>> +      case 0x4:
> >>> +      case 0x7:
> >>> +      case 0xA:
> >>> +      case 0xE:
> >>> +	return true;
> >>> +      default:
> >>> +	return false;
> >>> +    }
> >>> +
> >>> +  if (space == SPACE_0F)
> >>> +    switch (opcode >> 4)
> >>
> >> ... here, don't you also need to mask off further bits? There are
> >> quite a few opcodes which have a kind-of ModR/M byte encoded directly
> >> in the opcode, for example.
> >>
> >
> > Thanks for reminding. Added the code like this.
> >
> > /* Some opcodes encode a ModR/M byte directly in the opcode.  */
> >   unsigned long long
> >   base_opcode = (length > 1) ? opcode >> (8 * length - 8) : opcode;
> 
> Can length be 0? I didn't think so, and then
> 
>    base_opcode = opcode >> (8 * length - 8);
> 
> would be all you need.
>

Yes, that's a good way.

> Also in the comment, I think it would be slightly better to say "ModR/M-like
> byte".
> 

Done.

Thanks,
Lili.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
  2023-12-06 12:43         ` Cui, Lili
@ 2023-12-07  9:01           ` Jan Beulich
  2023-12-08  3:10             ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-07  9:01 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 06.12.2023 13:43, Cui, Lili wrote:
>> On 05.12.2023 14:31, Cui, Lili wrote:
>>>> On 24.11.2023 08:02, Cui, Lili wrote:
>>>>> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
>>>>
>>>> This also covers map 1 row 8, doesn't it?
>>>>
>>>
>>> No, I didn't find 0xf8* in opcode table.
>>
>> Assuming (again) you mean 0x0f 0x8*, how did you not find it? Or wait,
>> depends on what "opcode table" here means: The manual's or opcodes/i386-
>> opc.tbl? The latter of course doesn't have them, as they're ...
>>
>>>>> +	{rex2} jo     .+2-0x70
>>>>> +	{rex2} jno    .+2-0x70
>>>>> +	{rex2} jb     .+2-0x70
>>>>> +	{rex2} jae    .+2-0x70
>>>>> +	{rex2} je     .+2-0x70
>>>>> +	{rex2} jne    .+2-0x70
>>>>> +	{rex2} jbe    .+2-0x70
>>>>> +	{rex2} ja     .+2-0x70
>>>>> +	{rex2} js     .+2-0x70
>>>>> +	{rex2} jns    .+2-0x70
>>>>> +	{rex2} jp     .+2-0x70
>>>>> +	{rex2} jnp    .+2-0x70
>>>>> +	{rex2} jl     .+2-0x70
>>>>> +	{rex2} jge    .+2-0x70
>>>>> +	{rex2} jle    .+2-0x70
>>>>> +	{rex2} jg     .+2-0x70
>>
>> ... the disp32/disp16 forms of these branches, which are created only during
>> relaxation.
>>
> 
> Oh,  I see,  I found them in sdm and added testcase for them.
> 
>         #All opcodes in the row 0x8* (map1) prefixed REX2 are illegal.
>         {rex2} jo     .+6+0x90909090
>         {rex2} jno    .+6+0x90909090
>         {rex2} jb     .+6+0x90909090
>         {rex2} jae    .+6+0x90909090
>         {rex2} je     .+6+0x90909090
>         {rex2} jne    .+6+0x90909090
>         {rex2} jbe    .+6+0x90909090
>         {rex2} ja     .+6+0x90909090
>         {rex2} js     .+6+0x90909090
>         {rex2} jns    .+6+0x90909090
>         {rex2} jp     .+6+0x90909090
>         {rex2} jnp    .+6+0x90909090
>         {rex2} jl     .+6+0x90909090
>         {rex2} jge    .+6+0x90909090
>         {rex2} jle    .+6+0x90909090
>         {rex2} jg     .+6+0x90909090

I don't mind the addition, but I don't think this actually tests anything that
the other block didn't already test. Hence why I suggested to merely update
the comment there.

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
@ 2023-12-07 12:38   ` Jan Beulich
  2023-12-08 15:21     ` Cui, Lili
  2023-12-07 13:34   ` Jan Beulich
  2023-12-11 11:50   ` Jan Beulich
  2 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-07 12:38 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, binutils

On 24.11.2023 08:02, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -409,6 +409,9 @@ struct _i386_insn
>      /* Compressed disp8*N attribute.  */
>      unsigned int memshift;
>  
> +    /* No CSPAZO flags update.*/
> +    bool has_nf;

As before I don't see the point in adding this field when it's not used
in the change. Note that this is unrelated to the introduction of the NF
attribute right here, which has a reason.

> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>  
>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> -  {
> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> -	   || maybe_cpu (t, CpuFMA))
> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> +    {
> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>  	{
>  	  if (need_evex_encoding ())

There are several issues here:
- Why did you need to change (for the worse) the original code?
- Why did you not model the addition after that original code?
- How come APX_F (CpuAVX512*) constructs appear here, when no AVX512 insn
  can be VEX-encoded?
- If these new macros are really needed for whatever reason, they shouldn't
  be added to opcodes/i386-opc.h when they're useful only in the assembler.
- Style requires a blank before the opening parenthesis in function
  invocations (which also covers function-like macro invocations).

I think I asked before: How is it that you get away without altering
cpu_flags_match(), containing related and quite similar logic?

> @@ -3873,6 +3877,14 @@ is_any_vex_encoding (const insn_template *t)
>    return t->opcode_modifier.vex || t->opcode_modifier.evex;
>  }
>  
> +static INLINE bool
> +is_apx_evex_encoding (void)
> +{
> +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> +    || (i.vex.register_specifier
> +	&& i.vex.register_specifier->reg_flags & RegRex2);
> +}

If you want this to be a function despite being used just once, you'll
need to add a comment mentioning the constraint when calling it (or
else the use of i.rex2 in particular is confusing). I'm sure I commented
on this before, and I thought such a comment had already appeared.

> @@ -5655,17 +5693,17 @@ md_assemble (char *line)
>       instruction already has a prefix, we need to convert old
>       registers to new ones.  */
>  
> -  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> -       && (i.op[0].regs->reg_flags & RegRex64) != 0)
> -      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> -	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> -      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> -	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && (i.rex != 0 || i.rex2 != 0)))
> +  if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> +	&& (i.op[0].regs->reg_flags & RegRex64) != 0)
> +       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> +	   && (i.op[1].regs->reg_flags & RegRex64) != 0)
> +       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> +	    || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> +	   && (i.rex != 0 || i.rex2 != 0))))

I'm having trouble spotting the change here: There's an outer pair of
parentheses being added, but that's for no reason unless there's another
change well hidden. Please clarify.

>      {
>        int x;
>  
> -      if (!i.rex2)
> +      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
>  	i.rex |= REX_OPCODE;

Why the change to is_apx_rex2_encoding()? If that's wanted / needed
here, shouldn't that be put in place by the earlier patch?

> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry *r)
>        if (!cpu_arch_flags.bitfield.cpuapx_f
>  	  || flag_code != CODE_64BIT)
>  	return false;
> +
> +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX.
> +	 For the later install_template function.  */
> +      if (current_templates->start->opcode_modifier.vex
> +	  && current_templates->start->opcode_modifier.evex)
> +	i.vec_encoding = vex_encoding_evex;

I'm afraid I don't understand the 2nd sentence of the comment. This may
be related to my question regarding cpu_flags_match() further up.

The first sentence isn't quite correct either - you don't mark any
template here (and you can't, because we don't even know yet which
template we're going to use).

Finally - do you really need the .evex check here? (I won't exclude
that this yields a better diagnostic in certain cases, but this wants
clarifying if so.)

> --- a/gas/testsuite/gas/i386/x86-64.exp
> +++ b/gas/testsuite/gas/i386/x86-64.exp
> @@ -250,7 +250,7 @@ run_dump_test "x86-64-sse-noavx"
>  run_dump_test "x86-64-movbe"
>  run_dump_test "x86-64-movbe-intel"
>  run_dump_test "x86-64-movbe-suffix"
> -run_list_test "x86-64-inval-movbe" "-al"
> +run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -al"

I can see why you add the -march=, as we've been through this before.
But why the -I ?

> @@ -896,7 +897,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
>  <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
>                        load:Load:0, store:Store:0, +
>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> -                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> +                      rex:REX:x64, rex2:REX2:APX_F, nooptimize:NoOptimize:0>

This change wants to go into the earlier patch?

> @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
>  
>  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>  invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
>  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>  invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }

Seeing these: Are there any Map4 encodings which aren't EVex128? If not (and
if you're also not hiddenly aware of some appearing in the near future),
please consider making EVexMap4 include this right away. Even if in the
longer run other encodings appear, it'll then be easy to simply replace all
the EVexMap4 uses in a purely mechanical way. Until then shorter template
lines are preferable.

> @@ -1437,7 +1443,6 @@ xgetbv, 0xf01d0, Xsave, NoSuf, {}
>  xsetbv, 0xf01d1, Xsave, NoSuf, {}
>  
>  // xsaveopt
> -
>  xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
>  xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }

Iirc the earlier patch added that blank line. Why would you do such back
and forth?

> @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
>  
>  // BMI2 instructions.
>  
> -bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> -pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> -pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> -rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
> -sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> +bzhi, 0xf5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }

Hmm, I had specifically suggested a pre-processor macro to use in place of the
open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here and
below)?

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
  2023-12-07 12:38   ` Jan Beulich
@ 2023-12-07 13:34   ` Jan Beulich
  2023-12-11  6:16     ` Cui, Lili
  2023-12-11 11:50   ` Jan Beulich
  2 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-07 13:34 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, binutils

On 24.11.2023 08:02, Cui, Lili wrote:
> --- /dev/null
> +++ b/opcodes/i386-dis-evex-x86-64.h
> @@ -0,0 +1,60 @@
> +  /* X86_64_EVEX_0F90 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F90) },
> +  },
> +  /* X86_64_EVEX_0F91 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F91) },
> +  },
> +  /* X86_64_EVEX_0F92 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F92) },
> +  },
> +  /* X86_64_EVEX_0F93 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F93) },
> +  },
> +  /* X86_64_EVEX_0F3849 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },
> +  },
> +  /* X86_64_EVEX_0F384B */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },
> +  },
> +  /* X86_64_EVEX_0F38F2 */
> +  {
> +    { Bad_Opcode },
> +    { EVEX_LEN_TABLE (EVEX_LEN_0F38F2) },
> +  },
> +  /* X86_64_EVEX_0F38F3 */
> +  {
> +    { Bad_Opcode },
> +    { EVEX_LEN_TABLE (EVEX_LEN_0F38F3) },
> +  },
> +  /* X86_64_EVEX_0F38F5 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F38F5) },
> +  },
> +  /* X86_64_EVEX_0F38F6 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F38F6) },
> +  },
> +  /* X86_64_EVEX_0F38F7 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F38F7) },
> +  },
> +  /* X86_64_EVEX_0F3AF0 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F3AF0) },
> +  },

I'm puzzled here: There are two uses of EVEX_LEN_TABLE() and several more
of VEX_LEN_TABLE(). Yet the underlying pattern of those insns is all the
same. I may guess that this is related to PREFIX_OPCODE use in the
respective VEX table entries, yet isn't it then cheaper overall to have
VEX encodings also go through prefix_table[], and then sharing those
entries with EVEX encodings?

What's further puzzling: When setting evex_from_vex you already check
L'L == 0, so there's no reason to go through evex_len_table[] /
vex_len_table[].

> @@ -1268,7 +1296,21 @@ enum
>    X86_64_VEX_0F38ED,
>    X86_64_VEX_0F38EE,
>    X86_64_VEX_0F38EF,
> +
>    X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
> +
> +  X86_64_EVEX_0F90,
> +  X86_64_EVEX_0F91,
> +  X86_64_EVEX_0F92,
> +  X86_64_EVEX_0F93,
> +  X86_64_EVEX_0F3849,
> +  X86_64_EVEX_0F384B,

For these two, won't the respective VEX enumerators and table entries
do?

> @@ -4524,10 +4568,11 @@ static const struct dis386 x86_64_table[][2] = {
>  
>    /* X86_64_VEX_MAP7_F8_L_0_W_0_R_0 */
>    {
> -    { Bad_Opcode },
> -    { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
> +      { Bad_Opcode },
> +      { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
>    },

Actively corrupting indentation here?

> @@ -8733,6 +8778,17 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>        dp = &prefix_table[dp->op[1].bytemode][vindex];
>        break;
>  
> +    case USE_X86_64_EVEX_FROM_VEX_TABLE:
> +      ins->evex_type = evex_from_vex;
> +      /* EVEX from evex instrucions require that EVEX.z, EVEX.L’L, EVEX.b and

"EVEX from VEX ..."?

> +	 the lower 2 bits of EVEX.aaa must be 0.  */
> +      if ((ins->vex.mask_register_specifier & 0x3) != 0
> +	  || ins->vex.ll != 0
> +	  || ins->vex.zeroing != 0
> +	  || ins->vex.b)
> +	return &bad_opcode;
> +
> +      /* Fall through.  */
>      case USE_X86_64_TABLE:

Instead of falling through here to go through x86_64_table[] (where in all
cases the non-64-bit slot is "bad"), can't you avoid that step and go to
the next step (uniformly the LEN one) right away, saving all those new
table entries (along the lines of what you do below when processing into
evex_from_legacy)?

> @@ -8978,9 +9034,13 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>        if (!fetch_code (ins->info, ins->codep + 4))
>  	return &err_opcode;
>        /* The first byte after 0x62.  */
> +      if (*ins->codep & 0x8)
> +	ins->rex2 |= REX_B;
> +      if (!(*ins->codep & 0x10))
> +	ins->rex2 |= REX_R;
> +
>        ins->rex = ~(*ins->codep >> 5) & 0x7;
> -      ins->vex.r = *ins->codep & 0x10;
> -      switch ((*ins->codep & 0xf))
> +      switch ((*ins->codep & 0x7))

Please can you take the opportunity and drop the excess parentheses?
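
For readers following the hunk, the bit handling it performs on the first byte after 0x62 can be sketched in isolation as below. This mirrors the quoted code only; the struct and function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container; the disassembler keeps these values in
   its instr_info structure.  */
struct evex_p0
{
  unsigned rex;		/* R3/X3/B3 in REX positions (4/2/1).  */
  bool r4;		/* Stored inverted in bit 4.  */
  bool b4;		/* Stored as-is in bit 3.  */
  unsigned map;		/* Opcode map, bits 2:0.  */
};

static struct evex_p0
decode_evex_p0 (uint8_t byte)
{
  struct evex_p0 f;

  /* Bits 7:5 hold R3, X3 and B3 in inverted form.  */
  f.rex = ~(unsigned) (byte >> 5) & 0x7;
  f.r4 = !(byte & 0x10);
  f.b4 = (byte & 0x08) != 0;
  /* No extra parentheses are needed around the masked value.  */
  f.map = byte & 0x7;
  return f;
}
```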

> @@ -9041,12 +9106,24 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  
>        if (ins->address_mode != mode_64bit)
>  	{
> +	  if (ins->evex_type != evex_default
> +	      || (ins->rex2 & (REX_B | REX_X)))
> +	    return &bad_opcode;

What's special about X and B?

> @@ -9460,6 +9537,13 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        dp = get_valid_dis386 (dp, &ins);
>        if (dp == &err_opcode)
>  	goto fetch_error_out;
> +
> +      /* For APX instructions promoted from legacy maps 0/1, prefix
> +	 0x66 is interpreted as the operand size override.  */
> +      if (ins.evex_type == evex_from_legacy
> +	  && ins.vex.prefix == DATA_PREFIX_OPCODE)
> +	sizeflag ^= DFLAG;

I think the comment wants to say "embedded prefix", as "prefix 0x66" is
simply invalid to use with EVEX.

> @@ -9639,6 +9723,24 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        if (ins.last_repnz_prefix >= 0)
>  	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
>        break;
> +
> +    case PREFIX_NP_OR_DATA:
> +      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)

~DATA_PREFIX_OPCODE == 0x99, which likely isn't what you mean here? Do
you perhaps mean e.g. "> DATA_PREFIX_OPCODE"? (Using the opcodes in
vex.prefix is questionable anyway, but that's a pre-existing oddity.)
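
A quick sketch of the two forms, for comparison. It assumes vex.prefix only ever holds 0, 0x66, 0xf3 or 0xf2; the macros below are local stand-ins with those values, and the function names are made up.

```c
#include <stdbool.h>

/* Local stand-ins for the disassembler's prefix opcode values.  */
#define DATA_PREFIX_OPCODE  0x66
#define REPE_PREFIX_OPCODE  0xf3
#define REPNE_PREFIX_OPCODE 0xf2

/* The form in the patch: tests the bits outside 0x66.  */
static bool
mask_test (unsigned prefix)
{
  return (prefix & ~DATA_PREFIX_OPCODE) != 0;
}

/* The suggested form: states the intent directly.  */
static bool
compare_test (unsigned prefix)
{
  return prefix > DATA_PREFIX_OPCODE;
}
```

Under that assumption both forms give the same answer for all four values, so the comparison would only be a readability change.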

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 5/9] Add tests for APX GPR32 with extend evex prefix
  2023-11-24  7:02 ` [PATCH v3 5/9] Add tests for " Cui, Lili
@ 2023-12-07 14:05   ` Jan Beulich
  2023-12-11  6:16     ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-07 14:05 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, binutils

On 24.11.2023 08:02, Cui, Lili wrote:
> --- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
> @@ -12,4 +12,192 @@
>  .*:16: Error: unsupported extended GPR for addressing for `xsaveopt64'
>  .*:17: Error: unsupported extended GPR for addressing for `xsavec'
>  .*:18: Error: unsupported extended GPR for addressing for `xsavec64'
> +.*:20: Error: unsupported extended GPR for addressing for `blendpd'
> +.*:21: Error: unsupported extended GPR for addressing for `blendps'
> +.*:22: Error: unsupported extended GPR for addressing for `blendvpd'
> +.*:23: Error: unsupported extended GPR for addressing for `blendvpd'
> +.*:24: Error: unsupported extended GPR for addressing for `blendvps'
> +.*:25: Error: unsupported extended GPR for addressing for `blendvps'
> +.*:26: Error: unsupported extended GPR for addressing for `dppd'
> +.*:27: Error: unsupported extended GPR for addressing for `dpps'

Seeing this diagnostic in action, I'm afraid I think this would better
be closer to the diagnostics i386_index_check() issues. E.g.
"extended GPR cannot be used as base/index for ...".

> --- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> @@ -1,4 +1,4 @@
> -# Check Illegal 64bit APX_F instructions
> +# Check illegal 64bit APX_F instructions

???

> +#VEX without evex
> +	vaesimc (%r27), %xmm3
> +	vaeskeygenassist $7,(%r27),%xmm3
> +	vblendpd $7,(%r27),%xmm6,%xmm2
> +	vblendpd $7,(%r27),%ymm6,%ymm2
> +	vblendps $7,(%r27),%xmm6,%xmm2
> +	vblendps $7,(%r27),%ymm6,%ymm2
> +	vblendvpd %xmm4,(%r27),%xmm2,%xmm7
> +	vblendvpd %ymm4,(%r27),%ymm2,%ymm7
> +	vblendvps %xmm4,(%r27),%xmm2,%xmm7
> +	vblendvps %ymm4,(%r27),%ymm2,%ymm7
> +	vdppd $7,(%r27),%xmm6,%xmm2
> +	vdpps $7,(%r27),%xmm6,%xmm2
> +	vdpps $7,(%r27),%ymm6,%ymm2
> +	vhaddpd (%r27),%xmm6,%xmm5
> +	vhaddpd (%r27),%ymm6,%ymm5
> +	vhsubps (%r27),%xmm6,%xmm5
> +	vhsubps (%r27),%ymm6,%ymm5
> +	vlddqu (%r27),%xmm4
> +	vlddqu (%r27),%ymm4
> +	vldmxcsr (%r27)
> +	vmaskmovpd %xmm4,%xmm6,(%r27)
> +	vmaskmovpd %ymm4,%ymm6,(%r27)
> +	vmaskmovpd (%r27),%xmm4,%xmm6
> +	vmaskmovpd (%r27),%ymm4,%ymm6
> +	vmaskmovps %xmm4,%xmm6,(%r27)
> +	vmaskmovps %ymm4,%ymm6,(%r27)
> +	vmaskmovps (%r27),%xmm4,%xmm6
> +	vmaskmovps (%r27),%ymm4,%ymm6
> +	vmovmskpd %xmm4,%r27d
> +	vmovmskpd %xmm8,%r27d
> +	vmovmskps %xmm4,%r27d
> +	vmovmskps %ymm8,%r27d
> +	vpblendd $7,(%r27),%xmm6,%xmm2
> +	vpblendd $7,(%r27),%ymm6,%ymm2
> +	vpblendvb %xmm4,(%r27),%xmm2,%xmm7
> +	vpblendvb %ymm4,(%r27),%ymm2,%ymm7
> +	vpblendw $7,(%r27),%xmm6,%xmm2
> +	vpblendw $7,(%r27),%ymm6,%ymm2
> +	vpcmpeqb (%r26),%ymm6,%ymm2
> +	vpcmpeqd (%r26),%ymm6,%ymm2
> +	vpcmpeqq (%r16),%ymm6,%ymm2
> +	vpcmpeqw (%r16),%ymm6,%ymm2
> +	vpcmpestri $7,(%r27),%xmm6
> +	vpcmpestrm $7,(%r27),%xmm6
> +	vpcmpgtb (%r26),%ymm6,%ymm2
> +	vpcmpgtd (%r26),%ymm6,%ymm2
> +	vpcmpgtq (%r16),%ymm6,%ymm2
> +	vpcmpgtw (%r16),%ymm6,%ymm2
> +	vpcmpistri $100,(%r25),%xmm6
> +	vpcmpistrm $100,(%r25),%xmm6
> +	vperm2f128 $7,(%r27),%ymm6,%ymm2
> +	vperm2i128 $7,(%r27),%ymm6,%ymm2
> +	vphaddd (%r27),%xmm6,%xmm7
> +	vphaddd (%r27),%ymm6,%ymm7
> +	vphaddsw (%r27),%xmm6,%xmm7
> +	vphaddsw (%r27),%ymm6,%ymm7
> +	vphaddw (%r27),%xmm6,%xmm7
> +	vphaddw (%r27),%ymm6,%ymm7
> +	vphminposuw (%r27),%xmm6
> +	vphsubd (%r27),%xmm6,%xmm7
> +	vphsubd (%r27),%ymm6,%ymm7
> +	vphsubsw (%r27),%xmm6,%xmm7
> +	vphsubsw (%r27),%ymm6,%ymm7
> +	vphsubw (%r27),%xmm6,%xmm7
> +	vphsubw (%r27),%ymm6,%ymm7
> +	vpmaskmovd %xmm4,%xmm6,(%r27)
> +	vpmaskmovd %ymm4,%ymm6,(%r27)
> +	vpmaskmovd (%r27),%xmm4,%xmm6
> +	vpmaskmovd (%r27),%ymm4,%ymm6
> +	vpmaskmovq %xmm4,%xmm6,(%r27)
> +	vpmaskmovq %ymm4,%ymm6,(%r27)
> +	vpmaskmovq (%r27),%xmm4,%xmm6
> +	vpmaskmovq (%r27),%ymm4,%ymm6
> +	vpmovmskb %xmm4,%r27
> +	vpmovmskb %ymm4,%r27d
> +	vpsignb (%r27),%xmm6,%xmm7
> +	vpsignb (%r27),%xmm6,%xmm7
> +	vpsignd (%r27),%xmm6,%xmm7
> +	vpsignd (%r27),%xmm6,%xmm7
> +	vpsignw (%r27),%xmm6,%xmm7
> +	vpsignw (%r27),%xmm6,%xmm7
> +	vptest (%r27),%ymm6
> +	vrcpps (%r27),%xmm6
> +	vrcpps (%r27),%ymm6
> +	vrcpss (%r27),%xmm6,%xmm6
> +	vroundpd $1,(%r24),%xmm6
> +	vroundps $2,(%r24),%xmm6
> +	vroundsd $3,(%r24),%xmm6,%xmm3
> +	vroundss $4,(%r24),%xmm6,%xmm3

There's still the pending question of whether these really need to be
treated as invalid (rather than being converted to VRNDSCALE*). Also
(to a lesser degree) for {LD,ST}MXCSR.

> +	vrsqrtps (%r27),%xmm6
> +	vrsqrtps (%r27),%ymm6
> +	vrsqrtss (%r27),%xmm6,%xmm6
> +	vstmxcsr (%r27)
> +	vtestpd (%r27),%xmm6
> +	vtestpd (%r27),%ymm6
> +	vtestps (%r27),%xmm6
> +	vtestps (%r27),%ymm6
> +	vtestps (%r27),%ymm6
> +	vptest (%r27),%xmm6

This one wants moving up, now that sorting was mostly done.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> @@ -0,0 +1,28 @@
> +# Check Illegal prefix for 64bit EVEX-promoted instructions
> +
> +        .allow_index_reg
> +        .text
> +_start:
> +	#movbe %r23w,%ax set EVEX.pp = f3 (illegal value).
> +	.insn EVEX.L0.f3.M12.W0 0x60, %di, %ax
> +	#movbe %r23w,%ax set EVEX.pp = f2 (illegal value).
> +	.insn EVEX.L0.f2.M12.W0 0x60, %di, %ax
> +	#VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
> +	#(illegal value).
> +	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
> +	.byte 0xff

For the purpose of this test (whatever P[10] again is) you don't need a
32-bit displacement, do you? Shorter is (almost always) better in such
tests.

> +	#EVEX_MAP4 movbe %r23w,%ax set EVEX.mm == b01 (illegal value).
> +	.insn EVEX.L0.66.M13.W0 0x60, %di, %ax
> +	#EVEX_MAP4 movbe %r23w,%ax set EVEX.aa(P[17:16]) == b01 (illegal value).

There's aaa, but no aa afaik.

> +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax{%k1}
> +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.zL'L == 0b11 (illegal value).

How's z relevant when the value is just a 2-bit one? And then z
should likely have a separate test (also for the from-VEX case below)?

> +	.insn EVEX.L0.66.M12.W0 0x60, %di, {rd-sae}, %ax
> +	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[17:16](EVEX.aa) == 1 (illegal value).
> +	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax, %ebx, %ecx{%k1}
> +	.byte 0xff, 0xff, 0xff
> +	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[22:21](EVEX.L’L) == 1 (illegal value).
> +	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax, {rd-sae}, %ebx, %ecx
> +	.byte 0xff, 0xff, 0xff

If you arranged for a ModR/M byte of 0xc9 (among other possibilities) in
both of these cases, you could avoid the .byte lines altogether afaict.
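
For reference, a ModR/M byte splits into mod (bits 7:6), reg (bits 5:3)
and rm (bits 2:0). A minimal Python sketch of that split follows; the
helper name split_modrm is made up for illustration and is not a
binutils function.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0x3   # bits 7:6; 0b11 selects the register-direct form
    reg = (byte >> 3) & 0x7   # bits 5:3
    rm = byte & 0x7           # bits 2:0
    return mod, reg, rm

print(split_modrm(0xC9))  # (3, 1, 1)
```

So 0xc9 is a register-direct form with reg = rm = 1. One plausible
reading of the suggestion (an assumption, it is not spelled out above) is
that a left-over 0xc9 byte also decodes on its own as the one-byte
"leave", so no padding bytes would be needed after the (bad) entry.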

Jan


* RE: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
  2023-12-07  9:01           ` Jan Beulich
@ 2023-12-08  3:10             ` Cui, Lili
  0 siblings, 0 replies; 69+ messages in thread
From: Cui, Lili @ 2023-12-08  3:10 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 06.12.2023 13:43, Cui, Lili wrote:
> >> On 05.12.2023 14:31, Cui, Lili wrote:
> >>>> On 24.11.2023 08:02, Cui, Lili wrote:
> >>>>> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> >>>>
> >>>> This also covers map 1 row 8, doesn't it?
> >>>>
> >>>
> >>> No, I didn't find 0xf8* in opcode table.
> >>
> >> Assuming (again) you mean 0x0f 0x8*, how did you not find it? Or
> >> wait, depends on what "opcode table" here means: The manual's or
> >> opcodes/i386-opc.tbl? The latter of course doesn't have them, as they're ...
> >>
> >>>>> +	{rex2} jo     .+2-0x70
> >>>>> +	{rex2} jno    .+2-0x70
> >>>>> +	{rex2} jb     .+2-0x70
> >>>>> +	{rex2} jae    .+2-0x70
> >>>>> +	{rex2} je     .+2-0x70
> >>>>> +	{rex2} jne    .+2-0x70
> >>>>> +	{rex2} jbe    .+2-0x70
> >>>>> +	{rex2} ja     .+2-0x70
> >>>>> +	{rex2} js     .+2-0x70
> >>>>> +	{rex2} jns    .+2-0x70
> >>>>> +	{rex2} jp     .+2-0x70
> >>>>> +	{rex2} jnp    .+2-0x70
> >>>>> +	{rex2} jl     .+2-0x70
> >>>>> +	{rex2} jge    .+2-0x70
> >>>>> +	{rex2} jle    .+2-0x70
> >>>>> +	{rex2} jg     .+2-0x70
> >>
> >> ... the disp32/disp16 forms of these branches, which are created only
> >> during relaxation.
> >>
> >
> > Oh,  I see,  I found them in sdm and added testcase for them.
> >
> >         #All opcodes in the row 0x8* (map1) prefixed REX2 are illegal.
> >         {rex2} jo     .+6+0x90909090
> >         {rex2} jno    .+6+0x90909090
> >         {rex2} jb     .+6+0x90909090
> >         {rex2} jae    .+6+0x90909090
> >         {rex2} je     .+6+0x90909090
> >         {rex2} jne    .+6+0x90909090
> >         {rex2} jbe    .+6+0x90909090
> >         {rex2} ja     .+6+0x90909090
> >         {rex2} js     .+6+0x90909090
> >         {rex2} jns    .+6+0x90909090
> >         {rex2} jp     .+6+0x90909090
> >         {rex2} jnp    .+6+0x90909090
> >         {rex2} jl     .+6+0x90909090
> >         {rex2} jge    .+6+0x90909090
> >         {rex2} jle    .+6+0x90909090
> >         {rex2} jg     .+6+0x90909090
> 
> I don't mind the addition, but I don't think this actually tests anything that the
> other block didn't already test. Hence why I suggested to merely update the
> comment there.
> 

Moved new test cases together with 0x7* (map0).

        #All opcodes in the row 0x7* (map0) and 0x8* (map1) prefixed REX2 are illegal.
        {rex2} jo     .+2-0x70
        {rex2} jno    .+2-0x70
        {rex2} jb     .+2-0x70
        {rex2} jae    .+2-0x70
        {rex2} je     .+2-0x70
        {rex2} jne    .+2-0x70
        {rex2} jbe    .+2-0x70
        {rex2} ja     .+2-0x70
        {rex2} js     .+2-0x70
        {rex2} jns    .+2-0x70
        {rex2} jp     .+2-0x70
        {rex2} jnp    .+2-0x70
        {rex2} jl     .+2-0x70
        {rex2} jge    .+2-0x70
        {rex2} jle    .+2-0x70
        {rex2} jg     .+2-0x70
        {rex2} jo     .+6+0x90909090
        {rex2} jno    .+6+0x90909090
        {rex2} jb     .+6+0x90909090
        {rex2} jae    .+6+0x90909090
        {rex2} je     .+6+0x90909090
        {rex2} jne    .+6+0x90909090
        {rex2} jbe    .+6+0x90909090
        {rex2} ja     .+6+0x90909090
        {rex2} js     .+6+0x90909090
        {rex2} jns    .+6+0x90909090
        {rex2} jp     .+6+0x90909090
        {rex2} jnp    .+6+0x90909090
        {rex2} jl     .+6+0x90909090
        {rex2} jge    .+6+0x90909090
        {rex2} jle    .+6+0x90909090
        {rex2} jg     .+6+0x90909090
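
For readers following the two blocks: the short forms live in map 0 row
0x7* and the near forms in map 1 row 0x8* (0x0f 0x8*), with the condition
code in the low nibble of the opcode. Below is a rough Python sketch of the
bare encodings, without the REX2 prefix that is the illegal part under
test. It assumes the unprefixed 2-byte and 6-byte instruction lengths that
the ".+2" and ".+6" operands suggest; the helper names are illustrative
only.

```python
import struct

# Condition-code order matches the test listing (o = 0 ... g = 15).
CONDITIONS = ["o", "no", "b", "ae", "e", "ne", "be", "a",
              "s", "ns", "p", "np", "l", "ge", "le", "g"]

def jcc_short(cc, rel8):
    """Map 0, row 0x7*: opcode 0x70+cc followed by a signed 8-bit offset."""
    return bytes([0x70 + CONDITIONS.index(cc)]) + struct.pack("<b", rel8)

def jcc_near(cc, rel32):
    """Map 1, row 0x8*: 0x0f, opcode 0x80+cc, then a 32-bit offset."""
    offset = struct.pack("<I", rel32 & 0xFFFFFFFF)
    return bytes([0x0F, 0x80 + CONDITIONS.index(cc)]) + offset

print(jcc_short("o", -0x70).hex())        # 7090
print(jcc_near("g", 0x90909090).hex())    # 0f8f90909090
```

Note that -0x70 as a byte is 0x90, so both displacements consist of 0x90
bytes only.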

Lili.


* Re: [PATCH v3 6/9] Support APX NDD
  2023-11-24  7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
@ 2023-12-08 14:12   ` Jan Beulich
  2023-12-11 13:36     ` Cui, Lili
  2024-03-22 10:02     ` Jan Beulich
  2023-12-08 14:27   ` Jan Beulich
  1 sibling, 2 replies; 69+ messages in thread
From: Jan Beulich @ 2023-12-08 14:12 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, konglin1, binutils

On 24.11.2023 08:02, Cui, Lili wrote:
> @@ -8870,25 +8890,33 @@ build_modrm_byte (void)
>  				     || i.vec_encoding == vex_encoding_evex));
>      }
>  
> -  for (v = source + 1; v < dest; ++v)
> -    if (v != reg_slot)
> -      break;
> -  if (v >= dest)
> -    v = ~0;
> -  if (i.tm.extension_opcode != None)
> +  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
>      {
> -      if (dest != source)
> -	v = dest;
> -      dest = ~0;
> +      v = dest;
> +      dest-- ;

Nit: Stray blank.

>      }
> -  gas_assert (source < dest);

Starting from this line, do you really need to move that into the "else"
branch? It looks to me as if it could stay here. (Maybe I'm wrong with
the assertion itself, but ...

> -  if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
> -      && source != op)

... this entire if() pretty surely can stay as is, as there are no
templates with both DstVVVV and SwapSources afaict. (Thing is - as
before - that it isn't easy to see that what is happening here is
really just re-indentation. Iirc in an earlier version there actually
were hidden changes.) If you want this moved as an optimization,
please do so in a separate patch.

> --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
> @@ -27,4 +27,6 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:[ 	]+c8 ff ff ff[ 	]+enter  \$0xffff,\$0xff
>  [ 	]*[a-f0-9]+:[ 	]+67 62 f2 7c 18 f5[ 	]+addr32 \(bad\)
>  [ 	]*[a-f0-9]+:[ 	]+0b ff[ 	]+or     %edi,%edi
> +[ 	]*[a-f0-9]+:[ 	]+62 f4 fc 08 ff[ 	]+\(bad\)
> +[ 	]*[a-f0-9]+:[ 	]+d8[ 	]+.byte 0xd8
>  #pass
> --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> @@ -26,3 +26,5 @@ _start:
>  	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[20](EVEX.b) == 1 (illegal value).
>  	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax ,(%ebx){1to8}, %ecx
>  	.byte 0xff
> +	#{evex} inc %rax %rbx EVEX.vvvv' != 1111 && EVEX.ND = 0.
> +	.insn EVEX.L0.NP.M4.W1 0xff, %rax, %rbx

I don't think this does what you want. In the .d file the 4 bits are
all set. I think you mean something like

	.insn EVEX.L0.NP.M4.W1 0xff/0, %rcx, %rbx

(i.e. ModR/M.reg specified as opcode extension _and_ the first operand
not the accumulator). The reason disassembly fails for what you've used
looks to be ModR/M.reg == 0b011 (resulting from the use of %rbx).
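
To make the point about the .d output concrete, here is a minimal Python
sketch decoding the bytes 62 f4 fc 08 ff d8 from the expected disassembly.
The field positions used (vvvv in bits 6:3 of the third prefix byte, stored
inverted; ModR/M.reg in bits 5:3) follow the usual EVEX layout; the
function name is hypothetical.

```python
def decode_evex_vvvv_and_reg(insn):
    """Return (stored vvvv bits, logical vvvv, ModR/M.reg) for an EVEX insn
    laid out as: 0x62, P0, P1, P2, opcode, ModR/M."""
    assert insn[0] == 0x62, "not an EVEX prefix"
    p1 = insn[2]
    stored_vvvv = (p1 >> 3) & 0xF      # bits 6:3 of P1, stored inverted
    logical_vvvv = stored_vvvv ^ 0xF   # undo the inversion
    modrm = insn[5]
    reg = (modrm >> 3) & 0x7           # ModR/M.reg, the opcode extension slot
    return stored_vvvv, logical_vvvv, reg

insn = bytes([0x62, 0xF4, 0xFC, 0x08, 0xFF, 0xD8])
print(decode_evex_vvvv_and_reg(insn))  # (15, 0, 3)
```

All four stored vvvv bits are 1, i.e. no register is encoded there, and
ModR/M.reg is 3 rather than the /0 the INC template uses.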

(Also, nit: What's EVEX.vvvv' ? I.e. what's the ' there about?)

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> @@ -0,0 +1,155 @@
> +# Check 64bit APX NDD instructions with evex prefix encoding
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +	adc    $0x1234,%ax,%r30w
> +	adc    %r15b,%r17b,%r18b
> +	adc    %r15d,(%r8),%r18d
> +	adc    (%r15,%rax,1),%r16b,%r8b
> +	adc    (%r15,%rax,1),%r16w,%r8w
> +	adcl   $0x11,(%r19,%rax,4),%r20d
> +	adcx   %r15d,%r8d,%r18d
> +	adcx   (%r15,%r31,1),%r8
> +	adcx   (%r15,%r31,1),%r8d,%r18d
> +	add    $0x1234,%ax,%r30w
> +	add    $0x12344433,%r15,%r16
> +	add    $0x34,%r13b,%r17b
> +	add    $0xfffffffff4332211,%rax,%r8
> +	add    %r31,%r8,%r16
> +	add    %r31,(%r8),%r16
> +	add    %r31,(%r8,%r16,8),%r16
> +	add    %r31b,%r8b,%r16b
> +	add    %r31d,%r8d,%r16d
> +	add    %r31w,%r8w,%r16w
> +	add    (%r31),%r8,%r16
> +	add    0x9090(%r31,%r16,1),%r8,%r16
> +	addb    %r31b,%r8b,%r16b
> +	addl    %r31d,%r8d,%r16d
> +	addl   $0x11,(%r19,%rax,4),%r20d
> +	addq    %r31,%r8,%r16
> +	addq   $0x12344433,(%r15,%rcx,4),%r16
> +	addw    %r31w,%r8w,%r16w
> +	adox   %r15d,%r8d,%r18d

Nit: Inconsistent blank padding.

> +	{load}  add    %r31,%r8,%r16
> +	{store} add    %r31,%r8,%r16
> +	adox   (%r15,%r31,1),%r8
> +	adox   (%r15,%r31,1),%r8d,%r18d
> +	and    $0x1234,%ax,%r30w
> +	and    %r15b,%r17b,%r18b
> +	and    %r15d,(%r8),%r18d
> +	and    (%r15,%rax,1),%r16b,%r8b
> +	and    (%r15,%rax,1),%r16w,%r8w
> +	andl   $0x11,(%r19,%rax,4),%r20d
> +	cmova  0x90909090(%eax),%edx,%r8d
> +	cmovae 0x90909090(%eax),%edx,%r8d
> +	cmovb  0x90909090(%eax),%edx,%r8d
> +	cmovbe 0x90909090(%eax),%edx,%r8d
> +	cmove  0x90909090(%eax),%edx,%r8d
> +	cmovg  0x90909090(%eax),%edx,%r8d
> +	cmovge 0x90909090(%eax),%edx,%r8d
> +	cmovl  0x90909090(%eax),%edx,%r8d
> +	cmovle 0x90909090(%eax),%edx,%r8d
> +	cmovne 0x90909090(%eax),%edx,%r8d
> +	cmovno 0x90909090(%eax),%edx,%r8d
> +	cmovnp 0x90909090(%eax),%edx,%r8d
> +	cmovns 0x90909090(%eax),%edx,%r8d
> +	cmovo  0x90909090(%eax),%edx,%r8d
> +	cmovp  0x90909090(%eax),%edx,%r8d
> +	cmovs  0x90909090(%eax),%edx,%r8d
> +	dec    %rax,%r17
> +	decb   (%r31,%r12,1),%r8b
> +	imul   0x909(%rax,%r31,8),%rdx,%r25
> +	imul   0x90909(%eax),%edx,%r8d
> +	inc    %r31,%r16
> +	inc    %r31,%r8
> +	inc    %rax,%rbx
> +	neg    %rax,%r17
> +	negb   (%r31,%r12,1),%r8b
> +	not    %rax,%r17
> +	notb   (%r31,%r12,1),%r8b
> +	or     $0x1234,%ax,%r30w
> +	or     %r15b,%r17b,%r18b
> +	or     %r15d,(%r8),%r18d
> +	or     (%r15,%rax,1),%r16b,%r8b
> +	or     (%r15,%rax,1),%r16w,%r8w
> +	orl    $0x11,(%r19,%rax,4),%r20d
> +	rcl    $0x2,%r12b,%r31b
> +	rcl    %cl,%r16b,%r8b
> +	rclb   $0x1, (%rax),%r31b
> +	rcll   $0x2,(%rax),%r31d
> +	rclw   $0x1, (%rax),%r31w

Nit: Would be nice if there consistently were or were not blanks after
the commas.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -139,9 +139,13 @@
>  #define Vsz256 Vsz=VSZ256
>  #define Vsz512 Vsz=VSZ512
>  
> +#define DstVVVV VexVVVV=VexVVVV_DST
> +
>  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
>  // the bit to mark commutative VEX encodings where swapping the source
>  // operands may allow to switch from 3-byte to 2-byte VEX encoding.
> +// And re-use the bit to mark some NDD insns that swapping the source operands
> +// may allow to switch from EVEX encoding to REX2 encoding.
>  #define C StaticRounding
>  
>  #define FP 387|287|8087
> @@ -288,26 +292,40 @@ std, 0xfd, 0, NoSuf, {}
>  sti, 0xfb, 0, NoSuf, {}
>  
>  // Arithmetic.
> +add, 0x0, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }

There is _still_ Byte|Word|Dword|Qword in here (and below), when I think I
pointed out more than once before that in new templates such redundancy
wants omitting.

Since this isn't the first instance of earlier review comments not taken
care of, may I please ask that you make reasonably sure that new versions
aren't sent out like this?

>  add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>  add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}
>  add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
>  inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
> +inc, 0xfe/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, {Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}
>  inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
> +sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Optimize|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64, }

Here and elsewhere, what's Optimize for? It not being there on other templates,
it can't be for the EVEX->REX2 optimization? If there are further optimization
plans, that's (again) something to mention in the description. Yet better would
be if such attributes were added only when respective optimizations are actually
introduced. Unlike e.g. NF, which would mean another bulk update if not added
right away, new optimizations typically affect only a few templates at a time.

>  sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +sub, 0x83/5, APX_F, Modrm|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>  sub, 0x2c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +sub, 0x80/5, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }

There are still only 3 new templates here (and also above for add, plus for
other similar insns), when ...

>  dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
> +dec, 0xfe/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
> +sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +sbb, 0x83/3, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> +sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>  sbb, 0x1c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +sbb, 0x80/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +sbb, 0x80/3, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }

... there are 6 new templates here. This is again an aspect I had pointed
out before. You cannot defer the addition of the other 3 until the NF patch,
as you want to make sure that with just this patch in place both

    {evex} sbb %eax, %eax

and

    {evex} sub %eax, %eax

actually assemble, and to EVEX encodings. I can't see how that would work
in the latter case without those further templates.

The alternative is to also defer adding the 2-operand SBB templates (and
any others you add here which don't use DstVVVV).

>  cmp, 0x38, 0, D|W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  cmp, 0x83/7, 0, Modrm|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> @@ -318,31 +336,50 @@ test, 0x84, 0, D|W|C|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, R
>  test, 0xa8, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
>  test, 0xf6/0, 0, W|Modrm|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
> +and, 0x20, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  and, 0x20, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +and, 0x83/4, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  and, 0x83/4, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>  and, 0x24, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +and, 0x80/4, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  and, 0x80/4, 0, W|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
> +or, 0x8, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  or, 0x8, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +or, 0x83/1, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  or, 0x83/1, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>  or, 0xc, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +or, 0x80/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  or, 0x80/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
> +xor, 0x30, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>  xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +xor, 0x80/6, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
>  // clr with 1 operand is really xor with 2 operands.
>  clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }

Btw., for consistency this may also want accompanying with an EVEX counterpart.

> +adc, 0x10, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  adc, 0x10, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x83/2, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  adc, 0x83/2, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>  adc, 0x14, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +adc, 0x80/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
> +neg, 0xf6/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +
> +not, 0xf6/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +not, 0xf6/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  
>  aaa, 0x37, No64, NoSuf, {}
>  aas, 0x3f, No64, NoSuf, {}
> @@ -375,6 +412,7 @@ cqto, 0x99, x64, Size64|NoSuf, {}
>  // These multiplies can only be selected with single operand forms.
>  mul, 0xf6/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  imul, 0xf6/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +imul, 0xaf, APX_F, C|Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }

Missing NF?

>  imul, 0xfaf, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64 }
>  imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> @@ -389,52 +427,98 @@ div, 0xf6/6, 0, W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|
>  idiv, 0xf6/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>  idiv, 0xf6/7, 0, W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
>  
> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }

Didn't we agree to avoid adding this (and its sibling) template, for the omitted
shift count being ambiguous? Consider

    rol %cl, %al

Is this a rotate by %cl, or a 1-bit NDD rotate?
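
The ambiguity can be sketched with a toy matcher in Python. This is a
deliberately simplified model, not how gas matches templates: the operand
classes, template descriptions and function name are all made up for
illustration.

```python
# Toy operand classes; real gas operand matching is far more involved.
SHIFT_COUNT = {"%cl"}
REG8 = {"%al", "%bl", "%cl", "%dl"}

# (description, operand classes) for two hypothetical 2-operand rol forms.
TEMPLATES = [
    ("rotate by %cl (legacy 0xd2/0)", [SHIFT_COUNT, REG8]),
    ("1-bit NDD rotate, count omitted (APX 0xd0/0)", [REG8, REG8]),
]

def matching_templates(operands):
    """Return descriptions of all templates whose classes accept the operands."""
    return [desc for desc, classes in TEMPLATES
            if len(classes) == len(operands)
            and all(op in cls for op, cls in zip(operands, classes))]

for desc in matching_templates(["%cl", "%al"]):
    print(desc)
```

With %cl as first operand both forms match, whereas e.g. "rol %bl, %al"
would only match the NDD form in this model.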

Jan


* Re: [PATCH v3 6/9] Support APX NDD
  2023-11-24  7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
  2023-12-08 14:12   ` Jan Beulich
@ 2023-12-08 14:27   ` Jan Beulich
  2023-12-12  5:53     ` Cui, Lili
  1 sibling, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-08 14:27 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, konglin1, binutils

On 24.11.2023 08:02, Cui, Lili wrote:
> --- a/opcodes/i386-dis-evex-reg.h
> +++ b/opcodes/i386-dis-evex-reg.h
> @@ -56,3 +56,58 @@
>      { "blsmskS",	{ VexGdq, Edq }, 0 },
>      { "blsiS",	{ VexGdq, Edq }, 0 },
>    },
> +  /* REG_EVEX_MAP4_80 */
> +  {
> +    { "addA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> +    { "orA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> +    { "adcA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> +    { "sbbA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> +    { "andA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> +    { "subA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> +    { "xorA",	{ VexGb, Eb, Ib }, NO_PREFIX },

Don't these need to use PREFIX_NP_OR_DATA? The doc clearly says
".IGNORED" there. (Applies to other byte ops as well then, of course.)

Jan


* RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-07 12:38   ` Jan Beulich
@ 2023-12-08 15:21     ` Cui, Lili
  2023-12-11  8:34       ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-08 15:21 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, December 7, 2023 8:39 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
> 
> On 24.11.2023 08:02, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -409,6 +409,9 @@ struct _i386_insn
> >      /* Compressed disp8*N attribute.  */
> >      unsigned int memshift;
> >
> > +    /* No CSPAZO flags update.*/
> > +    bool has_nf;
> 
> As before I don't see the point in adding this field when it's not used in the
> change. Note that this is unrelated to the introduction of the NF attribute right
> here, which has a reason.
> 

Moved.

> > @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
> >
> >    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
> >    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> > -  {
> > -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> > -	   || maybe_cpu (t, CpuFMA))
> > -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> > +    {
> > +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> > > +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> > > +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> > > +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >  	{
> >  	  if (need_evex_encoding ())
> 
> There are several issues here:
> - Why did you need to change (to the worse) the original code?
> - Why did you not model the addition after that original code?
> - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512
> insn can be VEX-encoded?

I don't understand what you mean; we do have this combination:

kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }

> - If these new macros are really needed for whatever reason, they shouldn't
>   be added to opcodes/i386-opc.h when they're useful only in the assembler.
> - Style requires a blank before the opening parenthesis in function
>   invocations (which also covers function-like macro invocations).
> 
> I think I asked before: How is it that you get away without altering
> cpu_flags_match(), containing related and quite similar logic?
> 

In the original logic, ( ... || ... ) && ( ... || ... ), anything in the first bracket can be combined arbitrarily with anything in the second bracket. I think that is inaccurate, so I list each identified combination individually.
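
To illustrate the difference between the two shapes, here is a minimal standalone sketch. The feature bits and the helpers has(), pair(), cross_product() and enumerated() are hypothetical stand-ins, not the real maybe_cpu() or the AVX512F()/AVX512VL() macros from the patch; it assumes each such macro means "the template mentions both CPU features".

```c
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only.  */
enum { AVX = 1, AVX2 = 2, FMA = 4, AVX512F = 8, AVX512VL = 16 };

/* Stand-in for maybe_cpu (): does the template mention this feature?  */
static bool
has (unsigned int t, unsigned int cpu)
{
  return (t & cpu) != 0;
}

/* Original shape: (A || B || C) && (D || E).  Any feature from the
   first group may pair with any feature from the second group.  */
static bool
cross_product (unsigned int t)
{
  return (has (t, AVX) || has (t, AVX2) || has (t, FMA))
         && (has (t, AVX512F) || has (t, AVX512VL));
}

/* Assumed meaning of one enumerated macro use: both features present.  */
static bool
pair (unsigned int t, unsigned int a, unsigned int b)
{
  return has (t, a) && has (t, b);
}

/* Enumerated shape: only the explicitly listed pairs are accepted.  */
static bool
enumerated (unsigned int t)
{
  return pair (t, AVX, AVX512F) || pair (t, AVX2, AVX512F)
         || pair (t, FMA, AVX512F) || pair (t, AVX, AVX512VL)
         || pair (t, AVX2, AVX512VL);
}
```

With these definitions, a template carrying FMA and AVX512VL is accepted by cross_product() but not by enumerated(), since that pair is not in the list.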

I just found that cpu_flags_match() has similar logic. I think the following is the only code there related to CPUID alerts, but none of our combinations are related to cpuavx.

          if (all.bitfield.cpuavx)
            {
              /* We need to check SSE2AVX with AVX.  */
              if (!t->opcode_modifier.sse2avx
                  || (sse2avx && !i.prefix[DATA_PREFIX]))
                match |= CPU_FLAGS_ARCH_MATCH;
            }

> > @@ -3873,6 +3877,14 @@ is_any_vex_encoding (const insn_template *t)
> >    return t->opcode_modifier.vex || t->opcode_modifier.evex;  }
> >
> > +static INLINE bool
> > +is_apx_evex_encoding (void)
> > +{
> > +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> > +    || (i.vex.register_specifier
> > +	&& i.vex.register_specifier->reg_flags & RegRex2); }
> 
> If you want this to be a function despite being used just once, you'll need to
> add a comment mentioning the constraint when calling it (or else the use of
> i.rex2 in particular is confusing). I'm sure I commented on this before, and I
> thought such a comment had already appeared.
> 

I also had the impression that it was added; anyway, I will add it.

+/* We can use this function only when the current encoding is evex.  */
 static INLINE bool
 is_apx_evex_encoding (void)
 {

> > @@ -5655,17 +5693,17 @@ md_assemble (char *line)
> >       instruction already has a prefix, we need to convert old
> >       registers to new ones.  */
> >
> > -  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> > -       && (i.op[0].regs->reg_flags & RegRex64) != 0)
> > -      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> > -	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > -      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > -	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && (i.rex != 0 || i.rex2 != 0)))
> > +  if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> > +	&& (i.op[0].regs->reg_flags & RegRex64) != 0)
> > +       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> > +	   && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > +       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > +	    || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > +	   && (i.rex != 0 || i.rex2 != 0))))
> 
> I'm having trouble spotting the change here: There's an outer pair of
> parentheses being added, but that's for no reason unless there's another
> change well hidden. Please clarify.
> 

Removed.

> >      {
> >        int x;
> >
> > -      if (!i.rex2)
> > +      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
> >  	i.rex |= REX_OPCODE;
> 
> Why the change to is_apx_rex2_encoding()? If that's wanted / needed here,
> shouldn't that be put in place by the earlier patch?
>

Moved to the corresponding patch.

> > @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry
> *r)
> >        if (!cpu_arch_flags.bitfield.cpuapx_f
> >  	  || flag_code != CODE_64BIT)
> >  	return false;
> > +
> > +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as
> EVEX.
> > +	 For the later install_template function.  */
> > +      if (current_templates->start->opcode_modifier.vex
> > +	  && current_templates->start->opcode_modifier.evex)
> > +	i.vec_encoding = vex_encoding_evex;
> 
> I'm afraid I don't understand the 2nd sentence of the comment. This may be
> related to my question regarding cpu_flags_match() further up.
> 
> The first sentence isn't quite correct either - you don't mark any template here
> (and you can't, because we don't even know yet which template we're going
> to use).
> 
> Finally - do you really need the .evex check here? (I won't exclude that this
> yields a better diagnostic in certain cases, but this wants clarifying if so.)
> 

If you look at install_template(), you'll see that before this function we need to know if the current encoding is EVEX. We need to check opcode_modifier.evex here; it is a fix for issues caused by the merge of VEX and EVEX.
  if (t->opcode_modifier.vex && t->opcode_modifier.evex)
    {
      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
          || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
          || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
          || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
        {
          if (need_evex_encoding ())
            {

> > --- a/gas/testsuite/gas/i386/x86-64.exp
> > +++ b/gas/testsuite/gas/i386/x86-64.exp
> > @@ -250,7 +250,7 @@ run_dump_test "x86-64-sse-noavx"
> >  run_dump_test "x86-64-movbe"
> >  run_dump_test "x86-64-movbe-intel"
> >  run_dump_test "x86-64-movbe-suffix"
> > -run_list_test "x86-64-inval-movbe" "-al"
> > +run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -
> al"
> 
> I can see why you add the -march=, as we've been through this before.
> But why the -I ?
> 

Removed; it is redundant.

> > @@ -896,7 +897,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
> > <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
> >                        load:Load:0, store:Store:0, +
> >                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> > -                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> > +                      rex:REX:x64, rex2:REX2:APX_F,
> > + nooptimize:NoOptimize:0>
> 
> This change wants to go into the earlier patch?
> 

Done.

> > @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
> >
> >  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, {
> > Oword|Unspecified|BaseIndex, Reg32 }  invept, 0x660f3880, EPT&x64,
> > Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> > +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
> > +Oword|Unspecified|BaseIndex, Reg64 }
> >  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, {
> > Oword|Unspecified|BaseIndex, Reg32 }  invvpid, 0x660f3881, EPT&x64,
> > Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> > +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
> > +Oword|Unspecified|BaseIndex, Reg64 }
> 
> Seeing these: Are there any Map4 encodings which aren't EVex128? If not
> (and if you're also not hiddenly aware of some appearing in the near future),
> please consider making EVexMap4 include this right away. Even if in the longer
> run other encodings appear, it'll then be easy to simply replace all the
> EVexMap4 uses in a purely mechanical way. Until then shorter template lines
> are preferable.
> 

Would you mind defining it this way instead? The #define of EVex128 comes after it, and I know you don't like unnecessary changes.

+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128

> > @@ -1437,7 +1443,6 @@ xgetbv, 0xf01d0, Xsave, NoSuf, {}  xsetbv,
> > 0xf01d1, Xsave, NoSuf, {}
> >
> >  // xsaveopt
> > -
> >  xsaveopt, 0xfae/6, Xsaveopt,
> > Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr,
> { Unspecified|BaseIndex
> > }  xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, {
> > Unspecified|BaseIndex }
> 
> Iirc the earlier patch added that blank line. Why would you do such back and
> forth?
> 

Done.

> > @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
> >
> >  // BMI2 instructions.
> >
> > -bzhi, 0xf5, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
> _bSuf|No
> > _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> > Reg32|Reg64 } -mulx, 0xf2f6, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
> uf|No_sSu
> > f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -pdep, 0xf2f5, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
> uf|No_sSu
> > f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -pext, 0xf3f5, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
> uf|No_sSu
> > f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -rorx, 0xf2f0, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSu
> f, {
> > Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex,
> Reg32|Reg64
> > } -sarx, 0xf3f7, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
> _bSuf|No
> > _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> > Reg32|Reg64 } -shlx, 0x66f7, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
> _bSuf|No
> > _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> > Reg32|Reg64 } -shrx, 0xf2f7, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
> _bSuf|No
> > _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> > Reg32|Reg64 }
> > +bzhi, 0xf5, BMI2&(BMI2|APX_F),
> >
> +Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapS
> ources|N
> > +o_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64,
> > +Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> 
> Hmm, I had specifically suggested a pre-processor macro to use in place of the
> open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here
> and below)?
> 

There are many different types of combinations, and each combination appears relatively few times, so I think adding a #define for each combination feels a bit wasteful.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-07 13:34   ` Jan Beulich
@ 2023-12-11  6:16     ` Cui, Lili
  2023-12-11  8:43       ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-11  6:16 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 24.11.2023 08:02, Cui, Lili wrote:
> > --- /dev/null
> > +++ b/opcodes/i386-dis-evex-x86-64.h
> > @@ -0,0 +1,60 @@
> > +  /* X86_64_EVEX_0F90 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F90) },  },
> > +  /* X86_64_EVEX_0F91 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F91) },  },
> > +  /* X86_64_EVEX_0F92 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F92) },  },
> > +  /* X86_64_EVEX_0F93 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F93) },  },
> > +  /* X86_64_EVEX_0F3849 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },  },
> > +  /* X86_64_EVEX_0F384B */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },  },
> > +  /* X86_64_EVEX_0F38F2 */
> > +  {
> > +    { Bad_Opcode },
> > +    { EVEX_LEN_TABLE (EVEX_LEN_0F38F2) },  },
> > +  /* X86_64_EVEX_0F38F3 */
> > +  {
> > +    { Bad_Opcode },
> > +    { EVEX_LEN_TABLE (EVEX_LEN_0F38F3) },  },
> > +  /* X86_64_EVEX_0F38F5 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F38F5) },  },
> > +  /* X86_64_EVEX_0F38F6 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F38F6) },  },
> > +  /* X86_64_EVEX_0F38F7 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F38F7) },  },
> > +  /* X86_64_EVEX_0F3AF0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F3AF0) },  },
> 
> I'm puzzled here: There are two uses of EVEX_LEN_TABLE() and several more
> of VEX_LEN_TABLE(). Yet the underlying pattern of those insns is all the same. I
> may guess that this is related to PREFIX_OPCODE use in the respective VEX
> table entries, yet isn't it then cheaper overall to have VEX encodings also go
> through prefix_table[], and then sharing those entries with EVEX encodings?
> 

Done.

> What's further puzzling: When setting evex_from_vex you already check L'L ==
> 0, so there's no reason to go through evex_len_table[] / vex_len_table[].
> 

Changed to use the next level of len_table[] directly.

> > @@ -1268,7 +1296,21 @@ enum
> >    X86_64_VEX_0F38ED,
> >    X86_64_VEX_0F38EE,
> >    X86_64_VEX_0F38EF,
> > +
> >    X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
> > +
> > +  X86_64_EVEX_0F90,
> > +  X86_64_EVEX_0F91,
> > +  X86_64_EVEX_0F92,
> > +  X86_64_EVEX_0F93,
> > +  X86_64_EVEX_0F3849,
> > +  X86_64_EVEX_0F384B,
> 
> For these two, won't the respective VEX enumerators and table entries do?
> 

Done.

> > @@ -4524,10 +4568,11 @@ static const struct dis386 x86_64_table[][2] =
> > {
> >
> >    /* X86_64_VEX_MAP7_F8_L_0_W_0_R_0 */
> >    {
> > -    { Bad_Opcode },
> > -    { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
> > +      { Bad_Opcode },
> > +      { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
> >    },
> 
> Actively corrupting indentation here?
> 

Done.

> > @@ -8733,6 +8778,17 @@ get_valid_dis386 (const struct dis386 *dp,
> instr_info *ins)
> >        dp = &prefix_table[dp->op[1].bytemode][vindex];
> >        break;
> >
> > +    case USE_X86_64_EVEX_FROM_VEX_TABLE:
> > +      ins->evex_type = evex_from_vex;
> > +      /* EVEX from evex instrucions require that EVEX.z, EVEX.L’L,
> > + EVEX.b and
> 
> "EVEX from VEX ..."?
> 

Done.

> > +	 the lower 2 bits of EVEX.aaa must be 0.  */
> > +      if ((ins->vex.mask_register_specifier & 0x3) != 0
> > +	  || ins->vex.ll != 0
> > +	  || ins->vex.zeroing != 0
> > +	  || ins->vex.b)
> > +	return &bad_opcode;
> > +
> > +      /* Fall through.  */
> >      case USE_X86_64_TABLE:
> 
> Instead of falling through here to go through x86_64_table[] (where in all
> cases the non-64-bit slot is "bad"), can't you avoid that step and go to the
> next step (uniformly the LEN one) right away, saving all those new table entries
> (along the lines of what you do below when processing into
> evex_from_legacy)?
> 

It's not very clear to me what you mean here. Do you want to use vex_len_table directly so that all entries in i386-dis-evex-x86-64.h can be deleted? Even then, some instructions would still need to go through x86_64_table[], such as X86_64_VEX_0F38E*.

    case USE_X86_64_EVEX_FROM_VEX_TABLE:
      ins->evex_type = evex_from_vex;
      /* EVEX from VEX instructions require that EVEX.z, EVEX.L'L, EVEX.b and
         the lower 2 bits of EVEX.aaa must be 0.  */
      if ((ins->vex.mask_register_specifier & 0x3) != 0
          || ins->vex.ll != 0
          || ins->vex.zeroing != 0
          || ins->vex.b)
        return &bad_opcode;

      dp = &vex_len_table[dp->op[1].bytemode][0];
      break;
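
As a cross-check of the condition in the comment, here is a minimal sketch that tests the same bits directly on the third EVEX payload byte (P[23:16]) rather than on the already-decoded ins->vex fields. The helper name evex_from_vex_payload_ok() and the mask macros are hypothetical and not part of the patch; the bit positions follow the standard EVEX layout (z, L'L, b, V', aaa).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit fields within the third EVEX payload byte, P[23:16].  */
#define EVEX_Z     0x80  /* P[23]: zeroing.  */
#define EVEX_LL    0x60  /* P[22:21]: L'L.  */
#define EVEX_B     0x10  /* P[20]: broadcast / rounding control.  */
#define EVEX_A1A0  0x03  /* P[17:16]: lower two bits of aaa.  */

/* Hypothetical helper: true if none of the bits that must be zero
   for an EVEX-from-VEX instruction are set.  P[19] (V') and P[18]
   (the top bit of aaa) are deliberately left unchecked.  */
static bool
evex_from_vex_payload_ok (uint8_t p2)
{
  return (p2 & (EVEX_Z | EVEX_LL | EVEX_B | EVEX_A1A0)) == 0;
}
```

The four tests in the if() above collapse into a single mask of 0xf3 in this form.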

> > @@ -8978,9 +9034,13 @@ get_valid_dis386 (const struct dis386 *dp,
> instr_info *ins)
> >        if (!fetch_code (ins->info, ins->codep + 4))
> >  	return &err_opcode;
> >        /* The first byte after 0x62.  */
> > +      if (*ins->codep & 0x8)
> > +	ins->rex2 |= REX_B;
> > +      if (!(*ins->codep & 0x10))
> > +	ins->rex2 |= REX_R;
> > +
> >        ins->rex = ~(*ins->codep >> 5) & 0x7;
> > -      ins->vex.r = *ins->codep & 0x10;
> > -      switch ((*ins->codep & 0xf))
> > +      switch ((*ins->codep & 0x7))
> 
> Please can you take the opportunity and drop the excess parentheses?
> 

Done.
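
For reference, the decode of the first byte after 0x62 in the hunk above can be sketched standalone as follows. The struct evex_p0 and the function decode_evex_p0() are hypothetical names used only for illustration; the bit tests mirror the quoted hunk, and the REX_* values are the usual ones from include/opcode/i386.h.

```c
#include <stdint.h>

#define REX_R 4
#define REX_X 2
#define REX_B 1

/* Hypothetical container for the fields taken from the first payload
   byte; the real code stores them in instr_info.  */
struct evex_p0
{
  unsigned int rex;   /* Inverted R/X/B from bits 7:5.  */
  unsigned int rex2;  /* Extra register bits from bits 4 and 3.  */
  unsigned int map;   /* Opcode map selector from bits 2:0.  */
};

static struct evex_p0
decode_evex_p0 (uint8_t byte)
{
  struct evex_p0 r = { 0, 0, 0 };

  /* Bit 3 is tested as-is, bit 4 is tested inverted, as in the hunk.  */
  if (byte & 0x8)
    r.rex2 |= REX_B;
  if (!(byte & 0x10))
    r.rex2 |= REX_R;

  /* Bits 7:5 are stored inverted in the encoding.  */
  r.rex = ~(byte >> 5) & 0x7;

  /* Only three bits remain for the map now that bit 3 is taken.  */
  r.map = byte & 0x7;
  return r;
}
```

For example, a byte of 0xf1 yields no REX or REX2 bits and map 1, while 0x0c yields all three REX bits, REX_B | REX_R in rex2, and map 4.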

> > @@ -9041,12 +9106,24 @@ get_valid_dis386 (const struct dis386 *dp,
> > instr_info *ins)
> >
> >        if (ins->address_mode != mode_64bit)
> >  	{
> > +	  if (ins->evex_type != evex_default
> > +	      || (ins->rex2 & (REX_B | REX_X)))
> > +	    return &bad_opcode;
> 
> What's special about X and B?
> 

For evex_default, the values of these two bits are fixed. Comment added.

      if (ins->address_mode != mode_64bit)
        {
          /* Report bad for !evex_default and when two fixed values of evex
             change.  */
          if (ins->evex_type != evex_default
              || (ins->rex2 & (REX_B | REX_X)))
            return &bad_opcode;

> > @@ -9460,6 +9537,13 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        dp = get_valid_dis386 (dp, &ins);
> >        if (dp == &err_opcode)
> >  	goto fetch_error_out;
> > +
> > +      /* For APX instructions promoted from legacy maps 0/1, prefix
> > +	 0x66 is interpreted as the operand size override.  */
> > +      if (ins.evex_type == evex_from_legacy
> > +	  && ins.vex.prefix == DATA_PREFIX_OPCODE)
> > +	sizeflag ^= DFLAG;
> 
> I think the comment wants to say "embedded prefix", as "prefix 0x66" is
> simply invalid to use with EVEX.
> 

Done, thanks.

> > @@ -9639,6 +9723,24 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        if (ins.last_repnz_prefix >= 0)
> >  	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
> >        break;
> > +
> > +    case PREFIX_NP_OR_DATA:
> > +      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)
> 
> ~DATA_PREFIX_OPCODE == 0x99, which likely isn't what you mean here? Do
> you perhaps mean e.g. "> DATA_PREFIX_OPCODE"? (Using the opcodes in
> vex.prefix is questionable anyway, but that's a pre-existing oddity.)
> 

For a prefix that is either A or 0, prefix & ~A must be 0. But it is hard to read.

How about this? It is more intuitive and easier to understand.

    case PREFIX_NP_OR_DATA:
      if (ins.vex.prefix == REPE_PREFIX_OPCODE
          || ins.vex.prefix == REPNE_PREFIX_OPCODE)
        {
          i386_dis_printf (info, dis_style_text, "(bad)");
          ret = ins.end_codep - priv.the_buffer;
          goto out;
        }
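
A minimal sketch comparing the two forms, assuming ins.vex.prefix can only hold 0 (no embedded prefix) or one of the three opcodes below. The helpers mask_form() and explicit_form() are hypothetical names; the opcode values are the usual x86 prefix bytes.

```c
#include <stdbool.h>

#define DATA_PREFIX_OPCODE   0x66
#define REPE_PREFIX_OPCODE   0xf3
#define REPNE_PREFIX_OPCODE  0xf2

/* Form from the patch: any bit outside 0x66 set means "not NP or 66".  */
static bool
mask_form (unsigned int prefix)
{
  return (prefix & ~DATA_PREFIX_OPCODE) != 0;
}

/* Proposed replacement: name the two rejected prefixes explicitly.  */
static bool
explicit_form (unsigned int prefix)
{
  return prefix == REPE_PREFIX_OPCODE || prefix == REPNE_PREFIX_OPCODE;
}
```

Under that assumption both forms reject exactly 0xf3 and 0xf2 and accept 0 and 0x66, so the change is purely about readability.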

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 5/9] Add tests for APX GPR32 with extend evex prefix
  2023-12-07 14:05   ` Jan Beulich
@ 2023-12-11  6:16     ` Cui, Lili
  2023-12-11  8:55       ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-11  6:16 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 24.11.2023 08:02, Cui, Lili wrote:
> > --- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
> > @@ -12,4 +12,192 @@
> >  .*:16: Error: unsupported extended GPR for addressing for `xsaveopt64'
> >  .*:17: Error: unsupported extended GPR for addressing for `xsavec'
> >  .*:18: Error: unsupported extended GPR for addressing for `xsavec64'
> > +.*:20: Error: unsupported extended GPR for addressing for `blendpd'
> > +.*:21: Error: unsupported extended GPR for addressing for `blendps'
> > +.*:22: Error: unsupported extended GPR for addressing for `blendvpd'
> > +.*:23: Error: unsupported extended GPR for addressing for `blendvpd'
> > +.*:24: Error: unsupported extended GPR for addressing for `blendvps'
> > +.*:25: Error: unsupported extended GPR for addressing for `blendvps'
> > +.*:26: Error: unsupported extended GPR for addressing for `dppd'
> > +.*:27: Error: unsupported extended GPR for addressing for `dpps'
> 
> Seeing this diagnostic in action, I'm afraid I think this would better be closer to
> the diagnostics i386_index_check() issues. E.g.
> "extended GPR cannot be used as base/index for ...".
> 

Done.

> > --- a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
> > @@ -1,4 +1,4 @@
> > -# Check Illegal 64bit APX_F instructions
> > +# Check illegal 64bit APX_F instructions
> 
> ???
> 

Changed in the original patch.

> > +#VEX without evex
> > +	vaesimc (%r27), %xmm3
> > +	vaeskeygenassist $7,(%r27),%xmm3
> > +	vblendpd $7,(%r27),%xmm6,%xmm2
> > +	vblendpd $7,(%r27),%ymm6,%ymm2
> > +	vblendps $7,(%r27),%xmm6,%xmm2
> > +	vblendps $7,(%r27),%ymm6,%ymm2
> > +	vblendvpd %xmm4,(%r27),%xmm2,%xmm7
> > +	vblendvpd %ymm4,(%r27),%ymm2,%ymm7
> > +	vblendvps %xmm4,(%r27),%xmm2,%xmm7
> > +	vblendvps %ymm4,(%r27),%ymm2,%ymm7
> > +	vdppd $7,(%r27),%xmm6,%xmm2
> > +	vdpps $7,(%r27),%xmm6,%xmm2
> > +	vdpps $7,(%r27),%ymm6,%ymm2
> > +	vhaddpd (%r27),%xmm6,%xmm5
> > +	vhaddpd (%r27),%ymm6,%ymm5
> > +	vhsubps (%r27),%xmm6,%xmm5
> > +	vhsubps (%r27),%ymm6,%ymm5
> > +	vlddqu (%r27),%xmm4
> > +	vlddqu (%r27),%ymm4
> > +	vldmxcsr (%r27)
> > +	vmaskmovpd %xmm4,%xmm6,(%r27)
> > +	vmaskmovpd %ymm4,%ymm6,(%r27)
> > +	vmaskmovpd (%r27),%xmm4,%xmm6
> > +	vmaskmovpd (%r27),%ymm4,%ymm6
> > +	vmaskmovps %xmm4,%xmm6,(%r27)
> > +	vmaskmovps %ymm4,%ymm6,(%r27)
> > +	vmaskmovps (%r27),%xmm4,%xmm6
> > +	vmaskmovps (%r27),%ymm4,%ymm6
> > +	vmovmskpd %xmm4,%r27d
> > +	vmovmskpd %xmm8,%r27d
> > +	vmovmskps %xmm4,%r27d
> > +	vmovmskps %ymm8,%r27d
> > +	vpblendd $7,(%r27),%xmm6,%xmm2
> > +	vpblendd $7,(%r27),%ymm6,%ymm2
> > +	vpblendvb %xmm4,(%r27),%xmm2,%xmm7
> > +	vpblendvb %ymm4,(%r27),%ymm2,%ymm7
> > +	vpblendw $7,(%r27),%xmm6,%xmm2
> > +	vpblendw $7,(%r27),%ymm6,%ymm2
> > +	vpcmpeqb (%r26),%ymm6,%ymm2
> > +	vpcmpeqd (%r26),%ymm6,%ymm2
> > +	vpcmpeqq (%r16),%ymm6,%ymm2
> > +	vpcmpeqw (%r16),%ymm6,%ymm2
> > +	vpcmpestri $7,(%r27),%xmm6
> > +	vpcmpestrm $7,(%r27),%xmm6
> > +	vpcmpgtb (%r26),%ymm6,%ymm2
> > +	vpcmpgtd (%r26),%ymm6,%ymm2
> > +	vpcmpgtq (%r16),%ymm6,%ymm2
> > +	vpcmpgtw (%r16),%ymm6,%ymm2
> > +	vpcmpistri $100,(%r25),%xmm6
> > +	vpcmpistrm $100,(%r25),%xmm6
> > +	vperm2f128 $7,(%r27),%ymm6,%ymm2
> > +	vperm2i128 $7,(%r27),%ymm6,%ymm2
> > +	vphaddd (%r27),%xmm6,%xmm7
> > +	vphaddd (%r27),%ymm6,%ymm7
> > +	vphaddsw (%r27),%xmm6,%xmm7
> > +	vphaddsw (%r27),%ymm6,%ymm7
> > +	vphaddw (%r27),%xmm6,%xmm7
> > +	vphaddw (%r27),%ymm6,%ymm7
> > +	vphminposuw (%r27),%xmm6
> > +	vphsubd (%r27),%xmm6,%xmm7
> > +	vphsubd (%r27),%ymm6,%ymm7
> > +	vphsubsw (%r27),%xmm6,%xmm7
> > +	vphsubsw (%r27),%ymm6,%ymm7
> > +	vphsubw (%r27),%xmm6,%xmm7
> > +	vphsubw (%r27),%ymm6,%ymm7
> > +	vpmaskmovd %xmm4,%xmm6,(%r27)
> > +	vpmaskmovd %ymm4,%ymm6,(%r27)
> > +	vpmaskmovd (%r27),%xmm4,%xmm6
> > +	vpmaskmovd (%r27),%ymm4,%ymm6
> > +	vpmaskmovq %xmm4,%xmm6,(%r27)
> > +	vpmaskmovq %ymm4,%ymm6,(%r27)
> > +	vpmaskmovq (%r27),%xmm4,%xmm6
> > +	vpmaskmovq (%r27),%ymm4,%ymm6
> > +	vpmovmskb %xmm4,%r27
> > +	vpmovmskb %ymm4,%r27d
> > +	vpsignb (%r27),%xmm6,%xmm7
> > +	vpsignb (%r27),%xmm6,%xmm7
> > +	vpsignd (%r27),%xmm6,%xmm7
> > +	vpsignd (%r27),%xmm6,%xmm7
> > +	vpsignw (%r27),%xmm6,%xmm7
> > +	vpsignw (%r27),%xmm6,%xmm7
> > +	vptest (%r27),%ymm6
> > +	vrcpps (%r27),%xmm6
> > +	vrcpps (%r27),%ymm6
> > +	vrcpss (%r27),%xmm6,%xmm6
> > +	vroundpd $1,(%r24),%xmm6
> > +	vroundps $2,(%r24),%xmm6
> > +	vroundsd $3,(%r24),%xmm6,%xmm3
> > +	vroundss $4,(%r24),%xmm6,%xmm3
> 
> There's still the pending question of whether these really need to be treated
> as invalid (rather than being converted to VRNDSCALE*). Also (to a lesser
> degree) for {LD,ST}MXCSR.
> 

GCC already performs these conversions, and many instructions require this. It has converted vstmxcsr/vldmxcsr to stmxcsr/ldmxcsr under APX.

> > +	vrsqrtps (%r27),%xmm6
> > +	vrsqrtps (%r27),%ymm6
> > +	vrsqrtss (%r27),%xmm6,%xmm6
> > +	vstmxcsr (%r27)
> > +	vtestpd (%r27),%xmm6
> > +	vtestpd (%r27),%ymm6
> > +	vtestps (%r27),%xmm6
> > +	vtestps (%r27),%ymm6
> > +	vtestps (%r27),%ymm6
> > +	vptest (%r27),%xmm6
> 
> This one wants moving up, now that sorting was mostly done.
> 

Done.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > @@ -0,0 +1,28 @@
> > +# Check Illegal prefix for 64bit EVEX-promoted instructions
> > +
> > +        .allow_index_reg
> > +        .text
> > +_start:
> > +	#movbe %r23w,%ax set EVEX.pp = f3 (illegal value).
> > +	.insn EVEX.L0.f3.M12.W0 0x60, %di, %ax
> > +	#movbe %r23w,%ax set EVEX.pp = f2 (illegal value).
> > +	.insn EVEX.L0.f2.M12.W0 0x60, %di, %ax
> > +	#VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10]
> == 0
> > +	#(illegal value).
> > +	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
> > +	.byte 0xff
> 
> For the purpose of this test (whatever P[10] again is) you don't need a 32-bit
> displacement, do you? Shorter is (almost always) better in such tests.
> 

P[10] is a fixed value; in the normal EVEX format we don't use this bit.  Dropped 0x7b.

> > +	#EVEX_MAP4 movbe %r23w,%ax set EVEX.mm == b01 (illegal value).
> > +	.insn EVEX.L0.66.M13.W0 0x60, %di, %ax
> > +	#EVEX_MAP4 movbe %r23w,%ax set EVEX.aa(P[17:16]) == b01 (illegal
> value).
> 
> There's aaa, but no aa afaik.
> 

Changed it to EVEX.a1a0. aaa is split into two parts in the EVEX-promoted format: a3 is NF and a1a0 is a fixed value.

EVEX.a1a0 (P[17:16]) == b01

> > +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax{%k1}
> > +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.zL'L == 0b11 (illegal value).
> 
> How's z relevant when the value is just a 2-bit one? And then z should likely
> have a separate test (also for the from-VEX case below)?
> 

Modified it and added an EVEX.z testcase for both MAP4 and from-VEX.

> > +	.insn EVEX.L0.66.M12.W0 0x60, %di, {rd-sae}, %ax
> > +	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[17:16](EVEX.aa) == 1
> (illegal value).
> > +	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax, %ebx, %ecx{%k1}
> > +	.byte 0xff, 0xff, 0xff
> > +	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[22:21](EVEX.L’L) == 1
> (illegal value).
> > +	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax, {rd-sae}, %ebx, %ecx
> > +	.byte 0xff, 0xff, 0xff
> 
> If you arranged for a ModR/M byte of 0xc9 (among other possibilities) in both
> of these cases, you could avoid the .byte lines altogether afaict.
> 
 
Used another value instead of 0xc9:

        #EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[17:16](EVEX.aa) == 0b01
        #(illegal value).
        .insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx{%k1}
        #EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[22:21](EVEX.L'L) == 0b01
        #(illegal value).
        .insn EVEX.L1.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx
        #EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[23](EVEX.z) == 0b1
        #(illegal value).
        .insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx {%k7}{z}
        #EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[20](EVEX.b) == 0b1
        #(illegal value).
        .insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx){1to8}, %rcx

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-08 15:21     ` Cui, Lili
@ 2023-12-11  8:34       ` Jan Beulich
  2023-12-12 10:44         ` Cui, Lili
  2023-12-12 12:58         ` Cui, Lili
  0 siblings, 2 replies; 69+ messages in thread
From: Jan Beulich @ 2023-12-11  8:34 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 08.12.2023 16:21, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, December 7, 2023 8:39 PM
>>
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>>>
>>>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>> -  {
>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>> -	   || maybe_cpu (t, CpuFMA))
>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>>> +    {
>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>> APX_F(CpuCMPCCXADD)
>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>> APX_F(CpuAVX512DQ)
>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>  	{
>>>  	  if (need_evex_encoding ())
>>
>> There are several issues here:
>> - Why did you need to change (to the worse) the original code?
>> - Why did you not model the addition after that original code?
>> - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512
>> insn can be VEX-encoded?
> 
>  I don't understand what you mean, we have this combination.
> 
> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }

Oh, I'm sorry: I forgot about the mask register insns.

>> - If these new macros are really needed for whatever reason, they shouldn't
>>   be added to opcodes/i386-opc.h when they're useful only in the assembler.
>> - Style requires a blank before the opening parenthesis in function
>>   invocations (which also covers function-like macro invocations).
>>
>> I think I asked before: How is it that you get away without altering
>> cpu_flags_match(), containing related and quite similar logic?
>>
> 
> For the original logic ( ... || ... ) && ( ... || ...), the content in the first bracket and the content in the following brackets can be combined arbitrarily. I think it is Inaccurate.

In which way? If there are issues with the existing code, these issues want
taking care of in separate (prereq) patches. Of course there are assumptions
made here about the CPU combinations that can (and cannot) occur in any of
our templates. Similar assumptions are imo fine to make in the APX additions.

Note how I used two nested if()s despite that not having been necessary at
that time. I did so in anticipation that for APX you'd want to add another
(separate) inner if(), rather than altering the one that's there.

> So I give examples one by one for each identified combination.

Which examples are you talking about? I see none given in your reply.

> Just found cpu_flags_match() has similar logic, I think the following is the only code related to CPUID alerts, but none of our combinations are related to cpuavx.
> 
>           if (all.bitfield.cpuavx)
>             {
>               /* We need to check SSE2AVX with AVX.  */
>               if (!t->opcode_modifier.sse2avx
>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>                 match |= CPU_FLAGS_ARCH_MATCH;
>             }

Not sure why you pick out this one. This special case is needed for sse2avx;
I don't see how it's related here. What I've been pointing you at is the
code in that function which follows a similar "Dual VEX/EVEX templates ..."
comment.

>>> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry
>> *r)
>>>        if (!cpu_arch_flags.bitfield.cpuapx_f
>>>  	  || flag_code != CODE_64BIT)
>>>  	return false;
>>> +
>>> +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as
>> EVEX.
>>> +	 For the later install_template function.  */
>>> +      if (current_templates->start->opcode_modifier.vex
>>> +	  && current_templates->start->opcode_modifier.evex)
>>> +	i.vec_encoding = vex_encoding_evex;
>>
>> I'm afraid I don't understand the 2nd sentence of the comment. This may be
>> related to my question regarding cpu_flags_match() further up.
>>
>> The first sentence isn't quite correct either - you don't mark any template here
>> (and you can't, because we don't even know yet which template we're going
>> to use).
>>
>> Finally - do you really need the .evex check here? (I won't exclude that this
>> yields a better diagnostic in certain cases, but this wants clarifying if so.)
>>
> 
> If you look at install_template(), you'll see that before this function we need to know if the current encoding is evex.

"This function" being check_register()? If so, then no, we can't know up front
whether EVEX encoding is going to be needed, as operand parsing happens ahead
of template selection. If instead you mean "that function" and hence
install_template(), then yes, we need to know whether to use EVEX there. Yet
how does that result in a need for the .evex check here? (Or maybe your reply
was really to the first of the three parts of my earlier one?)

But anyway - as said earlier on, using current_templates here looks wrong in
the first place. check_register() deals with only a register, without regard
to the context it is used in (with the sole exception of allow_pseudo_reg).
May I remind you that earlier on I already indicated that I suspect you'll
need a new enumerator to put in i.vec_encoding for this new purpose?

> We need to check opcode_modifier.evex here, it is a fix for issues caused by the merge of VEX and EVEX.
>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>     {
>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>         {
>           if (need_evex_encoding ())
>             {
>[...]
>>> @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
>>>
>>>  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>>>  invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
>>> +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
>>>  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>>>  invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
>>> +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
>>
>> Seeing these: Are there any Map4 encodings which aren't EVex128? If not
>> (and if you're also not hiddenly aware of some appearing in the near future),
>> please consider making EVexMap4 include this right away. Even if in the longer
>> run other encodings appear, it'll then be easy to simply replace all the
>> EVexMap4 uses in a purely mechanical way. Until then shorter template lines
>> are preferable.
>>
> 
> Would you mind defining it this way? Since #define EVex128 is behind it. Considering that you don't like unnecessary changes.
> 
> +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128

The order of #define-s doesn't matter. There's no reason not to use EVex128 here
even if it's #define-d only a few lines later.
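
To illustrate why the ordering is harmless, here is a minimal standalone C
sketch. It is not the actual i386-opc.tbl text: the _DEMO names, the numeric
values, and the helper function are made up for the example. The preprocessor
expands a macro body only at the point of use, by which time both #define-s
have been seen.

```c
/* Placeholder value, not the real encoding from i386-opc.h.  */
#define SPACE_EVEXMAP4_DEMO 4

/* EVexMap4_DEMO refers to EVex128_DEMO before the latter is defined.  */
#define EVexMap4_DEMO (SPACE_EVEXMAP4_DEMO * 10 + EVex128_DEMO)
#define EVex128_DEMO 1

/* Hypothetical helper: the expansion happens here, after both #define-s.  */
static int
evexmap4_demo_value (void)
{
  return EVexMap4_DEMO;
}
```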

>>> @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
>>>
>>>  // BMI2 instructions.
>>>
>>> -bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>> -mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>> -pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>> -pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>> -rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
>>> -sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>> -shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>> -shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>> +bzhi, 0xf5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>
>> Hmm, I had specifically suggested a pre-processor macro to use in place of the
>> open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here
>> and below)?
> 
> There are many different types of combinations, and each combination appears relatively few times, so I think adding a #define for each combination feels a bit wasteful.

I never suggested using multiple #define-s. I suggested a single APX_F()
macro which would be used uniformly here and elsewhere (here: APX_F(BMI2)).
And that macro would come with a comment explaining why the expression is
the (seemingly strange) way it is. Right now there's no such explanation
anywhere, and it would also be hard to find a good (central) place where to
put it.
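
A rough sketch of what such a single macro could look like, written as
standalone C so the expansion can be inspected. The definition is an
assumption (it simply wraps the expression that is open-coded in the quoted
bzhi line); the DEMO_* stringification helpers and the function are
hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed definition: wrap the expression that is currently open-coded
   in each template (e.g. BMI2&(BMI2|APX_F) in the bzhi line).  The
   explanatory comment for the odd-looking expression would live here.  */
#define APX_F(x) x&(x|APX_F)

/* Two-level stringification, so the argument gets expanded first.  */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2 (x)

/* Hypothetical helper: does APX_F(BMI2) expand to the given text?  */
static bool
apx_f_bmi2_expands_to (const char *want)
{
  return strcmp (DEMO_STR (APX_F (BMI2)), want) == 0;
}
```

The inner APX_F in the macro body is not followed by an opening parenthesis,
so it is left alone as the plain CPU flag name rather than being expanded
again.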

Jan

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-11  6:16     ` Cui, Lili
@ 2023-12-11  8:43       ` Jan Beulich
  0 siblings, 0 replies; 69+ messages in thread
From: Jan Beulich @ 2023-12-11  8:43 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 11.12.2023 07:16, Cui, Lili wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> +	 the lower 2 bits of EVEX.aaa must be 0.  */
>>> +      if ((ins->vex.mask_register_specifier & 0x3) != 0
>>> +	  || ins->vex.ll != 0
>>> +	  || ins->vex.zeroing != 0
>>> +	  || ins->vex.b)
>>> +	return &bad_opcode;
>>> +
>>> +      /* Fall through.  */
>>>      case USE_X86_64_TABLE:
>>
>> Instead of falling through here to go through x86_64_table[] (where in all
>> cases the non-64-bit slot is "bad"), can't you avoid that step and go to the
>> next step (uniformly the LEN one) right away, saving all those new table entries
>> (along the lines of what you do below when processing into
>> evex_from_legacy)?
>>
> 
> It's not very clear to me here, do you want to add the vex_len_table to delete all entries in i386-dis-evex-x86-64.h?

Indeed I think that nothing there is really needed / warranted. The case can
be handled with no new table entries at all, I think.

>  but in this way, there are still some instructions that need to go through x86_64_table[], such as X86_64_VEX_0F38E*.

Why would these need special treatment? All EVEX-from-VEX encodings are uniform
in being defined for 64-bit code only.

>>> @@ -9041,12 +9106,24 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>>>
>>>        if (ins->address_mode != mode_64bit)
>>>  	{
>>> +	  if (ins->evex_type != evex_default
>>> +	      || (ins->rex2 & (REX_B | REX_X)))
>>> +	    return &bad_opcode;
>>
>> What's special about X and B?
>>
> 
> For evex_default, the values of these two bits are fixed. Comment added.
> 
>       if (ins->address_mode != mode_64bit)
>         {
>           /* Report bad for !evex_default and when two fixed values of evex
>              change..  */
>           if (ins->evex_type != evex_default
>               || (ins->rex2 & (REX_B | REX_X)))
>             return &bad_opcode;

Maybe you didn't get my point: What's wrong with just checking ins->rex2 here
as a whole, rather than specially treating two of the bits?

>>> @@ -9639,6 +9723,24 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>>>        if (ins.last_repnz_prefix >= 0)
>>>  	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
>>>        break;
>>> +
>>> +    case PREFIX_NP_OR_DATA:
>>> +      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)
>>
>> ~DATA_PREFIX_OPCODE == 0x99, which likely isn't what you mean here? Do
>> you perhaps mean e.g. "> DATA_PREFIX_OPCODE"? (Using the opcodes in
>> vex.prefix is questionable anyway, but that's a pre-existing oddity.)
>>
> 
> (A || 0) & ~A must be 0. It's hard to read.  
> 
> How about this ? This is more intuitive and easy to understand.
> 
>     case PREFIX_NP_OR_DATA:
>       if (ins.vex.prefix == REPE_PREFIX_OPCODE
>           || ins.vex.prefix == REPNE_PREFIX_OPCODE)
>         {
>           i386_dis_printf (info, dis_style_text, "(bad)");
>           ret = ins.end_codep - priv.the_buffer;
>           goto out;
>         }

That's fine with me as well, sure.
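
For reference, a small standalone C sketch comparing the two forms. It
assumes vex.prefix only ever holds 0 or one of the three prefix opcode bytes;
the function names are made up for the example. Under that assumption both
forms reject exactly the same values, but only the second one says so
directly.

```c
#include <stdbool.h>

/* Prefix opcode bytes as used by the disassembler.  */
#define DATA_PREFIX_OPCODE  0x66
#define REPE_PREFIX_OPCODE  0xf3
#define REPNE_PREFIX_OPCODE 0xf2

/* The originally posted test: any bit outside of 0x66 set?  */
static bool
bad_prefix_mask_form (unsigned int prefix)
{
  return (prefix & ~DATA_PREFIX_OPCODE) != 0;
}

/* The replacement: explicitly name the two rejected prefixes.  */
static bool
bad_prefix_explicit_form (unsigned int prefix)
{
  return prefix == REPE_PREFIX_OPCODE || prefix == REPNE_PREFIX_OPCODE;
}
```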

Jan

* Re: [PATCH v3 5/9] Add tests for APX GPR32 with extend evex prefix
  2023-12-11  6:16     ` Cui, Lili
@ 2023-12-11  8:55       ` Jan Beulich
  0 siblings, 0 replies; 69+ messages in thread
From: Jan Beulich @ 2023-12-11  8:55 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 11.12.2023 07:16, Cui, Lili wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> +#VEX without evex
>>> +	vaesimc (%r27), %xmm3
>>> +	vaeskeygenassist $7,(%r27),%xmm3
>>> +	vblendpd $7,(%r27),%xmm6,%xmm2
>>> +	vblendpd $7,(%r27),%ymm6,%ymm2
>>> +	vblendps $7,(%r27),%xmm6,%xmm2
>>> +	vblendps $7,(%r27),%ymm6,%ymm2
>>> +	vblendvpd %xmm4,(%r27),%xmm2,%xmm7
>>> +	vblendvpd %ymm4,(%r27),%ymm2,%ymm7
>>> +	vblendvps %xmm4,(%r27),%xmm2,%xmm7
>>> +	vblendvps %ymm4,(%r27),%ymm2,%ymm7
>>> +	vdppd $7,(%r27),%xmm6,%xmm2
>>> +	vdpps $7,(%r27),%xmm6,%xmm2
>>> +	vdpps $7,(%r27),%ymm6,%ymm2
>>> +	vhaddpd (%r27),%xmm6,%xmm5
>>> +	vhaddpd (%r27),%ymm6,%ymm5
>>> +	vhsubps (%r27),%xmm6,%xmm5
>>> +	vhsubps (%r27),%ymm6,%ymm5
>>> +	vlddqu (%r27),%xmm4
>>> +	vlddqu (%r27),%ymm4
>>> +	vldmxcsr (%r27)
>>> +	vmaskmovpd %xmm4,%xmm6,(%r27)
>>> +	vmaskmovpd %ymm4,%ymm6,(%r27)
>>> +	vmaskmovpd (%r27),%xmm4,%xmm6
>>> +	vmaskmovpd (%r27),%ymm4,%ymm6
>>> +	vmaskmovps %xmm4,%xmm6,(%r27)
>>> +	vmaskmovps %ymm4,%ymm6,(%r27)
>>> +	vmaskmovps (%r27),%xmm4,%xmm6
>>> +	vmaskmovps (%r27),%ymm4,%ymm6
>>> +	vmovmskpd %xmm4,%r27d
>>> +	vmovmskpd %xmm8,%r27d
>>> +	vmovmskps %xmm4,%r27d
>>> +	vmovmskps %ymm8,%r27d
>>> +	vpblendd $7,(%r27),%xmm6,%xmm2
>>> +	vpblendd $7,(%r27),%ymm6,%ymm2
>>> +	vpblendvb %xmm4,(%r27),%xmm2,%xmm7
>>> +	vpblendvb %ymm4,(%r27),%ymm2,%ymm7
>>> +	vpblendw $7,(%r27),%xmm6,%xmm2
>>> +	vpblendw $7,(%r27),%ymm6,%ymm2
>>> +	vpcmpeqb (%r26),%ymm6,%ymm2
>>> +	vpcmpeqd (%r26),%ymm6,%ymm2
>>> +	vpcmpeqq (%r16),%ymm6,%ymm2
>>> +	vpcmpeqw (%r16),%ymm6,%ymm2
>>> +	vpcmpestri $7,(%r27),%xmm6
>>> +	vpcmpestrm $7,(%r27),%xmm6
>>> +	vpcmpgtb (%r26),%ymm6,%ymm2
>>> +	vpcmpgtd (%r26),%ymm6,%ymm2
>>> +	vpcmpgtq (%r16),%ymm6,%ymm2
>>> +	vpcmpgtw (%r16),%ymm6,%ymm2
>>> +	vpcmpistri $100,(%r25),%xmm6
>>> +	vpcmpistrm $100,(%r25),%xmm6
>>> +	vperm2f128 $7,(%r27),%ymm6,%ymm2
>>> +	vperm2i128 $7,(%r27),%ymm6,%ymm2
>>> +	vphaddd (%r27),%xmm6,%xmm7
>>> +	vphaddd (%r27),%ymm6,%ymm7
>>> +	vphaddsw (%r27),%xmm6,%xmm7
>>> +	vphaddsw (%r27),%ymm6,%ymm7
>>> +	vphaddw (%r27),%xmm6,%xmm7
>>> +	vphaddw (%r27),%ymm6,%ymm7
>>> +	vphminposuw (%r27),%xmm6
>>> +	vphsubd (%r27),%xmm6,%xmm7
>>> +	vphsubd (%r27),%ymm6,%ymm7
>>> +	vphsubsw (%r27),%xmm6,%xmm7
>>> +	vphsubsw (%r27),%ymm6,%ymm7
>>> +	vphsubw (%r27),%xmm6,%xmm7
>>> +	vphsubw (%r27),%ymm6,%ymm7
>>> +	vpmaskmovd %xmm4,%xmm6,(%r27)
>>> +	vpmaskmovd %ymm4,%ymm6,(%r27)
>>> +	vpmaskmovd (%r27),%xmm4,%xmm6
>>> +	vpmaskmovd (%r27),%ymm4,%ymm6
>>> +	vpmaskmovq %xmm4,%xmm6,(%r27)
>>> +	vpmaskmovq %ymm4,%ymm6,(%r27)
>>> +	vpmaskmovq (%r27),%xmm4,%xmm6
>>> +	vpmaskmovq (%r27),%ymm4,%ymm6
>>> +	vpmovmskb %xmm4,%r27
>>> +	vpmovmskb %ymm4,%r27d
>>> +	vpsignb (%r27),%xmm6,%xmm7
>>> +	vpsignb (%r27),%xmm6,%xmm7
>>> +	vpsignd (%r27),%xmm6,%xmm7
>>> +	vpsignd (%r27),%xmm6,%xmm7
>>> +	vpsignw (%r27),%xmm6,%xmm7
>>> +	vpsignw (%r27),%xmm6,%xmm7
>>> +	vptest (%r27),%ymm6
>>> +	vrcpps (%r27),%xmm6
>>> +	vrcpps (%r27),%ymm6
>>> +	vrcpss (%r27),%xmm6,%xmm6
>>> +	vroundpd $1,(%r24),%xmm6
>>> +	vroundps $2,(%r24),%xmm6
>>> +	vroundsd $3,(%r24),%xmm6,%xmm3
>>> +	vroundss $4,(%r24),%xmm6,%xmm3
>>
>> There's still the pending question of whether these really need to be treated
>> as invalid (rather than being converted to VRNDSCALE*). Also (to a lesser
>> degree) for {LD,ST}MXCSR.
>>
> 
> GCC already performs these conversions, and many instructions require this. it has converted vstmxcsr/vldmxcsr to ldmxcsr/stmxcsr under APX.

What other instructions are covered by "many"? I don't see a similar pattern
applying for other than the named ones.

Also, how does it help an assembler programmer if gcc already does the
conversion? Or even a C programmer using inline assembly? It's still not
really clear to me how inline assembly is going to be dealt with in a
fully flexible, yet sufficiently restricting way. Hence any help that
can be provided to avoid non-standard constructs ought to be put in place
(imo).

>>> --- /dev/null
>>> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
>>> @@ -0,0 +1,28 @@
>>> +# Check Illegal prefix for 64bit EVEX-promoted instructions
>>> +
>>> +        .allow_index_reg
>>> +        .text
>>> +_start:
>>> +	#movbe %r23w,%ax set EVEX.pp = f3 (illegal value).
>>> +	.insn EVEX.L0.f3.M12.W0 0x60, %di, %ax
>>> +	#movbe %r23w,%ax set EVEX.pp = f2 (illegal value).
>>> +	.insn EVEX.L0.f2.M12.W0 0x60, %di, %ax
>>> +	#VSIB vpgatherqq 0x7b(%rbp,%zmm17,8),%zmm16{%k1} set EVEX.P[10] == 0
>>> +	#(illegal value).
>>> +	.byte 0x62, 0xe2, 0xf9, 0x41, 0x91, 0x84, 0xcd, 0x7b, 0x00, 0x00, 0x00
>>> +	.byte 0xff
>>
>> For the purpose of this test (whatever P[10] again is) you don't need a 32-bit
>> displacement, do you? Shorter is (almost always) better in such tests.
>>
> 
> P[10] is a fixed value, in normal EVEX format we don't use this bit.  Dropped 0x7b.
> 
>>> +	#EVEX_MAP4 movbe %r23w,%ax set EVEX.mm == b01 (illegal value).
>>> +	.insn EVEX.L0.66.M13.W0 0x60, %di, %ax
>>> +	#EVEX_MAP4 movbe %r23w,%ax set EVEX.aa(P[17:16]) == b01 (illegal value).
>>
>> There's aaa, but no aa afaik.
>>
> 
> Change it to EVEX.a1a0, aaa is split into two parts in EVEX-promoted format, a3 is NF and a1a0 is a fixed value.
> 
> EVEX.a1a0 (P[17:16]) == b01

But a1a0 isn't a term documentation uses either. Just to repeat an earlier
request of mine: These comments need to be easy to decipher and follow.
Hence they want to use as easily understandable terminology as possible.
One way to express what you're after may be "EVEX.aaa[1:0] (P[17:16])".
I'm sure there are further ways which stay in line with what the SDM uses.

>>> +	.insn EVEX.L0.66.M12.W0 0x60, %di, %ax{%k1}
>>> +	#EVEX_MAP4 movbe %r18w,%ax set EVEX.zL'L == 0b11 (illegal value).
>>
>> How's z relevant when the value is just a 2-bit one? And then z should likely
>> have a separate test (also for the from-VEX case below)?
>>
> 
> Modified it and added EVEX.z testcase for MAP4 and from-VEX.
> 
>>> +	.insn EVEX.L0.66.M12.W0 0x60, %di, {rd-sae}, %ax
>>> +	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[17:16](EVEX.aa) == 1
>> (illegal value).
>>> +	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax, %ebx, %ecx{%k1}
>>> +	.byte 0xff, 0xff, 0xff
>>> +	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[22:21](EVEX.L’L) == 1
>> (illegal value).
>>> +	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax, {rd-sae}, %ebx, %ecx
>>> +	.byte 0xff, 0xff, 0xff
>>
>> If you arranged for a ModR/M byte of 0xc9 (among other possibilities) in both
>> of these cases, you could avoid the .byte lines altogether afaict.
>>
>  
> Use other value instead of 0xc9,
> 
>         #EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[17:16](EVEX.aa) == 0b01
>         #(illegal value).
>         .insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx{%k1}
>         #EVEX from VEX bzhi %rax,(%rax,%rbx),%ecx EVEX.P[22:21](EVEX.L’L) == 0b01
>         #(illegal value).
>         .insn EVEX.L1.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx
>         #EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[23](EVEX.z) == 0b1
>         #(illegal value).
>         .insn EVEX.L0.NP.0f38.W0 0xf5, %rax, (%rax,%rbx), %rcx {%k7}{z}
>         #EVEX from VEX bzhi %rax,(%rax,%rbx),%rcx EVEX.P[20](EVEX.b) == 0b1
>         #(illegal value).
>         .insn EVEX.L0.NP.0f38.W0 0xf5, %rax ,(%rax,%rbx){1to8}, %rcx       

Hmm, yes, these are memory operands now. I didn't check what ModR/M bytes these
specifically encode to, but with the .byte gone I expect things are better now.

Btw, readability of these would greatly improve if between each .insn and the
following comment there was a blank line. That way what belongs together and
what is separate can be spotted at the first glance.

Jan

* Re: [PATCH v3 7/9] Support APX Push2/Pop2
  2023-11-24  7:02 ` [PATCH v3 7/9] Support APX Push2/Pop2 Cui, Lili
@ 2023-12-11 11:17   ` Jan Beulich
  2023-12-15  8:38     ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-11 11:17 UTC (permalink / raw)
  To: Cui, Lili, Mo, Zewei; +Cc: hongjiu.lu, binutils

On 24.11.2023 08:02, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -248,6 +248,7 @@ enum i386_error
>      invalid_vector_register_set,
>      invalid_tmm_register_set,
>      invalid_dest_and_src_register_set,
> +    invalid_src_register_set,
>      invalid_pseudo_prefix,
>      unsupported_vector_index_register,
>      unsupported_broadcast,
> @@ -256,6 +257,7 @@ enum i386_error
>      mask_not_on_destination,
>      no_default_mask,
>      unsupported_rc_sae,
> +    unsupported_rsp_register,
>      invalid_register_operand,
>      internal_error,
>    };
> @@ -5398,6 +5400,9 @@ md_assemble (char *line)
>  	case invalid_dest_and_src_register_set:
>  	  err_msg = _("destination and source registers must be distinct");
>  	  break;
> +	case invalid_src_register_set:

Did you mean invalid_dest_register_set and ...

> +	  err_msg = _("two source registers must be distinct");

... "two destination ..."? This is for POP2, after all, which has no source
register at all.

> @@ -5422,6 +5427,9 @@ md_assemble (char *line)
>  	case unsupported_rc_sae:
>  	  err_msg = _("unsupported static rounding/sae");
>  	  break;
> +	case unsupported_rsp_register:
> +	  err_msg = _("cannot be used with %rsp register");
> +	  break;

While this wording looks okay as visible here, please consider it in the
context it is used in: "cannot be used with %rsp register for `push2'"
is, I'm sorry to say that, clumsy at best. If you want to stick to setting
err_msg, how about "%rsp register cannot be used"? Personally I'd prefer a
resulting output of "%rsp register cannot be used with `push2'", but I
wouldn't insist on you going that route if you don't like that.

> @@ -7113,6 +7121,33 @@ check_EgprOperands (const insn_template *t)
>    return 0;
>  }
>  
> +/* Check if APX operands are valid for the instruction.  */
> +static int

Please can functions returning boolean indicators have a return type of
"bool" (and perhaps use "true" as the success indicator, not "false")?

> +check_APX_operands (const insn_template *t)
> +{
> +  /* Push2* and Pop2* cannot use RSP and Pop2* cannot pop two same registers.
> +   */
> +  if (t->mnem_off == MN_push2 || t->mnem_off == MN_push2p
> +      || t->mnem_off == MN_pop2 || t->mnem_off == MN_pop2p)

Considering (perhaps just theoretical) further additions here, did you
consider using switch()? Even without further additions this would imo
be more legible (due to there being slightly less redundancy).
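
Putting the last two remarks together, a possible shape is sketched below as
standalone C. This is not the patch's code: the DEMO_* enumerators stand in
for the generated MN_* values, plain register numbers replace the real
operand structures, and the function name is made up.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the generated MN_* mnemonic offsets.  */
enum demo_mnem
{
  DEMO_MN_add,
  DEMO_MN_push2,
  DEMO_MN_push2p,
  DEMO_MN_pop2,
  DEMO_MN_pop2p
};

/* Register number 4 is %rsp in the hardware encoding.  */
#define DEMO_REG_RSP 4

/* Return true if the two register operands are acceptable.  */
static bool
demo_apx_operands_ok (enum demo_mnem mnem, unsigned int reg0,
		      unsigned int reg1)
{
  switch (mnem)
    {
    case DEMO_MN_pop2:
    case DEMO_MN_pop2p:
      /* POP2 must not pop into the same register twice.  */
      if (reg0 == reg1)
	return false;
      /* Fall through.  */
    case DEMO_MN_push2:
    case DEMO_MN_push2p:
      /* Neither PUSH2 nor POP2 may use %rsp.  */
      return reg0 != DEMO_REG_RSP && reg1 != DEMO_REG_RSP;

    default:
      return true;
    }
}
```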

> --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> @@ -28,3 +28,9 @@ _start:
>  	.byte 0xff
>  	#{evex} inc %rax %rbx EVEX.vvvv' != 1111 && EVEX.ND = 0.
>  	.insn EVEX.L0.NP.M4.W1 0xff, %rax, %rbx
> +	.byte 0xff
> +	# pop2 %rax, %rbx set EVEX.ND=0.
> +	.byte 0x62,0xf4,0x64,0x08,0x8f,0xc0
> +	.byte 0xff, 0xff, 0xff
> +	# pop2 %rax set EVEX.vvvv' = 1111.

Another instance of the unclear EVEX.vvvv' (i.e. the questionable nature
of ' here). Yet then - what is the test below checking? EVEX.vvvv encodes
one of the two operands, so all values are valid? Isn't this about both
operands being the same? That would better be said then explicitly, e.g.
simply

	# pop2 %rax, %rax (twice same destination)

> +	.byte 0x62,0xf4,0x7c,0x18,0x8f,0xc0

Also again both new tests use .byte instead of .insn: Is there a particular
reason? Here are a couple of examples that I have readily available (Intel
syntax again, for the avoidance of doubt):

	.insn EVEX.L0.M4.W0 0x8f/0, r8, rax{sae}	; pop2 r8, rax
	.insn EVEX.L0.M4.W0 0x8f/0, xmm16, rax{sae}	; pop2 r16, rax
	.insn EVEX.L0.M4.W0 0x8f/0, rax, r8{sae}	; pop2 rax, r8
	.insn EVEX.L0.M12.W0 0x8f/0, rax, rax{sae}	; pop2 rax, r16
	.insn EVEX.L0.M4.W1 0x8f/0, rax, rcx{sae}	; pop2.x rax, rcx

I'm sure you can derive from them what you're actually after.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
> @@ -0,0 +1,39 @@
> +# Check 64bit APX-Push2Pop2 instructions
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +	push2 %rbx, %rax
> +	push2 %r17, %r8
> +	push2 %r9, %r31
> +	push2 %r31, %r24
> +	push2p %rbx, %rax
> +	push2p %r17, %r8
> +	push2p %r9, %r31
> +	push2p %r31, %r24
> +	pop2 %rax, %rbx
> +	pop2 %r8, %r17
> +	pop2 %r31, %r9
> +	pop2 %r24, %r31
> +	pop2p %rax, %rbx
> +	pop2p %r8, %r17
> +	pop2p %r31, %r9
> +	pop2p %r24, %r31
> +
> +.intel_syntax noprefix

Nit: Un-indented directive again.

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -105,6 +105,7 @@ static bool FXSAVE_Fixup (instr_info *, int, int);
>  static bool MOVSXD_Fixup (instr_info *, int, int);
>  static bool DistinctDest_Fixup (instr_info *, int, int);
>  static bool PREFETCHI_Fixup (instr_info *, int, int);
> +static bool PUSH2_POP2_Fixup (instr_info *, int, int);
>  
>  static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
>  						enum disassembler_style,
> @@ -225,6 +226,9 @@ struct instr_info
>    }
>    vex;
>  
> +/* For APX EVEX-promoted prefix, EVEX.ND shares the same bit as vex.b.  */
> +#define nd b

Can this be moved ahead to patch 4, such that it can be used there (instead
of vex.b) as well? IOW ...

> @@ -9125,7 +9133,7 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  
>        /* EVEX from legacy instructions, when the EVEX.ND bit is 0,
>  	 all bits of EVEX.vvvv and EVEX.V' must be 1.  */
> -      if (ins->evex_type == evex_from_legacy && !ins->vex.b
> +      if (ins->evex_type == evex_from_legacy && !ins->vex.nd
>  	  && (ins->vex.register_specifier || !ins->vex.v))
>  	return &bad_opcode;

... neither this nor ...

> @@ -13388,11 +13396,10 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>    if (!ins->need_vex)
>      return true;
>  
> -  /* Here vex.b is treated as "EVEX.ND".  */
>    if (ins->evex_type == evex_from_legacy)
>      {
>        ins->evex_used |= EVEX_b_used;
> -      if (!ins->vex.b)
> +      if (!ins->vex.nd)
>  	return true;
>      }

... this should require touching here.
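
To make the effect of the alias concrete, a cut-down standalone C sketch
follows. The structure and function are hypothetical; only the #define
matches the quoted hunk. Once the alias exists from the start, every user can
be written with the .nd spelling right away and no later patch needs to
revisit those lines.

```c
#include <stdbool.h>

/* Cut-down, hypothetical version of the disassembler's vex state.  */
struct demo_vex
{
  bool b;
};

/* For APX EVEX-promoted insns EVEX.ND shares the bit with EVEX.b.  */
#define nd b

static bool
demo_has_nd (const struct demo_vex *vex)
{
  /* Expands to vex->b; only the spelling at the use site changes.  */
  return vex->nd;
}
```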

> @@ -13884,3 +13894,26 @@ PREFETCHI_Fixup (instr_info *ins, int bytemode, int sizeflag)
>  
>    return OP_M (ins, bytemode, sizeflag);
>  }
> +
> +static bool
> +PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
> +{
> +  if (ins->modrm.mod != 3 || !ins->vex.b)

Did you mean vex.nd? Plus, considering the vex.nd check further down, why
is this checked both here and there?

> +    return true;

Doesn't this result in silently bogus/wrong output? Shouldn't you print
"(bad)" like you do further down? At which point it may make sense to
simply fold both if()s?

> --- a/opcodes/i386-opc.h
> +++ b/opcodes/i386-opc.h
> @@ -807,6 +807,7 @@ typedef struct i386_opcode_modifier
>    unsigned int isa64:2;
>    unsigned int noegpr:1;
>    unsigned int nf:1;
> +  unsigned int push2pop2:1;
>  } i386_opcode_modifier;

Still a new modifier despite my earlier request to avoid adding one when
you easily can? Here OperandConstraint is actually fully applicable to
use, as what you want to enforce is a constraint on operands.

Jan

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
  2023-12-07 12:38   ` Jan Beulich
  2023-12-07 13:34   ` Jan Beulich
@ 2023-12-11 11:50   ` Jan Beulich
  2 siblings, 0 replies; 69+ messages in thread
From: Jan Beulich @ 2023-12-11 11:50 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, binutils

On 24.11.2023 08:02, Cui, Lili wrote:
> This patch adds non-ND, non-NF forms of EVEX promotion insn.
> 
> EVEX extension of legacy instructions:
>   All promoted legacy instructions are placed in EVEX map 4, which is
>   currently reserved.
> EVEX extension of EVEX instructions:
>   All existing EVEX instructions are extended by APX using the extended
>   EVEX prefix, so that they can access all 32 GPRs.
> EVEX extension of VEX instructions:
>   Promoting a VEX instruction into the EVEX space does not change the map
>   id, the opcode, or the operand encoding of the VEX instruction.
> 
> Note: The promoted versions of MOVBE will be extended to include the “MOVBE
>   reg1, reg2”.
> 
>   gas/ChangeLog:
> 
>   2023-11-21  Lingling Kong <lingling.kong@intel.com>
> 	      H.J. Lu  <hongjiu.lu@intel.com>
> 	      Lili Cui <lili.cui@intel.com>
> 	      Lin Hu   <lin1.hu@intel.com>
> 
> 	* config/tc-i386.c (cpu_flags_not_or_check): Add a new
> 	function for APX cpu flag checking.
> 	(cpu_flags_match): handle cpu_flags_not_or_check.
> 	(install_template): Add AMX_TILE and APX combine.
> 	(is_any_apx_evex_encoding): Test apx evex encoding.
> 	(build_apx_evex_prefix): Enabe APX evex prefix.
> 	(md_assemble): Handle apx with evex encoding.
> 	(check_EgprOperands): Add nodgpr check for apx.

Btw - these mechanical ChangeLog entries also need keeping up to date. Afaics
check_EgprOperands() isn't touched here (anymore?) at all, as I merely happened
to notice by searching for where the function first appears.

Jan

* Re: [PATCH v3 8/9] Support APX NDD optimized encoding.
  2023-11-24  7:02 ` [PATCH v3 8/9] Support APX NDD optimized encoding Cui, Lili
@ 2023-12-11 12:27   ` Jan Beulich
  2023-12-12  3:18     ` Hu, Lin1
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-11 12:27 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, Hu, Lin1, binutils

On 24.11.2023 08:02, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -7148,6 +7148,58 @@ check_APX_operands (const insn_template *t)
>    return 0;
>  }
>  
> +/* Check if the instruction use the REX registers.  */
> +static bool
> +check_RexOperands ()
> +{
> +  for (unsigned int op = 0; op < i.operands; op++)
> +    {
> +      if (i.types[op].bitfield.class != Reg)
> +	continue;
> +
> +      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
> +	return true;
> +    }
> +
> +  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
> +      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
> +    return true;
> +
> +  /* Check pseudo prefix {rex} are valid.  */
> +  return i.rex_encoding;

Can this actually happen, when we're converting from EVEX to legacy?
(Initially I wanted to ask about "rex" and alike prefixes, i.e. the non-
pseudo ones.)

> +}
> +
> +/* Optimize APX NDD insns to legacy insns.  */
> +static unsigned int
> +can_convert_NDD_to_legacy (const insn_template *t)
> +{
> +  unsigned int match_dest_op = ~0;
> +
> +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST

No new callers are expected to appear (any time soon) and the sole caller
has checked this already.

Also with this check, ...

> +      && t->opcode_space == SPACE_EVEXMAP4

... what (further) effect is this one intended to have?

> +      && !i.has_nf
> +      && i.reg_operands >= 2)
> +    {
> +      unsigned int dest = i.operands - 1;
> +      unsigned int src1 = i.operands - 2;
> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> +
> +      if (i.types[src1].bitfield.class == Reg
> +	  && i.op[src1].regs == i.op[dest].regs)
> +	match_dest_op = src1;
> +      /* If the first operand is the same as the third operand,
> +	 these instructions need to support the ability to commutative
> +	 the first two operands and still not change the semantics in order
> +	 to be optimized.  */
> +      else if (i.types[src2].bitfield.class == Reg
> +	       && i.op[src2].regs == i.op[dest].regs
> +	       && optimize > 1
> +	       && t->opcode_modifier.commutative)

Based on the "cheap conditions first" principle and to also be better in
line with the comment, may I suggest

+      else if (optimize > 1
+	       && t->opcode_modifier.commutative
+	       && i.types[src2].bitfield.class == Reg
+	       && i.op[src2].regs == i.op[dest].regs)

?

> +	match_dest_op = src2;
> +    }
> +  return match_dest_op;
> +}
> +
>  /* Helper function for the progress() macro in match_template().  */
>  static INLINE enum i386_error progress (enum i386_error new,
>  					enum i386_error last,
> @@ -7675,6 +7727,61 @@ match_template (char mnem_suffix)
>  	  i.memshift = memshift;
>  	}
>  
> +      /* If we can optimize a NDD insn to legacy insn, like
> +	 add %r16, %r8, %r8 -> add %r16, %r8,
> +	 add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> +	 Note that the semantics have not been changed.  */
> +      if (optimize
> +	  && !i.no_optimize
> +	  && i.vec_encoding != vex_encoding_evex
> +	  && t + 1 < current_templates->end
> +	  && !t[1].opcode_modifier.evex
> +	  && t[1].opcode_space <= SPACE_0F38
> +	  && t->opcode_modifier.vexvvvv == VexVVVV_DST)
> +	{
> +	  unsigned int match_dest_op = can_convert_NDD_to_legacy (t);
> +	  size_match = true;

This would perhaps better ...

> +	  if (match_dest_op != (unsigned int) ~0)
> +	    {

... live here

> +	      /* We ensure that the next template has the same input
> +		 operands as the original matching template by the first
> +		 opernd (ATT), thus avoiding the error caused by the wrong order
> +		 of insns in i386.tbl.  */

I'm sorry, but I (still) can't make sense of this last part of the comment,
after the comma.

> +	      overlap0 = operand_type_and (i.types[0],
> +					   t[1].operand_types[0]);
> +	      if (t->opcode_modifier.d)
> +		overlap1 = operand_type_and (i.types[0],
> +					     t[1].operand_types[1]);
> +	      if (!operand_type_match (overlap0, i.types[0])
> +		  && (!t->opcode_modifier.d
> +		      || (t->opcode_modifier.d
> +			  && !operand_type_match (overlap1, i.types[0]))))

What's wrong with the simpler

		  && (!t->opcode_modifier.d
		      || !operand_type_match (overlap1, i.types[0])))

?
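
That the two forms are equivalent can be checked exhaustively. In the
standalone C sketch below, d stands for t->opcode_modifier.d and m for the
result of operand_type_match (overlap1, i.types[0]); the function names are
made up.

```c
#include <stdbool.h>

/* The condition as posted.  */
static bool
cond_as_posted (bool d, bool m)
{
  return !d || (d && !m);
}

/* The shorter form: once !d is known to be false, d is necessarily
   true, so re-testing it adds nothing.  */
static bool
cond_simplified (bool d, bool m)
{
  return !d || !m;
}
```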

> +		size_match = false;

Yet still, and despite the improved comment, I don't really see what all of
this is about. What cases would be mis-handled if this wasn't there?

> +	      if (size_match
> +		  /* Optimizing some non-legacy-map0/1 without REX/REX2 prefix will be valuable.  */
> +		  && (t[1].opcode_space <= SPACE_0F

Where a comment is placed is meaningful to understanding what it's about. The
way you have it, it says "non-legacy-map0/1" on a check that the (next)
encoding is map0 or map1. I think this wants moving down by a line, and even
then also re-wording: If I didn't (vaguely) recall context, I don't think I
could derive what is meant. Iirc this is about legacy encodings being one
byte shorter for certain 0f38 space insns when they don't require a REX
prefix to encode. How about something like "Some non-legacy-map0/1 insns can
be shorter when legacy-encoded and when no REX prefix is required"?

> +		      || (!check_EgprOperands (t + 1)
> +			  && !check_RexOperands ()

I'm not going to insist that you adjust this, but these two calls side by
side demonstrate a curious inconsistency: The former requires t to be passed
in. If you keep it like that, I may change this down the road, especially as
the t-related aspect isn't relevant here at all (and could hence be moved
out of the function to the single place where it is needed).

> +			  && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))
> +		{
> +		  unsigned int src1 = i.operands - 2;
> +		  unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> +
> +		  if (match_dest_op == src2)
> +		    swap_2_operands (match_dest_op, src1);

Isn't it wrong (albeit benign) to swap when i.operands == 2? IOW wouldn't

		  if (i.reg_operands > 2 && match_dest_op == i.operands - 3)
		    swap_2_operands (match_dest_op, i.operands - 2);

be more in line with what's actually wanted?
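
To make the i.operands == 2 case concrete, here is a small stand-alone model
of the two conditions. The helper names are hypothetical and plain unsigned
ints stand in for the gas state (i.operands, i.reg_operands, match_dest_op).

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as in the patch: src2 falls back to 0 unless there are more
   than 3 operands, so with 2 operands and match_dest_op == 0 the code
   ends up calling swap_2_operands (0, 0).  */
static bool
swaps_in_patch (unsigned int operands, unsigned int match_dest_op)
{
  unsigned int src2 = (operands > 3) ? operands - 3 : 0;

  return match_dest_op == src2;
}

/* Suggested condition: only consider swapping when there are more than
   two register operands.  The && short-circuits before operands - 3 can
   wrap around for small operand counts.  */
static bool
swaps_as_suggested (unsigned int operands, unsigned int reg_operands,
		    unsigned int match_dest_op)
{
  return reg_operands > 2 && match_dest_op == operands - 3;
}
```

With 2 operands and match_dest_op == 0 the first helper reports a (no-op)
swap while the second does not; for 3 and 4 register operands both agree.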

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
> @@ -0,0 +1,123 @@
> +# Check 64bit APX NDD instructions with optimized encoding
> +
> +	.text
> +_start:
> +add    %r31,%r8,%r8
> +addb   %r31b,%r8b,%r8b
> +{store} add    %r31,%r8,%r8
> +{load}  add    %r31,%r8,%r8
> +add    %r31,(%r8),%r31
> +add    (%r31),%r8,%r8
> +add    $0x12344433,%r15,%r15
> +add    $0xfffffffff4332211,%r8,%r8
> +inc    %r31,%r31
> +incb   %r31b,%r31b
> +sub    %r15,%r17,%r17
> +subb   %r15b,%r17b,%r17b
> +sub    %r15,(%r8),%r15
> +sub    (%r15,%rax,1),%r16,%r16
> +sub    $0x1234,%r30,%r30
> +dec    %r17,%r17
> +decb   %r17b,%r17b
> +sbb    %r15,%r17,%r17
> +sbbb   %r15b,%r17b,%r17b
> +sbb    %r15,(%r8),%r15
> +sbb    (%r15,%rax,1),%r16,%r16
> +sbb    $0x1234,%r30,%r30
> +and    %r15,%r17,%r17
> +andb   %r15b,%r17b,%r17b
> +and    %r15,(%r8),%r15
> +and    (%r15,%rax,1),%r16,%r16
> +and    $0x1234,%r30,%r30
> +or     %r15,%r17,%r17
> +orb    %r15b,%r17b,%r17b
> +or     %r15,(%r8),%r15
> +or     (%r15,%rax,1),%r16,%r16
> +or     $0x1234,%r30,%r30
> +xor    %r15,%r17,%r17
> +xorb   %r15b,%r17b,%r17b
> +xor    %r15,(%r8),%r15
> +xor    (%r15,%rax,1),%r16,%r16
> +xor    $0x1234,%r30,%r30
> +adc    %r15,%r17,%r17
> +adcb   %r15b,%r17b,%r17b
> +adc    %r15,(%r8),%r15
> +adc    (%r15,%rax,1),%r16,%r16
> +adc    $0x1234,%r30,%r30
> +neg    %r17,%r17
> +negb   %r17b,%r17b
> +not    %r17,%r17
> +notb   %r17b,%r17b
> +imul   0x90909(%eax),%edx,%edx
> +imul   0x909(%rax,%r31,8),%rdx,%rdx
> +imul   %rdx,%rax,%rdx
> +rol    %r31,%r31
> +rolb   %r31b,%r31b
> +rol    $0x2,%r12,%r12
> +rolb   $0x2,%r12b,%r12b
> +ror    %r31,%r31
> +rorb   %r31b,%r31b
> +ror    $0x2,%r12,%r12
> +rorb   $0x2,%r12b,%r12b
> +rcl    %r31,%r31
> +rclb   %r31b,%r31b
> +rcl    $0x2,%r12,%r12
> +rclb   $0x2,%r12b,%r12b
> +rcr    %r31,%r31
> +rcrb   %r31b,%r31b
> +rcr    $0x2,%r12,%r12
> +rcrb   $0x2,%r12b,%r12b
> +sal    %r31,%r31
> +salb   %r31b,%r31b
> +sal    $0x2,%r12,%r12
> +salb   $0x2,%r12b,%r12b
> +shl    %r31,%r31
> +shlb   %r31b,%r31b
> +shl    $0x2,%r12,%r12
> +shlb   $0x2,%r12b,%r12b
> +shr    %r31,%r31
> +shrb   %r31b,%r31b
> +shr    $0x2,%r12,%r12
> +shrb   $0x2,%r12b,%r12b
> +sar    %r31,%r31
> +sarb   %r31b,%r31b
> +sar    $0x2,%r12,%r12
> +sarb   $0x2,%r12b,%r12b
> +shld   $0x1,%r12,(%rax),%r12
> +shld   $0x2,%r8,%r12,%r12
> +shld   $0x2,%r8,%r12,%r8
> +shld   %cl,%r9,(%rax),%r9
> +shld   %cl,%r12,%r16,%r16
> +shld   %cl,%r12,%r16,%r12
> +shrd   $0x1,%r12,(%rax),%r12
> +shrd   $0x1,%r13,%r12,%r12
> +shrd   $0x1,%r13,%r12,%r13
> +shrd   %cl,%r9,(%rax),%r9
> +shrd   %cl,%r12,%r16,%r16
> +shrd   %cl,%r12,%r16,%r12
> +cmovo  0x90909090(%eax),%edx,%edx
> +cmovno 0x90909090(%eax),%edx,%edx
> +cmovb  0x90909090(%eax),%edx,%edx
> +cmovae 0x90909090(%eax),%edx,%edx
> +cmove  0x90909090(%eax),%edx,%edx
> +cmovne 0x90909090(%eax),%edx,%edx
> +cmovbe 0x90909090(%eax),%edx,%edx
> +cmova  0x90909090(%eax),%edx,%edx
> +cmovs  0x90909090(%eax),%edx,%edx
> +cmovns 0x90909090(%eax),%edx,%edx
> +cmovp  0x90909090(%eax),%edx,%edx
> +cmovnp 0x90909090(%eax),%edx,%edx
> +cmovl  0x90909090(%eax),%edx,%edx
> +cmovge 0x90909090(%eax),%edx,%edx
> +cmovle 0x90909090(%eax),%edx,%edx
> +cmovg  0x90909090(%eax),%edx,%edx
> +adcx   %ebx,%eax,%eax
> +adcx   %eax,%ebx,%eax
> +adcx   %rbx,%rax,%rax
> +adcx   %r15,%r8,%r8

Might this better be

adcx   %r15d,%r8d,%r8d

to avoid having two exclusion criteria (REX register use and REX.W set)?
Or maybe even split to further separate source and destination:

adcx   %eax,%r8d,%r8d
adcx   %r15d,%eax,%eax

?

> +adcx   (%edx,%ecx,1),%eax,%eax
> +adox   %ebx,%eax,%eax
> +adox   %eax,%ebx,%eax
> +adox   %rbx,%rax,%rax
> +adox   %r15,%r8,%r8
> +adox   (%edx,%ecx,1),%eax,%eax

Same here then.

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 6/9] Support APX NDD
  2023-12-08 14:12   ` Jan Beulich
@ 2023-12-11 13:36     ` Cui, Lili
  2023-12-11 16:50       ` Jan Beulich
  2024-03-22 10:02     ` Jan Beulich
  1 sibling, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-11 13:36 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, Kong, Lingling, binutils

> On 24.11.2023 08:02, Cui, Lili wrote:
> > @@ -8870,25 +8890,33 @@ build_modrm_byte (void)
> >  				     || i.vec_encoding == vex_encoding_evex));
> >      }
> >
> > -  for (v = source + 1; v < dest; ++v)
> > -    if (v != reg_slot)
> > -      break;
> > -  if (v >= dest)
> > -    v = ~0;
> > -  if (i.tm.extension_opcode != None)
> > +  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
> >      {
> > -      if (dest != source)
> > -	v = dest;
> > -      dest = ~0;
> > +      v = dest;
> > +      dest-- ;
> 
> Nit: Stray blank.
> 

Done.

> >      }
> > -  gas_assert (source < dest);
> 
> Starting from this line, do you really need to move that into the "else"
> branch? It looks to me as if it could stay here. (Maybe I'm wrong with the
> assertion itself, but ...
> 
> > -  if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
> > -      && source != op)
> 
> ... this entire if() pretty surely can stay as is, as there are no templates with
> both DstVVVV and SwapSources afaict. (Thing is - as before - that it isn't easy
> to see that what is happening here is really just re-indentation. Iirc in an
> earlier version there actually were hidden changes.) If you want this moved as
> an optimization, please do so in a separate patch.
> 

I moved the "i.tm.extension_opcode != None" and SWAP_SOURCES blocks back out of the "else" branch:

  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
    {
      v = dest;
      dest--;
    }
  else
    {
      for (v = source + 1; v < dest; ++v)
        if (v != reg_slot)
          break;
      if (v >= dest)
        v = ~0;
    }
  if (i.tm.extension_opcode != None)
    {
      if (dest != source)
        v = dest;
      dest = ~0;
    }
  gas_assert (source < dest);
  if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
      && source != op)
    {
      unsigned int tmp = source;

      source = v;
      v = tmp;
    }
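
For reference, the slot selection above can be modelled in isolation. This is
only a sketch: struct slots, pick_slots and NO_SLOT are made-up names, the
two bool parameters stand in for the i.tm checks, and the SWAP_SOURCES step
is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_SLOT (~0u)

struct slots
{
  unsigned int source, v, dest;
};

/* dst_vvvv models i.tm.opcode_modifier.vexvvvv == VexVVVV_DST,
   has_ext_opcode models i.tm.extension_opcode != None.  */
static struct slots
pick_slots (bool dst_vvvv, bool has_ext_opcode, unsigned int source,
	    unsigned int dest, unsigned int reg_slot)
{
  unsigned int v;

  if (dst_vvvv)
    {
      /* NDD: the last operand goes into vvvv, the one before it
	 becomes the ModR/M destination.  */
      v = dest;
      dest--;
    }
  else
    {
      /* Otherwise pick the first operand between source and dest
	 that is not the register slot, if any.  */
      for (v = source + 1; v < dest; ++v)
	if (v != reg_slot)
	  break;
      if (v >= dest)
	v = NO_SLOT;
    }
  if (has_ext_opcode)
    {
      /* ModR/M.reg holds the opcode extension, so no operand is
	 encoded there.  */
      if (dest != source)
	v = dest;
      dest = NO_SLOT;
    }
  assert (source < dest);
  return (struct slots) { source, v, dest };
}
```

For example, a three-operand NDD insn with source 0 and dest 2 gives v = 2
and dest = 1, while a two-operand NDD insn with an opcode extension (source
0, dest 1) gives v = 1 and no ModR/M.reg operand.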

> > --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
> > @@ -27,4 +27,6 @@ Disassembly of section .text:
> >  [ 	]*[a-f0-9]+:[ 	]+c8 ff ff ff[ 	]+enter  \$0xffff,\$0xff
> >  [ 	]*[a-f0-9]+:[ 	]+67 62 f2 7c 18 f5[ 	]+addr32 \(bad\)
> >  [ 	]*[a-f0-9]+:[ 	]+0b ff[ 	]+or     %edi,%edi
> > +[ 	]*[a-f0-9]+:[ 	]+62 f4 fc 08 ff[ 	]+\(bad\)
> > +[ 	]*[a-f0-9]+:[ 	]+d8[ 	]+.byte 0xd8
> >  #pass
> > --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > @@ -26,3 +26,5 @@ _start:
> >  	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[20](EVEX.b) == 1 (illegal value).
> >  	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax ,(%ebx){1to8}, %ecx
> >  	.byte 0xff
> > +	#{evex} inc %rax %rbx EVEX.vvvv' != 1111 && EVEX.ND = 0.
> > +	.insn EVEX.L0.NP.M4.W1 0xff, %rax, %rbx
> 
> I don't think this does what you want. In the .d file the 4 bits are all set. I think
> you mean something like
> 
> 	.insn EVEX.L0.NP.M4.W1 0xff/0, %rcx, %rbx
> 
> (i.e. ModR/M.reg specified as opcode extension _and_ the first operand not
> the accumulator). The reason disassembly fails for what you've used looks to
> be ModR/M.reg == 0b011 (resulting from the use of %rbx).
> 

Changed it to use SIB addressing, so there's no need to add a 0xff byte.

.insn EVEX.L0.NP.M4.W1 0xff/0, (%rax,%rcx), %rbx

0000000000000000 <_start>:
   0:   62 f4 e4                (bad)
   3:   08 ff                      or     %bh,%bh
   5:   04 08                   add    $0x8,%al

> (Also, nit: What's EVEX.vvvv' ? I.e. what's the ' there about?)
> 

Oh, it should be EVEX.vvvv.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> > @@ -0,0 +1,155 @@
> > +# Check 64bit APX NDD instructions with evex prefix encoding
> > +
> > +	.allow_index_reg
> > +	.text
> > +_start:
> > +	adc    $0x1234,%ax,%r30w
> > +	adc    %r15b,%r17b,%r18b
> > +	adc    %r15d,(%r8),%r18d
> > +	adc    (%r15,%rax,1),%r16b,%r8b
> > +	adc    (%r15,%rax,1),%r16w,%r8w
> > +	adcl   $0x11,(%r19,%rax,4),%r20d
> > +	adcx   %r15d,%r8d,%r18d
> > +	adcx   (%r15,%r31,1),%r8
> > +	adcx   (%r15,%r31,1),%r8d,%r18d
> > +	add    $0x1234,%ax,%r30w
> > +	add    $0x12344433,%r15,%r16
> > +	add    $0x34,%r13b,%r17b
> > +	add    $0xfffffffff4332211,%rax,%r8
> > +	add    %r31,%r8,%r16
> > +	add    %r31,(%r8),%r16
> > +	add    %r31,(%r8,%r16,8),%r16
> > +	add    %r31b,%r8b,%r16b
> > +	add    %r31d,%r8d,%r16d
> > +	add    %r31w,%r8w,%r16w
> > +	add    (%r31),%r8,%r16
> > +	add    0x9090(%r31,%r16,1),%r8,%r16
> > +	addb    %r31b,%r8b,%r16b
> > +	addl    %r31d,%r8d,%r16d
> > +	addl   $0x11,(%r19,%rax,4),%r20d
> > +	addq    %r31,%r8,%r16
> > +	addq   $0x12344433,(%r15,%rcx,4),%r16
> > +	addw    %r31w,%r8w,%r16w
> > +	adox   %r15d,%r8d,%r18d
> 
> Nit: Inconsistent blank padding.
> 

Done.

> > +	{load}  add    %r31,%r8,%r16
> > +	{store} add    %r31,%r8,%r16
> > +	adox   (%r15,%r31,1),%r8
> > +	adox   (%r15,%r31,1),%r8d,%r18d
> > +	and    $0x1234,%ax,%r30w
> > +	and    %r15b,%r17b,%r18b
> > +	and    %r15d,(%r8),%r18d
> > +	and    (%r15,%rax,1),%r16b,%r8b
> > +	and    (%r15,%rax,1),%r16w,%r8w
> > +	andl   $0x11,(%r19,%rax,4),%r20d
> > +	cmova  0x90909090(%eax),%edx,%r8d
> > +	cmovae 0x90909090(%eax),%edx,%r8d
> > +	cmovb  0x90909090(%eax),%edx,%r8d
> > +	cmovbe 0x90909090(%eax),%edx,%r8d
> > +	cmove  0x90909090(%eax),%edx,%r8d
> > +	cmovg  0x90909090(%eax),%edx,%r8d
> > +	cmovge 0x90909090(%eax),%edx,%r8d
> > +	cmovl  0x90909090(%eax),%edx,%r8d
> > +	cmovle 0x90909090(%eax),%edx,%r8d
> > +	cmovne 0x90909090(%eax),%edx,%r8d
> > +	cmovno 0x90909090(%eax),%edx,%r8d
> > +	cmovnp 0x90909090(%eax),%edx,%r8d
> > +	cmovns 0x90909090(%eax),%edx,%r8d
> > +	cmovo  0x90909090(%eax),%edx,%r8d
> > +	cmovp  0x90909090(%eax),%edx,%r8d
> > +	cmovs  0x90909090(%eax),%edx,%r8d
> > +	dec    %rax,%r17
> > +	decb   (%r31,%r12,1),%r8b
> > +	imul   0x909(%rax,%r31,8),%rdx,%r25
> > +	imul   0x90909(%eax),%edx,%r8d
> > +	inc    %r31,%r16
> > +	inc    %r31,%r8
> > +	inc    %rax,%rbx
> > +	neg    %rax,%r17
> > +	negb   (%r31,%r12,1),%r8b
> > +	not    %rax,%r17
> > +	notb   (%r31,%r12,1),%r8b
> > +	or     $0x1234,%ax,%r30w
> > +	or     %r15b,%r17b,%r18b
> > +	or     %r15d,(%r8),%r18d
> > +	or     (%r15,%rax,1),%r16b,%r8b
> > +	or     (%r15,%rax,1),%r16w,%r8w
> > +	orl    $0x11,(%r19,%rax,4),%r20d
> > +	rcl    $0x2,%r12b,%r31b
> > +	rcl    %cl,%r16b,%r8b
> > +	rclb   $0x1, (%rax),%r31b
> > +	rcll   $0x2,(%rax),%r31d
> > +	rclw   $0x1, (%rax),%r31w
> 
> Nit: Would be nice if there consistently were or were not blanks after the
> commas.
> 

Done.

> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -139,9 +139,13 @@
> >  #define Vsz256 Vsz=VSZ256
> >  #define Vsz512 Vsz=VSZ512
> >
> > +#define DstVVVV VexVVVV=VexVVVV_DST
> > +
> >  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
> >  // the bit to mark commutative VEX encodings where swapping the source
> >  // operands may allow to switch from 3-byte to 2-byte VEX encoding.
> > +// And re-use the bit to mark some NDD insns that swapping the source operands
> > +// may allow to switch from EVEX encoding to REX2 encoding.
> >  #define C StaticRounding
> >
> >  #define FP 387|287|8087
> > @@ -288,26 +292,40 @@ std, 0xfd, 0, NoSuf, {}  sti, 0xfb, 0, NoSuf, {}
> >
> >  // Arithmetic.
> > +add, 0x0, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> 
> There is _still_ Byte|Word|Dword|Qword in here (and below), when I think I
> pointed out more than once before that in new templates such redundancy
> wants omitting.
> 
> Since this isn't the first instance of earlier review comments not taken care of,
> may I please ask that you make reasonably sure that new versions aren't sent
> out like this?
> 

This part could indeed be omitted, but I really don't remember you mentioning it on the APX patches. There are still a lot of redundant Byte|Word|Dword|Qword in the opcode table, APX just added some flags on top of the old ones. Do you mind if I create a patch first to remove the redundant parts of master?

> >  add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, {
> > Reg8|Reg16|Reg32|Reg64,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> > +add, 0x83/0, APX_F,
> >
> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|
> NF, {
> > +Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> > +Reg16|Reg32|Reg64 }
> >  add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> > Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  add,
> 0x4,
> > 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> Acc|Byte|Word|Dword|Qword }
> > +add, 0x80/0, APX_F,
> > +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
> > +Imm8|Imm16|Imm32|Imm32S,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x,
> > +Reg8|Reg16|Reg32|Reg64}
> >  add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> > Imm8|Imm16|Imm32|Imm32S,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> >
> >  inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
> > +inc, 0xfe/0, APX_F,
> > +W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF,
> >
> +{Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x,
> > +Reg8|Reg16|Reg32|Reg64}
> >  inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> >
> > +sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Optimize|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64, }
> 
> Here and elsewhere, what's Optimize for? It not being there on other
> templates, it can't be for the EVEX->REX2 optimization? If there are further
> optimization plans, that's (again) something to mention in the description. Yet
> better would be if such attributes were added only when respective
> optimizations are actually introduced. Unlike e.g. NF, which would mean
> another bulk update if not added right away, new optimizations typically affect
> only a few templates at a time.
> 

Optimize is not new.

sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Optimize|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64, }
sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }

> >  sub, 0x28, 0,
> > D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, {
> > Reg8|Reg16|Reg32|Reg64,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> > +sub, 0x83/5, APX_F,
> > +Modrm|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S,
> > +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> > +Reg16|Reg32|Reg64 }
> >  sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> > Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  sub,
> 0x2c,
> > 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> Acc|Byte|Word|Dword|Qword }
> > +sub, 0x80/5, APX_F,
> > +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
> > +Imm8|Imm16|Imm32|Imm32S,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x,
> > +Reg8|Reg16|Reg32|Reg64 }
> >  sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> > Imm8|Imm16|Imm32|Imm32S,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> 
> There are still only 3 new templates here (and also above for add, plus for
> other similar insns), when ...
> 
> >  dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
> > +dec, 0xfe/1, APX_F,
> > +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x,
> > +Reg8|Reg16|Reg32|Reg64 }
> >  dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> >
> > +sbb, 0x18, APX_F,
> > +D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4, {
> > +Reg8|Reg16|Reg32|Reg64,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x,
> > +Reg8|Reg16|Reg32|Reg64 }
> >  sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, {
> > Reg8|Reg16|Reg32|Reg64,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> > +sbb, 0x18, APX_F,
> > +D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, {
> > +Reg8|Reg16|Reg32|Reg64,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x }
> > +sbb, 0x83/3, APX_F,
> >
> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4,
> {
> > +Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> > +Reg16|Reg32|Reg64 }
> >  sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> > Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> > +sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf,
> { Imm8S,
> > +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> >  sbb, 0x1c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> > Acc|Byte|Word|Dword|Qword }
> > +sbb, 0x80/3, APX_F,
> > +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, {
> > +Imm8|Imm16|Imm32|Imm32S,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x,
> > +Reg8|Reg16|Reg32|Reg64 }
> >  sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> > Imm8|Imm16|Imm32|Imm32S,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> > +sbb, 0x80/3, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, {
> > +Imm8|Imm16|Imm32|Imm32S,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x }
> 
> ... there are 6 new templates here. This is again an aspect I had pointed out
> before. You cannot defer the addition of the other 3 until the NF patch, as you
> want to make sure that with just this patch in place something both
> 
>     {evex} sbb %eax, %eax
> 
> and
> 
>     {evex} sub %eax, %eax
> 
> actually assemble, and to EVEX encodings. I can't see how that would work in
> the latter case without those further templates.
> 
> The alternative is to also defer adding the 2-operand SBB templates (and any
> others you add here which don't use DstVVVV).
> 

I'm having a headache with this. Some instructions like sbb don't support NF; originally they were in the 4/9 patch, but their disassembler support is in the NDD patch, and you agreed to put them in the NDD patch. Now I really don't know where to move them. Moving encoding, decoding, and especially test cases for instructions between patches is cumbersome, and I really don't think it makes much sense.

> >  xor, 0x30, 0,
> > D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, {
> > Reg8|Reg16|Reg32|Reg64,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> > +xor, 0x83/6, APX_F,
> >
> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|
> NF, {
> > +Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> > +Reg16|Reg32|Reg64 }
> >  xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> > Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  xor,
> 0x34,
> > 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> Acc|Byte|Word|Dword|Qword }
> > +xor, 0x80/6, APX_F,
> > +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
> > +Imm8|Imm16|Imm32|Imm32S,
> >
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> x,
> > +Reg8|Reg16|Reg32|Reg64 }
> >  xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> > Imm8|Imm16|Imm32|Imm32S,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> >
> >  // clr with 1 operand is really xor with 2 operands.
> >  clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, {
> > Reg8|Reg16|Reg32|Reg64 }
> 
> Btw., for consistency this may also want accompanying with an EVEX
> counterpart.
> 

Do you mean to add an entry like this? It should belong to the previous patch.

// clr with 1 operand is really xor with 2 operands.
clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
clr, 0x30, APX_F, W|Modrm|No_sSuf|RegKludge|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64 }

> >  mul, 0xf6/4, 0, W|Modrm|No_sSuf, {
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> > imul, 0xf6/5, 0, W|Modrm|No_sSuf, {
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> > +imul, 0xaf, APX_F, C|Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }
> 
> Missing NF?
> 

Oh, when I rebased the NF patch, I found this missing and fixed it.

> >  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> > +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> 
> Didn't we agree to avoid adding this (and its sibling) template, for the omitted
> shift count being ambiguous? Consider
> 
>     rol %cl, %al
> 
> Is this a rotate by %cl, or a 1-bit NDD rotate?
> 

These entries should be deleted.

rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 6/9] Support APX NDD
  2023-12-11 13:36     ` Cui, Lili
@ 2023-12-11 16:50       ` Jan Beulich
  2023-12-13 10:42         ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-11 16:50 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, Kong, Lingling, binutils

On 11.12.2023 14:36, Cui, Lili wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> --- a/opcodes/i386-opc.tbl
>>> +++ b/opcodes/i386-opc.tbl
>>> @@ -139,9 +139,13 @@
>>>  #define Vsz256 Vsz=VSZ256
>>>  #define Vsz512 Vsz=VSZ512
>>>
>>> +#define DstVVVV VexVVVV=VexVVVV_DST
>>> +
>>>  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
>>>  // the bit to mark commutative VEX encodings where swapping the source
>>>  // operands may allow to switch from 3-byte to 2-byte VEX encoding.
>>> +// And re-use the bit to mark some NDD insns that swapping the source operands
>>> +// may allow to switch from EVEX encoding to REX2 encoding.
>>>  #define C StaticRounding
>>>
>>>  #define FP 387|287|8087
>>> @@ -288,26 +292,40 @@ std, 0xfd, 0, NoSuf, {}  sti, 0xfb, 0, NoSuf, {}
>>>
>>>  // Arithmetic.
>>> +add, 0x0, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>
>> There is _still_ Byte|Word|Dword|Qword in here (and below), when I think I
>> pointed out more than once before that in new templates such redundancy
>> wants omitting.
>>
>> Since this isn't the first instance of earlier review comments not taken care of,
>> may I please ask that you make reasonably sure that new versions aren't sent
>> out like this?
>>
> 
> This part could indeed be omitted, but I really don't remember you mentioning it on the APX patches.

Already in e.g.
https://sourceware.org/pipermail/binutils/2023-November/130422.html
I pointed out that such earlier comments in e.g.
https://sourceware.org/pipermail/binutils/2023-September/129590.html
were not addressed.

> There are still a lot of redundant Byte|Word|Dword|Qword in the opcode table, APX just added some flags on top of the old ones. Do you mind if I create a patch first to remove the redundant parts of master?

I don't mind you cleaning up first. It's just that normally I wouldn't do
so in a separate patch (one of the reasons being that such non-functional
changes get in the way of using "git blame" or the like when trying to find
the most recent real change to a line), unless it was only a handful of
instances left. Instead I typically do such tidying as lines are touched
anyway. Thing here simply is that new templates shouldn't have such
anomalies anymore.

>>>  add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, {
>>> Reg8|Reg16|Reg32|Reg64,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>> +add, 0x83/0, APX_F,
>>>
>> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|
>> NF, {
>>> +Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
>>> +Reg16|Reg32|Reg64 }
>>>  add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
>>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  add,
>> 0x4,
>>> 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
>> Acc|Byte|Word|Dword|Qword }
>>> +add, 0x80/0, APX_F,
>>> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64}
>>>  add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>> Imm8|Imm16|Imm32|Imm32S,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>>
>>>  inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
>>> +inc, 0xfe/0, APX_F,
>>> +W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF,
>>>
>> +{Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64}
>>>  inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>>
>>> +sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Optimize|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64, }
>>
>> Here and elsewhere, what's Optimize for? It not being there on other
>> templates, it can't be for the EVEX->REX2 optimization? If there are further
>> optimization plans, that's (again) something to mention in the description. Yet
>> better would be if such attributes were added only when respective
>> optimizations are actually introduced. Unlike e.g. NF, which would mean
>> another bulk update if not added right away, new optimizations typically affect
>> only a few templates at a time.
>>
> 
> Optimize is not new.
> 
> sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Optimize|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64, }
> sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }

Optimize is legitimately there for the legacy template. If the new template
also wants it, there needs to be some reason. Otherwise it is part of the
transformation to APX/EVEX to drop it.

>>>  sub, 0x28, 0,
>>> D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, {
>>> Reg8|Reg16|Reg32|Reg64,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>> +sub, 0x83/5, APX_F,
>>> +Modrm|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S,
>>> +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
>>> +Reg16|Reg32|Reg64 }
>>>  sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
>>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  sub,
>> 0x2c,
>>> 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
>> Acc|Byte|Word|Dword|Qword }
>>> +sub, 0x80/5, APX_F,
>>> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>>  sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>> Imm8|Imm16|Imm32|Imm32S,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>
>> There are still only 3 new templates here (and also above for add, plus for
>> other similar insns), when ...
>>
>>>  dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
>>> +dec, 0xfe/1, APX_F,
>>> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>>  dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>>
>>> +sbb, 0x18, APX_F,
>>> +D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4, {
>>> +Reg8|Reg16|Reg32|Reg64,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>>  sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, {
>>> Reg8|Reg16|Reg32|Reg64,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>> +sbb, 0x18, APX_F,
>>> +D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, {
>>> +Reg8|Reg16|Reg32|Reg64,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x }
>>> +sbb, 0x83/3, APX_F,
>>>
>> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4,
>> {
>>> +Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
>>> +Reg16|Reg32|Reg64 }
>>>  sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
>>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>>> +sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf,
>> { Imm8S,
>>> +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>>>  sbb, 0x1c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
>>> Acc|Byte|Word|Dword|Qword }
>>> +sbb, 0x80/3, APX_F,
>>> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>>  sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>> Imm8|Imm16|Imm32|Imm32S,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>> +sbb, 0x80/3, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x }
>>
>> ... there are 6 new templates here. This is again an aspect I had pointed out
>> before. You cannot defer the addition of the other 3 until the NF patch, as you
>> want to make sure that with just this patch in place something both
>>
>>     {evex} sbb %eax, %eax
>>
>> and
>>
>>     {evex} sub %eax, %eax
>>
>> actually assemble, and to EVEX encodings. I can't see how that would work in
>> the latter case without those further templates.
>>
>> The alternative is to also defer adding the 2-operand SBB templates (and any
>> others you add here which don't use DstVVVV).
>>
> 
> I'm having a headache with this. Some instructions like sbb don't support NF; originally they were in the 4/9 patch, but their disassembler support is in the NDD patch, and you agreed to put them in the NDD patch.

Right, yet still the overall result wants to be consistent. Hence why I'm
not demanding that you move these templates yet later (which is one
option). Instead I've indicated that moving the others ahead would also
be okay.

Like with any series, you want it to be in a shape where it can be committed
piecemeal. Which is even more important with a release around the corner.
If we end up with just partial APX support in 2.42, that partial support
should be in a shape that's predictable to users.

> Now I really don't know where to move them. Moving encoding, decoding, and especially test cases for instructions between patches is cumbersome, and I really don't think it makes much sense.

I can see your point, and I'm sorry for the hassle. Part of the problem of
the moving being troublesome is (imo) that many of the patches simply were
(are) doing too many things at a time anyway.

>>>  xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>> +xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>>>  xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>>>  xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
>>> +xor, 0x80/6, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>  xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>>
>>>  // clr with 1 operand is really xor with 2 operands.
>>>  clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
>>
>> Btw., for consistency this may also want accompanying with an EVEX
>> counterpart.
>>
> 
> Do you mean to add an entry like this? It should belong to the previous patch.
> 
> // clr with 1 operand is really xor with 2 operands.
> clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
> clr, 0x30, APX_F, W|Modrm|No_sSuf|RegKludge|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64 }

Yes, something like this. And possibly indeed not the patch here; the
template simply happened to be in context. Where exactly it wants to
go depends - see above - on where other similar templates are
introduced. Note however that the corresponding XOR templates are
introduced here, just above and still in context.

Jan
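
The consistency concern raised in this message can be illustrated with a small script. This is only a sketch, not part of binutils: it takes a few template lines copied from the quoted hunk above (rejoined from the mail-wrapped quote) and reports, per mnemonic, which operand counts have an EVEX (`EVexMap4`) template. The helper names `parse_template` and `evex_operand_counts` are hypothetical, and the parsing assumes the simplified one-line `mnemonic, opcode, cpu, attrs, { operands }` layout seen in the quote.

```python
"""Illustrative check: which operand counts have an EVEX template per mnemonic?"""

# Template lines copied from the quoted hunk in this thread (Imm8S forms only).
TEMPLATES = [
    "sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }",
    "sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }",
    "xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }",
    "xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }",
]


def parse_template(line):
    """Split one simplified template line into (mnemonic, cpu, attrs, operands)."""
    head, _, tail = line.partition("{")
    fields = [f.strip() for f in head.strip().rstrip(",").split(",")]
    mnemonic, _opcode, cpu, attrs = fields
    operands = [op.strip() for op in tail.rstrip().rstrip("}").split(",")]
    return mnemonic, cpu, set(attrs.split("|")), operands


def evex_operand_counts(lines):
    """Map each mnemonic to the set of operand counts that have an EVEX template."""
    counts = {}
    for line in lines:
        mnemonic, _cpu, attrs, operands = parse_template(line)
        counts.setdefault(mnemonic, set())
        if "EVexMap4" in attrs:
            counts[mnemonic].add(len(operands))
    return counts


if __name__ == "__main__":
    for mnemonic, counts in sorted(evex_operand_counts(TEMPLATES).items()):
        status = "has" if 2 in counts else "lacks"
        print(f"{mnemonic}: {status} a 2-operand EVEX form "
              f"(EVEX operand counts: {sorted(counts)})")
```

Among the lines listed here, sbb has a 2-operand EVEX form for the Imm8S encoding while xor only has the 3-operand (DstVVVV) one. The script says nothing about templates outside this excerpt; it only shows the kind of per-patch check that would catch a series where one mnemonic accepts `{evex}` in its 2-operand form and a sibling does not.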


* RE: [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
                   ` (8 preceding siblings ...)
  2023-11-24  7:09 ` [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Jan Beulich
@ 2023-12-12  2:57 ` Lu, Hongjiu
  2023-12-12  8:16 ` Cui, Lili
  10 siblings, 0 replies; 69+ messages in thread
From: Lu, Hongjiu @ 2023-12-12  2:57 UTC (permalink / raw)
  To: Cui, Lili, binutils; +Cc: Beulich, Jan

> -----Original Message-----
> From: Cui, Lili <lili.cui@intel.com>
> Sent: Thursday, November 23, 2023 11:02 PM
> To: binutils@sourceware.org
> Cc: Beulich, Jan <JBeulich@suse.com>; Lu, Hongjiu <hongjiu.lu@intel.com>
> Subject: [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax
> 
> Make const_1_mode print $1 in AT&T syntax; otherwise
> there will be correctness issues when it is extended
> to support APX NDD.
> 
> gas/ChangeLog:
> 
>         * testsuite/gas/i386/intel.d: Adjust testcase.
>         * testsuite/gas/i386/lfence-load.d: Ditto.
>         * testsuite/gas/i386/noreg16-data32.d: Ditto.
>         * testsuite/gas/i386/noreg16.d: Ditto.
>         * testsuite/gas/i386/noreg32-data16.d: Ditto.
>         * testsuite/gas/i386/noreg32.d: Ditto.
>         * testsuite/gas/i386/noreg64-data16.d: Ditto.
>         * testsuite/gas/i386/noreg64-rex64.d: Ditto.
>         * testsuite/gas/i386/noreg64.d: Ditto.
>         * testsuite/gas/i386/opcode-suffix.d: Ditto.
>         * testsuite/gas/i386/opcode.d: Ditto.
>         * testsuite/gas/i386/x86-64-lfence-load.d: Ditto.
>         * testsuite/gas/i386/x86-64-opcode.d: Ditto.
> 
> opcodes/ChangeLog:
> 
>         * i386-dis.c (OP_I): Make const_1_mode print $1 in AT&T syntax.
> ---
>  gas/testsuite/gas/i386/intel.d              |  6 ++--
>  gas/testsuite/gas/i386/lfence-load.d        |  2 +-
>  gas/testsuite/gas/i386/noreg16-data32.d     | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg16.d            | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg32-data16.d     | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg32.d            | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg64-data16.d     | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg64-rex64.d      | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg64.d            | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/opcode-suffix.d      |  6 ++--
>  gas/testsuite/gas/i386/opcode.d             | 10 +++----
>  gas/testsuite/gas/i386/x86-64-lfence-load.d |  2 +-
>  gas/testsuite/gas/i386/x86-64-opcode.d      |  6 ++--
>  opcodes/i386-dis.c                          |  2 ++
>  14 files changed, 130 insertions(+), 128 deletions(-)
> 
> diff --git a/gas/testsuite/gas/i386/intel.d b/gas/testsuite/gas/i386/intel.d
> index bc212893853..c3e45c2e38c 100644
> --- a/gas/testsuite/gas/i386/intel.d
> +++ b/gas/testsuite/gas/i386/intel.d
> @@ -208,8 +208,8 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:	cd 90 [ 	]*int    \$0x90
>  [ 	]*[a-f0-9]+:	ce [ 	]*into
>  [ 	]*[a-f0-9]+:	cf [ 	]*iret
> -[ 	]*[a-f0-9]+:	d0 90 90 90 90 90 [ 	]*rclb   -0x6f6f6f70\(%eax\)
> -[ 	]*[a-f0-9]+:	d1 90 90 90 90 90 [ 	]*rcll   -0x6f6f6f70\(%eax\)
> +[ 	]*[a-f0-9]+:	d0 90 90 90 90 90 [ 	]*rclb   \$1,-0x6f6f6f70\(%eax\)
> +[ 	]*[a-f0-9]+:	d1 90 90 90 90 90 [ 	]*rcll   \$1,-0x6f6f6f70\(%eax\)
>  [ 	]*[a-f0-9]+:	d2 90 90 90 90 90 [ 	]*rclb   %cl,-0x6f6f6f70\(%eax\)
>  [ 	]*[a-f0-9]+:	d3 90 90 90 90 90 [ 	]*rcll   %cl,-0x6f6f6f70\(%eax\)
>  [ 	]*[a-f0-9]+:	d4 90 [ 	]*aam    \$0x90
> @@ -527,7 +527,7 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:	66 ca 90 90 [ 	]*lretw  \$0x9090
>  [ 	]*[a-f0-9]+:	66 cb [ 	]*lretw
>  [ 	]*[a-f0-9]+:	66 cf [ 	]*iretw
> -[ 	]*[a-f0-9]+:	66 d1 90 90 90 90 90 [ 	]*rclw   -0x6f6f6f70\(%eax\)
> +[ 	]*[a-f0-9]+:	66 d1 90 90 90 90 90 [ 	]*rclw   \$1,-0x6f6f6f70\(%eax\)
>  [ 	]*[a-f0-9]+:	66 d3 90 90 90 90 90 [ 	]*rclw   %cl,-0x6f6f6f70\(%eax\)
>  [ 	]*[a-f0-9]+:	66 e5 90 [ 	]*in     \$0x90,%ax
>  [ 	]*[a-f0-9]+:	66 e7 90 [ 	]*out    %ax,\$0x90
> diff --git a/gas/testsuite/gas/i386/lfence-load.d b/gas/testsuite/gas/i386/lfence-load.d
> index 33ebef5432f..eb94bdcbb68 100644
> --- a/gas/testsuite/gas/i386/lfence-load.d
> +++ b/gas/testsuite/gas/i386/lfence-load.d
> @@ -83,7 +83,7 @@ Disassembly of section .text:
>   +[a-f0-9]+:	0f ae e8             	lfence
>   +[a-f0-9]+:	58                   	pop    %eax
>   +[a-f0-9]+:	0f ae e8             	lfence
> - +[a-f0-9]+:	66 d1 11             	rclw   \(%ecx\)
> + +[a-f0-9]+:	66 d1 11             	rclw   \$1,\(%ecx\)
>   +[a-f0-9]+:	0f ae e8             	lfence
>   +[a-f0-9]+:	f7 01 01 00 00 00    	testl  \$0x1,\(%ecx\)
>   +[a-f0-9]+:	0f ae e8             	lfence
> diff --git a/gas/testsuite/gas/i386/noreg16-data32.d b/gas/testsuite/gas/i386/noreg16-data32.d
> index 7561b549ebb..237e25dd0e1 100644
> --- a/gas/testsuite/gas/i386/noreg16-data32.d
> +++ b/gas/testsuite/gas/i386/noreg16-data32.d
> @@ -96,43 +96,43 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 0f ae 27          	ptwrite \(%bx\)
>   *[a-f0-9]+:	66 ff 37             	pushl  \(%bx\)
>   *[a-f0-9]+:	66 06                	pushl  %es
> - *[a-f0-9]+:	66 d1 17             	rcll   \(%bx\)
> + *[a-f0-9]+:	66 d1 17             	rcll   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 17 02          	rcll   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 17             	rcll   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 17             	rcll   \(%bx\)
> - *[a-f0-9]+:	66 d1 1f             	rcrl   \(%bx\)
> + *[a-f0-9]+:	66 d1 17             	rcll   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 1f             	rcrl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 1f 02          	rcrl   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 1f             	rcrl   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 1f             	rcrl   \(%bx\)
> - *[a-f0-9]+:	66 d1 07             	roll   \(%bx\)
> + *[a-f0-9]+:	66 d1 1f             	rcrl   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 07             	roll   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 07 02          	roll   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 07             	roll   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 07             	roll   \(%bx\)
> - *[a-f0-9]+:	66 d1 0f             	rorl   \(%bx\)
> + *[a-f0-9]+:	66 d1 07             	roll   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 0f             	rorl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 0f 02          	rorl   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 0f             	rorl   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 0f             	rorl   \(%bx\)
> + *[a-f0-9]+:	66 d1 0f             	rorl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 83 1f 01          	sbbl   \$0x1,\(%bx\)
>   *[a-f0-9]+:	66 81 1f 89 00 00 00 	sbbl   \$0x89,\(%bx\)
>   *[a-f0-9]+:	66 81 1f 34 12 00 00 	sbbl   \$0x1234,\(%bx\)
>   *[a-f0-9]+:	66 af                	scas   %es:\(%di\),%eax
>   *[a-f0-9]+:	66 af                	scas   %es:\(%di\),%eax
> - *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
> + *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 27 02          	shll   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 27             	shll   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
> - *[a-f0-9]+:	66 d1 3f             	sarl   \(%bx\)
> + *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 3f             	sarl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 3f 02          	sarl   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 3f             	sarl   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 3f             	sarl   \(%bx\)
> - *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
> + *[a-f0-9]+:	66 d1 3f             	sarl   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 27 02          	shll   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 27             	shll   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
> - *[a-f0-9]+:	66 d1 2f             	shrl   \(%bx\)
> + *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 2f             	shrl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 2f 02          	shrl   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 2f             	shrl   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 2f             	shrl   \(%bx\)
> + *[a-f0-9]+:	66 d1 2f             	shrl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 ab                	stos   %eax,%es:\(%di\)
>   *[a-f0-9]+:	66 ab                	stos   %eax,%es:\(%di\)
>   *[a-f0-9]+:	66 83 2f 01          	subl   \$0x1,\(%bx\)
> diff --git a/gas/testsuite/gas/i386/noreg16.d b/gas/testsuite/gas/i386/noreg16.d
> index 86f852fb4ca..e4149b03a6e 100644
> --- a/gas/testsuite/gas/i386/noreg16.d
> +++ b/gas/testsuite/gas/i386/noreg16.d
> @@ -95,43 +95,43 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 0f ae 27          	ptwrite \(%bx\)
>   *[a-f0-9]+:	ff 37                	push   \(%bx\)
>   *[a-f0-9]+:	06                   	push   %es
> - *[a-f0-9]+:	d1 17                	rclw   \(%bx\)
> + *[a-f0-9]+:	d1 17                	rclw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 17 02             	rclw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 17                	rclw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 17                	rclw   \(%bx\)
> - *[a-f0-9]+:	d1 1f                	rcrw   \(%bx\)
> + *[a-f0-9]+:	d1 17                	rclw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 1f                	rcrw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 1f 02             	rcrw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 1f                	rcrw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 1f                	rcrw   \(%bx\)
> - *[a-f0-9]+:	d1 07                	rolw   \(%bx\)
> + *[a-f0-9]+:	d1 1f                	rcrw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 07                	rolw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 07 02             	rolw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 07                	rolw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 07                	rolw   \(%bx\)
> - *[a-f0-9]+:	d1 0f                	rorw   \(%bx\)
> + *[a-f0-9]+:	d1 07                	rolw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 0f                	rorw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 0f 02             	rorw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 0f                	rorw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 0f                	rorw   \(%bx\)
> + *[a-f0-9]+:	d1 0f                	rorw   \$1,\(%bx\)
>   *[a-f0-9]+:	83 1f 01             	sbbw   \$0x1,\(%bx\)
>   *[a-f0-9]+:	81 1f 89 00          	sbbw   \$0x89,\(%bx\)
>   *[a-f0-9]+:	81 1f 34 12          	sbbw   \$0x1234,\(%bx\)
>   *[a-f0-9]+:	af                   	scas   %es:\(%di\),%ax
>   *[a-f0-9]+:	af                   	scas   %es:\(%di\),%ax
> - *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
> + *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 27 02             	shlw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 27                	shlw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
> - *[a-f0-9]+:	d1 3f                	sarw   \(%bx\)
> + *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 3f                	sarw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 3f 02             	sarw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 3f                	sarw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 3f                	sarw   \(%bx\)
> - *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
> + *[a-f0-9]+:	d1 3f                	sarw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 27 02             	shlw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 27                	shlw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
> - *[a-f0-9]+:	d1 2f                	shrw   \(%bx\)
> + *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 2f                	shrw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 2f 02             	shrw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 2f                	shrw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 2f                	shrw   \(%bx\)
> + *[a-f0-9]+:	d1 2f                	shrw   \$1,\(%bx\)
>   *[a-f0-9]+:	ab                   	stos   %ax,%es:\(%di\)
>   *[a-f0-9]+:	ab                   	stos   %ax,%es:\(%di\)
>   *[a-f0-9]+:	83 2f 01             	subw   \$0x1,\(%bx\)
> diff --git a/gas/testsuite/gas/i386/noreg32-data16.d b/gas/testsuite/gas/i386/noreg32-data16.d
> index 1ec6b9e8670..e3ae2116bb1 100644
> --- a/gas/testsuite/gas/i386/noreg32-data16.d
> +++ b/gas/testsuite/gas/i386/noreg32-data16.d
> @@ -103,44 +103,44 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 0f ae 20          	ptwrite \(%eax\)
>   *[a-f0-9]+:	66 ff 30             	pushw  \(%eax\)
>   *[a-f0-9]+:	66 06                	pushw  %es
> - *[a-f0-9]+:	66 d1 10             	rclw   \(%eax\)
> + *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 10 02          	rclw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 10             	rclw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 10             	rclw   \(%eax\)
> - *[a-f0-9]+:	66 d1 18             	rcrw   \(%eax\)
> + *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 18 02          	rcrw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 18             	rcrw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 18             	rcrw   \(%eax\)
> - *[a-f0-9]+:	66 d1 00             	rolw   \(%eax\)
> + *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 00 02          	rolw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 00             	rolw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 00             	rolw   \(%eax\)
> - *[a-f0-9]+:	66 d1 08             	rorw   \(%eax\)
> + *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 08 02          	rorw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 08             	rorw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 08             	rorw   \(%eax\)
> + *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 83 18 01          	sbbw   \$0x1,\(%eax\)
>   *[a-f0-9]+:	66 81 18 89 00       	sbbw   \$0x89,\(%eax\)
>   *[a-f0-9]+:	66 81 18 34 12       	sbbw   \$0x1234,\(%eax\)
>   *[a-f0-9]+:	66 81 18 78 56       	sbbw   \$0x5678,\(%eax\)
>   *[a-f0-9]+:	66 af                	scas   %es:\(%edi\),%ax
>   *[a-f0-9]+:	66 af                	scas   %es:\(%edi\),%ax
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
> - *[a-f0-9]+:	66 d1 38             	sarw   \(%eax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 38 02          	sarw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 38             	sarw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 38             	sarw   \(%eax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
> + *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
> - *[a-f0-9]+:	66 d1 28             	shrw   \(%eax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 28 02          	shrw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 28             	shrw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 28             	shrw   \(%eax\)
> + *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%edi\)
>   *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%edi\)
>   *[a-f0-9]+:	66 83 28 01          	subw   \$0x1,\(%eax\)
> diff --git a/gas/testsuite/gas/i386/noreg32.d b/gas/testsuite/gas/i386/noreg32.d
> index 9dbef908ce7..8bb08ca73c6 100644
> --- a/gas/testsuite/gas/i386/noreg32.d
> +++ b/gas/testsuite/gas/i386/noreg32.d
> @@ -101,44 +101,44 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 0f ae 20          	ptwrite \(%eax\)
>   *[a-f0-9]+:	ff 30                	push   \(%eax\)
>   *[a-f0-9]+:	06                   	push   %es
> - *[a-f0-9]+:	d1 10                	rcll   \(%eax\)
> + *[a-f0-9]+:	d1 10                	rcll   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 10 02             	rcll   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 10                	rcll   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 10                	rcll   \(%eax\)
> - *[a-f0-9]+:	d1 18                	rcrl   \(%eax\)
> + *[a-f0-9]+:	d1 10                	rcll   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 18 02             	rcrl   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 18                	rcrl   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 18                	rcrl   \(%eax\)
> - *[a-f0-9]+:	d1 00                	roll   \(%eax\)
> + *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 00                	roll   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 00 02             	roll   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 00                	roll   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 00                	roll   \(%eax\)
> - *[a-f0-9]+:	d1 08                	rorl   \(%eax\)
> + *[a-f0-9]+:	d1 00                	roll   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 08                	rorl   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 08 02             	rorl   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 08                	rorl   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 08                	rorl   \(%eax\)
> + *[a-f0-9]+:	d1 08                	rorl   \$1,\(%eax\)
>   *[a-f0-9]+:	83 18 01             	sbbl   \$0x1,\(%eax\)
>   *[a-f0-9]+:	81 18 89 00 00 00    	sbbl   \$0x89,\(%eax\)
>   *[a-f0-9]+:	81 18 34 12 00 00    	sbbl   \$0x1234,\(%eax\)
>   *[a-f0-9]+:	81 18 78 56 34 12    	sbbl   \$0x12345678,\(%eax\)
>   *[a-f0-9]+:	af                   	scas   %es:\(%edi\),%eax
>   *[a-f0-9]+:	af                   	scas   %es:\(%edi\),%eax
> - *[a-f0-9]+:	d1 20                	shll   \(%eax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 20                	shll   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%eax\)
> - *[a-f0-9]+:	d1 38                	sarl   \(%eax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 38                	sarl   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 38 02             	sarl   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 38                	sarl   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 38                	sarl   \(%eax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%eax\)
> + *[a-f0-9]+:	d1 38                	sarl   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 20                	shll   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%eax\)
> - *[a-f0-9]+:	d1 28                	shrl   \(%eax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 28                	shrl   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 28 02             	shrl   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 28                	shrl   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 28                	shrl   \(%eax\)
> + *[a-f0-9]+:	d1 28                	shrl   \$1,\(%eax\)
>   *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%edi\)
>   *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%edi\)
>   *[a-f0-9]+:	83 28 01             	subl   \$0x1,\(%eax\)
> diff --git a/gas/testsuite/gas/i386/noreg64-data16.d b/gas/testsuite/gas/i386/noreg64-data16.d
> index f1e67096a58..802eb4053d3 100644
> --- a/gas/testsuite/gas/i386/noreg64-data16.d
> +++ b/gas/testsuite/gas/i386/noreg64-data16.d
> @@ -106,44 +106,44 @@ Disassembly of section .text:
>   *[a-f0-9]+:	66 0f a1             	popw   %fs
>   *[a-f0-9]+:	66 ff 30             	pushw  \(%rax\)
>   *[a-f0-9]+:	66 0f a0             	pushw  %fs
> - *[a-f0-9]+:	66 d1 10             	rclw   \(%rax\)
> + *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 10 02          	rclw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 10             	rclw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 10             	rclw   \(%rax\)
> - *[a-f0-9]+:	66 d1 18             	rcrw   \(%rax\)
> + *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 18 02          	rcrw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 18             	rcrw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 18             	rcrw   \(%rax\)
> - *[a-f0-9]+:	66 d1 00             	rolw   \(%rax\)
> + *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 00 02          	rolw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 00             	rolw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 00             	rolw   \(%rax\)
> - *[a-f0-9]+:	66 d1 08             	rorw   \(%rax\)
> + *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 08 02          	rorw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 08             	rorw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 08             	rorw   \(%rax\)
> + *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 83 18 01          	sbbw   \$0x1,\(%rax\)
>   *[a-f0-9]+:	66 81 18 89 00       	sbbw   \$0x89,\(%rax\)
>   *[a-f0-9]+:	66 81 18 34 12       	sbbw   \$0x1234,\(%rax\)
>   *[a-f0-9]+:	66 81 18 78 56       	sbbw   \$0x5678,\(%rax\)
>   *[a-f0-9]+:	66 af                	scas   %es:\(%rdi\),%ax
>   *[a-f0-9]+:	66 af                	scas   %es:\(%rdi\),%ax
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
> - *[a-f0-9]+:	66 d1 38             	sarw   \(%rax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 38 02          	sarw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 38             	sarw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 38             	sarw   \(%rax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
> + *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
> - *[a-f0-9]+:	66 d1 28             	shrw   \(%rax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 28 02          	shrw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 28             	shrw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 28             	shrw   \(%rax\)
> + *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%rdi\)
>   *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%rdi\)
>   *[a-f0-9]+:	66 83 28 01          	subw   \$0x1,\(%rax\)
> diff --git a/gas/testsuite/gas/i386/noreg64-rex64.d b/gas/testsuite/gas/i386/noreg64-rex64.d
> index cd8679e626a..e33851d8093 100644
> --- a/gas/testsuite/gas/i386/noreg64-rex64.d
> +++ b/gas/testsuite/gas/i386/noreg64-rex64.d
> @@ -105,44 +105,44 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 48 0f ae 20       	ptwriteq \(%rax\)
>   *[a-f0-9]+:	48 ff 30             	rex\.W push \(%rax\)
>   *[a-f0-9]+:	48 0f a0             	rex\.W push %fs
> - *[a-f0-9]+:	48 d1 10             	rclq   \(%rax\)
> + *[a-f0-9]+:	48 d1 10             	rclq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 10 02          	rclq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 10             	rclq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 10             	rclq   \(%rax\)
> - *[a-f0-9]+:	48 d1 18             	rcrq   \(%rax\)
> + *[a-f0-9]+:	48 d1 10             	rclq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 18             	rcrq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 18 02          	rcrq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 18             	rcrq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 18             	rcrq   \(%rax\)
> - *[a-f0-9]+:	48 d1 00             	rolq   \(%rax\)
> + *[a-f0-9]+:	48 d1 18             	rcrq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 00             	rolq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 00 02          	rolq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 00             	rolq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 00             	rolq   \(%rax\)
> - *[a-f0-9]+:	48 d1 08             	rorq   \(%rax\)
> + *[a-f0-9]+:	48 d1 00             	rolq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 08             	rorq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 08 02          	rorq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 08             	rorq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 08             	rorq   \(%rax\)
> + *[a-f0-9]+:	48 d1 08             	rorq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 83 18 01          	sbbq   \$0x1,\(%rax\)
>   *[a-f0-9]+:	48 81 18 89 00 00 00 	sbbq   \$0x89,\(%rax\)
>   *[a-f0-9]+:	48 81 18 34 12 00 00 	sbbq   \$0x1234,\(%rax\)
>   *[a-f0-9]+:	48 81 18 78 56 34 12 	sbbq   \$0x12345678,\(%rax\)
>   *[a-f0-9]+:	48 af                	scas   %es:\(%rdi\),%rax
>   *[a-f0-9]+:	48 af                	scas   %es:\(%rdi\),%rax
> - *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
> + *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 20 02          	shlq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 20             	shlq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
> - *[a-f0-9]+:	48 d1 38             	sarq   \(%rax\)
> + *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 38             	sarq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 38 02          	sarq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 38             	sarq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 38             	sarq   \(%rax\)
> - *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
> + *[a-f0-9]+:	48 d1 38             	sarq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 20 02          	shlq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 20             	shlq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
> - *[a-f0-9]+:	48 d1 28             	shrq   \(%rax\)
> + *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 28             	shrq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 28 02          	shrq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 28             	shrq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 28             	shrq   \(%rax\)
> + *[a-f0-9]+:	48 d1 28             	shrq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 ab                	stos   %rax,%es:\(%rdi\)
>   *[a-f0-9]+:	48 ab                	stos   %rax,%es:\(%rdi\)
>   *[a-f0-9]+:	48 83 28 01          	subq   \$0x1,\(%rax\)
> diff --git a/gas/testsuite/gas/i386/noreg64.d b/gas/testsuite/gas/i386/noreg64.d
> index 354d89069ae..2afdef38f92 100644
> --- a/gas/testsuite/gas/i386/noreg64.d
> +++ b/gas/testsuite/gas/i386/noreg64.d
> @@ -107,44 +107,44 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 0f ae 20          	ptwritel \(%rax\)
>   *[a-f0-9]+:	ff 30                	push   \(%rax\)
>   *[a-f0-9]+:	0f a0                	push   %fs
> - *[a-f0-9]+:	d1 10                	rcll   \(%rax\)
> + *[a-f0-9]+:	d1 10                	rcll   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 10 02             	rcll   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 10                	rcll   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 10                	rcll   \(%rax\)
> - *[a-f0-9]+:	d1 18                	rcrl   \(%rax\)
> + *[a-f0-9]+:	d1 10                	rcll   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 18 02             	rcrl   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 18                	rcrl   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 18                	rcrl   \(%rax\)
> - *[a-f0-9]+:	d1 00                	roll   \(%rax\)
> + *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 00                	roll   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 00 02             	roll   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 00                	roll   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 00                	roll   \(%rax\)
> - *[a-f0-9]+:	d1 08                	rorl   \(%rax\)
> + *[a-f0-9]+:	d1 00                	roll   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 08                	rorl   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 08 02             	rorl   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 08                	rorl   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 08                	rorl   \(%rax\)
> + *[a-f0-9]+:	d1 08                	rorl   \$1,\(%rax\)
>   *[a-f0-9]+:	83 18 01             	sbbl   \$0x1,\(%rax\)
>   *[a-f0-9]+:	81 18 89 00 00 00    	sbbl   \$0x89,\(%rax\)
>   *[a-f0-9]+:	81 18 34 12 00 00    	sbbl   \$0x1234,\(%rax\)
>   *[a-f0-9]+:	81 18 78 56 34 12    	sbbl   \$0x12345678,\(%rax\)
>   *[a-f0-9]+:	af                   	scas   %es:\(%rdi\),%eax
>   *[a-f0-9]+:	af                   	scas   %es:\(%rdi\),%eax
> - *[a-f0-9]+:	d1 20                	shll   \(%rax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 20                	shll   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%rax\)
> - *[a-f0-9]+:	d1 38                	sarl   \(%rax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 38                	sarl   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 38 02             	sarl   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 38                	sarl   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 38                	sarl   \(%rax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%rax\)
> + *[a-f0-9]+:	d1 38                	sarl   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 20                	shll   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%rax\)
> - *[a-f0-9]+:	d1 28                	shrl   \(%rax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 28                	shrl   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 28 02             	shrl   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 28                	shrl   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 28                	shrl   \(%rax\)
> + *[a-f0-9]+:	d1 28                	shrl   \$1,\(%rax\)
>   *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%rdi\)
>   *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%rdi\)
>   *[a-f0-9]+:	83 28 01             	subl   \$0x1,\(%rax\)
> diff --git a/gas/testsuite/gas/i386/opcode-suffix.d b/gas/testsuite/gas/i386/opcode-suffix.d
> index 946a0a4d7a0..ca6af50c9cf 100644
> --- a/gas/testsuite/gas/i386/opcode-suffix.d
> +++ b/gas/testsuite/gas/i386/opcode-suffix.d
> @@ -206,8 +206,8 @@ Disassembly of section .text:
>   *[0-9a-f]+:	cd 90[ 	]+int[ 	]+\$0x90
>   *[0-9a-f]+:	ce[ 	]+into
>   *[0-9a-f]+:	cf[ 	]+iretl
> - *[0-9a-f]+:	d0 90 90 90 90 90[ 	]+rclb[ 	]+-0x6f6f6f70\(%eax\)
> - *[0-9a-f]+:	d1 90 90 90 90 90[ 	]+rcll[ 	]+-0x6f6f6f70\(%eax\)
> + *[0-9a-f]+:	d0 90 90 90 90 90[ 	]+rclb[ 	]+\$1,-0x6f6f6f70\(%eax\)
> + *[0-9a-f]+:	d1 90 90 90 90 90[ 	]+rcll[ 	]+\$1,-0x6f6f6f70\(%eax\)
>   *[0-9a-f]+:	d2 90 90 90 90 90[ 	]+rclb[ 	]+%cl,-0x6f6f6f70\(%eax\)
>   *[0-9a-f]+:	d3 90 90 90 90 90[ 	]+rcll[ 	]+%cl,-0x6f6f6f70\(%eax\)
>   *[0-9a-f]+:	d4 90[ 	]+aam[ 	]+\$0x90
> @@ -523,7 +523,7 @@ Disassembly of section .text:
>   *[0-9a-f]+:	66 ca 90 90[ 	]+lretw[ 	]+\$0x9090
>   *[0-9a-f]+:	66 cb[ 	]+lretw
>   *[0-9a-f]+:	66 cf[ 	]+iretw
> - *[0-9a-f]+:	66 d1 90 90 90 90 90[ 	]+rclw[ 	]+-0x6f6f6f70\(%eax\)
> + *[0-9a-f]+:	66 d1 90 90 90 90 90[ 	]+rclw[ 	]+\$1,-0x6f6f6f70\(%eax\)
>   *[0-9a-f]+:	66 d3 90 90 90 90 90[ 	]+rclw[ 	]+%cl,-0x6f6f6f70\(%eax\)
>   *[0-9a-f]+:	66 e5 90[ 	]+inw[ 	]+\$0x90,%ax
>   *[0-9a-f]+:	66 e7 90[ 	]+outw[ 	]+%ax,\$0x90
> diff --git a/gas/testsuite/gas/i386/opcode.d b/gas/testsuite/gas/i386/opcode.d
> index 7631195d8d4..f7af22518e2 100644
> --- a/gas/testsuite/gas/i386/opcode.d
> +++ b/gas/testsuite/gas/i386/opcode.d
> @@ -205,8 +205,8 @@ Disassembly of section .text:
>   279:	cd 90 [ 	]*int    \$0x90
>   27b:	ce [ 	]*into
>   27c:	cf [ 	]*iret
> - 27d:	d0 90 90 90 90 90 [ 	]*rclb   -0x6f6f6f70\(%eax\)
> - 283:	d1 90 90 90 90 90 [ 	]*rcll   -0x6f6f6f70\(%eax\)
> + 27d:	d0 90 90 90 90 90 [ 	]*rclb   \$1,-0x6f6f6f70\(%eax\)
> + 283:	d1 90 90 90 90 90 [ 	]*rcll   \$1,-0x6f6f6f70\(%eax\)
>   289:	d2 90 90 90 90 90 [ 	]*rclb   %cl,-0x6f6f6f70\(%eax\)
>   28f:	d3 90 90 90 90 90 [ 	]*rcll   %cl,-0x6f6f6f70\(%eax\)
>   295:	d4 90 [ 	]*aam    \$0x90
> @@ -522,7 +522,7 @@ Disassembly of section .text:
>   869:	66 ca 90 90 [ 	]*lretw  \$0x9090
>   86d:	66 cb [ 	]*lretw
>   86f:	66 cf [ 	]*iretw
> - 871:	66 d1 90 90 90 90 90 [ 	]*rclw   -0x6f6f6f70\(%eax\)
> + 871:	66 d1 90 90 90 90 90 [ 	]*rclw   \$1,-0x6f6f6f70\(%eax\)
>   878:	66 d3 90 90 90 90 90 [ 	]*rclw   %cl,-0x6f6f6f70\(%eax\)
>   87f:	66 e5 90 [ 	]*in     \$0x90,%ax
>   882:	66 e7 90 [ 	]*out    %ax,\$0x90
> @@ -610,8 +610,8 @@ Disassembly of section .text:
>   +[a-f0-9]+:	f7 c9 04 00 00 00    	test   \$(0x)?0*4,%ecx
>   +[a-f0-9]+:	c0 f0 02             	shl    \$0x2,%al
>   +[a-f0-9]+:	c1 f0 01             	shl    \$0x1,%eax
> - +[a-f0-9]+:	d0 f0                	shl    %al
> - +[a-f0-9]+:	d1 f0                	shl    %eax
> + +[a-f0-9]+:	d0 f0                	shl    \$1,%al
> + +[a-f0-9]+:	d1 f0                	shl    \$1,%eax
>   +[a-f0-9]+:	d2 f0                	shl    %cl,%al
>   +[a-f0-9]+:	d3 f0                	shl    %cl,%eax
>  #pass
> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d b/gas/testsuite/gas/i386/x86-64-lfence-load.d
> index b4a03db811d..726236826e8 100644
> --- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
> +++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
> @@ -90,7 +90,7 @@ Disassembly of section .text:
>   +[a-f0-9]+:	0f ae e8             	lfence
>   +[a-f0-9]+:	58                   	pop    %rax
>   +[a-f0-9]+:	0f ae e8             	lfence
> - +[a-f0-9]+:	66 d1 11             	rclw   \(%rcx\)
> + +[a-f0-9]+:	66 d1 11             	rclw   \$1,\(%rcx\)
>   +[a-f0-9]+:	0f ae e8             	lfence
>   +[a-f0-9]+:	f7 01 01 00 00 00    	testl  \$0x1,\(%rcx\)
>   +[a-f0-9]+:	0f ae e8             	lfence
> diff --git a/gas/testsuite/gas/i386/x86-64-opcode.d b/gas/testsuite/gas/i386/x86-64-opcode.d
> index ee6d0f5f4bd..1b8a9fa9014 100644
> --- a/gas/testsuite/gas/i386/x86-64-opcode.d
> +++ b/gas/testsuite/gas/i386/x86-64-opcode.d
> @@ -335,9 +335,9 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:	c0 f0 02             	shl    \$0x2,%al
>  [ 	]*[a-f0-9]+:	c1 f0 01             	shl    \$0x1,%eax
>  [ 	]*[a-f0-9]+:	48 c1 f0 01          	shl    \$0x1,%rax
> -[ 	]*[a-f0-9]+:	d0 f0                	shl    %al
> -[ 	]*[a-f0-9]+:	d1 f0                	shl    %eax
> -[ 	]*[a-f0-9]+:	48 d1 f0             	shl    %rax
> +[ 	]*[a-f0-9]+:	d0 f0                	shl    \$1,%al
> +[ 	]*[a-f0-9]+:	d1 f0                	shl    \$1,%eax
> +[ 	]*[a-f0-9]+:	48 d1 f0             	shl    \$1,%rax
>  [ 	]*[a-f0-9]+:	d2 f0                	shl    %cl,%al
>  [ 	]*[a-f0-9]+:	d3 f0                	shl    %cl,%eax
>  [ 	]*[a-f0-9]+:	48 d3 f0             	shl    %cl,%rax
> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
> index 2e2043d467b..e432b61a6cd 100644
> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -12090,6 +12090,8 @@ OP_I (instr_info *ins, int bytemode, int sizeflag)
>      case const_1_mode:
>        if (ins->intel_syntax)
>  	oappend (ins, "1");
> +      else
> +	oappend (ins, "$1");
>        return true;
>      default:
>        oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
> --
> 2.25.1
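
As a minimal sketch of what the OP_I hunk above changes: before the patch only the Intel-syntax branch appended an operand for const_1_mode, so AT&T output omitted the shift count; afterwards AT&T output carries "$1". The helper name format_const_1 and its buffer interface are hypothetical stand-ins for illustration, not part of i386-dis.c.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the const_1_mode case in OP_I: write the
   implicit shift count into BUF and return BUF.  */
static const char *
format_const_1 (bool intel_syntax, char *buf, size_t len)
{
  /* Intel syntax prints a bare "1"; AT&T immediates carry a '$' prefix.  */
  snprintf (buf, len, "%s", intel_syntax ? "1" : "$1");
  return buf;
}
```

This matches the adjusted test expectations, which now read `\$1` for the D0/D1 shift and rotate encodings in AT&T mode.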

OK.

Thanks.

H.J.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 8/9] Support APX NDD optimized encoding.
  2023-12-11 12:27   ` Jan Beulich
@ 2023-12-12  3:18     ` Hu, Lin1
  2023-12-12  8:41       ` Jan Beulich
  2023-12-12  8:45       ` Jan Beulich
  0 siblings, 2 replies; 69+ messages in thread
From: Hu, Lin1 @ 2023-12-12  3:18 UTC (permalink / raw)
  To: Beulich, Jan, Cui, Lili; +Cc: Lu, Hongjiu, binutils

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, December 11, 2023 8:28 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; Hu, Lin1 <lin1.hu@intel.com>;
> binutils@sourceware.org
> Subject: Re: [PATCH v3 8/9] Support APX NDD optimized encoding.
> 
> On 24.11.2023 08:02, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -7148,6 +7148,58 @@ check_APX_operands (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Check if the instruction use the REX registers.  */
> > +static bool
> > +check_RexOperands ()
> > +{
> > +  for (unsigned int op = 0; op < i.operands; op++)
> > +    {
> > +      if (i.types[op].bitfield.class != Reg)
> > +	continue;
> > +
> > +      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
> > +	return true;
> > +    }
> > +
> > +  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
> > +      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
> > +    return true;
> > +
> > +  /* Check pseudo prefix {rex} are valid.  */
> > +  return i.rex_encoding;
> 
> Can this actually happen, when we're converting from EVEX to legacy?
> (Initially I wanted to ask about "rex" and alike prefixes, i.e. the non-pseudo
> ones.)
>

This is to align with check_EgprOperands. I would like the function to be more general, not specific to this optimization.
 
>
> > +}
> > +
> > +/* Optimize APX NDD insns to legacy insns.  */
> > +static unsigned int
> > +can_convert_NDD_to_legacy (const insn_template *t)
> > +{
> > +  unsigned int match_dest_op = ~0;
> > +
> > +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> 
> No new callers are expected to appear (any time soon) and the sole caller has
> checked this already.
>

I will remove the condition.
 
>
> Also with this check, ...
> 
> > +      && t->opcode_space == SPACE_EVEXMAP4
> 
> ... what (further) effect is this one intended to have?
>

At first it was there because map4 + vexvvvv was used to locate the instructions that needed to be optimized. It seems to be useless now, so I will remove it.
 
>
> > +      && !i.has_nf
> > +      && i.reg_operands >= 2)
> > +    {
> > +      unsigned int dest = i.operands - 1;
> > +      unsigned int src1 = i.operands - 2;
> > +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> > +
> > +      if (i.types[src1].bitfield.class == Reg
> > +	  && i.op[src1].regs == i.op[dest].regs)
> > +	match_dest_op = src1;
> > +      /* If the first operand is the same as the third operand,
> > +	 these instructions need to support the ability to commutative
> > +	 the first two operands and still not change the semantics in order
> > +	 to be optimized.  */
> > +      else if (i.types[src2].bitfield.class == Reg
> > +	       && i.op[src2].regs == i.op[dest].regs
> > +	       && optimize > 1
> > +	       && t->opcode_modifier.commutative)
> 
> Based on the "cheap conditions first" principle and to also be better in line with
> the comment, may I suggest
> 
> +      else if (optimize > 1
> +	       && t->opcode_modifier.commutative
> +	       && i.types[src2].bitfield.class == Reg
> +	       && i.op[src2].regs == i.op[dest].regs)
> 
> ?
>

Thanks for the advice; I have changed it.

> 
> > +	match_dest_op = src2;
> > +    }
> > +  return match_dest_op;
> > +}
> > +
> >  /* Helper function for the progress() macro in match_template().  */
> >  static INLINE enum i386_error progress (enum i386_error new,
> >  					enum i386_error last,
> > @@ -7675,6 +7727,61 @@ match_template (char mnem_suffix)
> >  	  i.memshift = memshift;
> >  	}
> >
> > +      /* If we can optimize a NDD insn to legacy insn, like
> > +	 add %r16, %r8, %r8 -> add %r16, %r8,
> > +	 add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> > +	 Note that the semantics have not been changed.  */
> > +      if (optimize
> > +	  && !i.no_optimize
> > +	  && i.vec_encoding != vex_encoding_evex
> > +	  && t + 1 < current_templates->end
> > +	  && !t[1].opcode_modifier.evex
> > +	  && t[1].opcode_space <= SPACE_0F38
> > +	  && t->opcode_modifier.vexvvvv == VexVVVV_DST)
> > +	{
> > +	  unsigned int match_dest_op = can_convert_NDD_to_legacy (t);
> > +	  size_match = true;
> 
> This would perhaps better ...
> 
> > +	  if (match_dest_op != (unsigned int) ~0)
> > +	    {
> 
> ... live here
>

OK.
 
>
> > +	      /* We ensure that the next template has the same input
> > +		 operands as the original matching template by the first
> > +		 operand (ATT), thus avoiding the error caused by the wrong order
> > +		 of insns in i386.tbl.  */
> 
> I'm sorry, but I (still) can't make sense of this last part of the comment, after the
> comma.
>

I mean that if someone adds support for new NDD insns and puts them in the wrong position in i386.tbl, this part avoids optimizing those insns.
 
>
> > +	      overlap0 = operand_type_and (i.types[0],
> > +					   t[1].operand_types[0]);
> > +	      if (t->opcode_modifier.d)
> > +		overlap1 = operand_type_and (i.types[0],
> > +					     t[1].operand_types[1]);
> > +	      if (!operand_type_match (overlap0, i.types[0])
> > +		  && (!t->opcode_modifier.d
> > +		      || (t->opcode_modifier.d
> > +			  && !operand_type_match (overlap1, i.types[0]))))
> 
> What's wrong with the simpler
> 
> 		  && (!t->opcode_modifier.d
> 		      || !operand_type_match (overlap1, i.types[0])))
> 
> ?
>

Yes, the simpler form works; I have changed these conditions.
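
A minimal sketch of why the two forms are interchangeable: inside the right-hand side of `||`, `d` is already known to be true, so repeating it adds nothing. Here d, m0 and m1 are hypothetical boolean stand-ins for t->opcode_modifier.d and the two operand_type_match () results; the check below simply enumerates all eight combinations.

```c
#include <stdbool.h>

/* Condition as written in the patch.  */
static bool
mismatch_original (bool d, bool m0, bool m1)
{
  return !m0 && (!d || (d && !m1));
}

/* Condition as suggested in the review.  */
static bool
mismatch_simplified (bool d, bool m0, bool m1)
{
  return !m0 && (!d || !m1);
}

/* Return true if both forms agree for every input combination.  */
static bool
forms_agree (void)
{
  for (int bits = 0; bits < 8; bits++)
    {
      bool d = bits & 1, m0 = bits & 2, m1 = bits & 4;
      if (mismatch_original (d, m0, m1) != mismatch_simplified (d, m0, m1))
	return false;
    }
  return true;
}
```

Because `||` short-circuits, the simplified form still never looks at m1 when d is false, which matters in the real code since overlap1 is only computed when the D modifier is set.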
 
>
> > +		size_match = false;
> 
> Yet still, and despite the improved comment, I don't really see what all of this is
> about. What cases would be mis-handled if this wasn't there?
>

None currently. This guards against someone adding new NDD instructions in the wrong order. My idea is to skip the optimization in that case rather than report an error.
 
>
> > +	      if (size_match
> > +		  /* Optimizing some non-legacy-map0/1 without REX/REX2 prefix will be valuable.  */
> > +		  && (t[1].opcode_space <= SPACE_0F
> 
> Where a comment is placed is meaningful to understanding what it's about. The
> wayy you have it, is says "non-legacy-map0/1" on a check that the (next)
> encoding is map0 or map1. I think this wants moving down by a line, and even
> then also re-wording: If I didn't (vaguely) recall context, I don't think I could
> derive what is meant. Iirc this is about legacy encodings being one byte shorter
> for certain 0f38 space insns when they don't require a REX prefix to encode.
> How about something like "Some non-legacy-map0/1 insns can be shorter when
> legacy-encoded and when no REX prefix is required"?
>

OK, I have changed it.
 
>
> > +		      || (!check_EgprOperands (t + 1)
> > +			  && !check_RexOperands ()
> 
> I'm not going to insist that you adjust this, but these two calls side by side
> demonstrate a curious inconsistency: The former requires t to be passed in. If
> you keep it like that, I may change this down the road, the more that the t-
> related aspect isn't relevant here at all (and could hence be moved out of the
> function to the single place where it is needed).
>

I will keep this part of the code as is. I would prefer to modify check_EgprOperands in a separate patch.
 
>
> > +			  && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))
> > +		{
> > +		  unsigned int src1 = i.operands - 2;
> > +		  unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> > +
> > +		  if (match_dest_op == src2)
> > +		    swap_2_operands (match_dest_op, src1);
> 
> Isn't it wrong (albeit benign) to swap when i.operands == 2? IOW wouldn't
> 
> 		  if (i.reg_operands > 2 && match_dest_op == i.operands - 3)
> 		    swap_2_operands (match_dest_op, i.operands - 2);
> 
> be more in line with what's actually wanted?
> 

Because some insns use memory operands (like add %r8, (%rsp), %r8), I changed the code to:
		  if (i.operands > 2 && match_dest_op == i.operands - 3)
		    swap_2_operands (match_dest_op, i.operands - 2);
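
The matching and swap decision discussed above can be sketched in isolation as follows. This is only an illustration under simplifying assumptions: struct operand, find_dest_match and needs_swap are hypothetical names, a register is modelled as a plain number (with -1 for a memory or immediate operand), and the `optimize > 1` gate from the patch is left out.

```c
#include <stdbool.h>

#define NO_MATCH (~0u)

/* Hypothetical, simplified operand: REG is a register number, or -1 for a
   memory or immediate operand.  */
struct operand { int reg; };

/* Return the index of the source operand equal to the destination (the
   last operand, AT&T order), or NO_MATCH.  The source next to the
   destination always qualifies; the one before it only qualifies when the
   insn is commutative.  */
static unsigned int
find_dest_match (const struct operand *ops, unsigned int n, bool commutative)
{
  if (n < 2 || ops[n - 1].reg < 0)
    return NO_MATCH;
  unsigned int dest = n - 1, src1 = n - 2;
  if (ops[src1].reg == ops[dest].reg)
    return src1;
  if (commutative && n > 2 && ops[n - 3].reg == ops[dest].reg)
    return n - 3;
  return NO_MATCH;
}

/* Swap only when there really are three operands and the match is on the
   first source, mirroring the revised condition.  */
static bool
needs_swap (unsigned int n, unsigned int match)
{
  return n > 2 && match == n - 3;
}
```

With this model, add %r16,%r8,%r8 matches on the second source and needs no swap, add %r8,(%rsp),%r8 matches on the first source and needs one, and a two-operand form such as inc %r31,%r31 never swaps. Testing the operand count rather than the register-operand count is what lets the memory-operand case through.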

>
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
> > @@ -0,0 +1,123 @@
> > +# Check 64bit APX NDD instructions with optimized encoding
> > +
> > +	.text
> > +_start:
> > +add    %r31,%r8,%r8
> > +addb   %r31b,%r8b,%r8b
> > +{store} add    %r31,%r8,%r8
> > +{load}  add    %r31,%r8,%r8
> > +add    %r31,(%r8),%r31
> > +add    (%r31),%r8,%r8
> > +add    $0x12344433,%r15,%r15
> > +add    $0xfffffffff4332211,%r8,%r8
> > +inc    %r31,%r31
> > +incb   %r31b,%r31b
> > +sub    %r15,%r17,%r17
> > +subb   %r15b,%r17b,%r17b
> > +sub    %r15,(%r8),%r15
> > +sub    (%r15,%rax,1),%r16,%r16
> > +sub    $0x1234,%r30,%r30
> > +dec    %r17,%r17
> > +decb   %r17b,%r17b
> > +sbb    %r15,%r17,%r17
> > +sbbb   %r15b,%r17b,%r17b
> > +sbb    %r15,(%r8),%r15
> > +sbb    (%r15,%rax,1),%r16,%r16
> > +sbb    $0x1234,%r30,%r30
> > +and    %r15,%r17,%r17
> > +andb   %r15b,%r17b,%r17b
> > +and    %r15,(%r8),%r15
> > +and    (%r15,%rax,1),%r16,%r16
> > +and    $0x1234,%r30,%r30
> > +or     %r15,%r17,%r17
> > +orb    %r15b,%r17b,%r17b
> > +or     %r15,(%r8),%r15
> > +or     (%r15,%rax,1),%r16,%r16
> > +or     $0x1234,%r30,%r30
> > +xor    %r15,%r17,%r17
> > +xorb   %r15b,%r17b,%r17b
> > +xor    %r15,(%r8),%r15
> > +xor    (%r15,%rax,1),%r16,%r16
> > +xor    $0x1234,%r30,%r30
> > +adc    %r15,%r17,%r17
> > +adcb   %r15b,%r17b,%r17b
> > +adc    %r15,(%r8),%r15
> > +adc    (%r15,%rax,1),%r16,%r16
> > +adc    $0x1234,%r30,%r30
> > +neg    %r17,%r17
> > +negb   %r17b,%r17b
> > +not    %r17,%r17
> > +notb   %r17b,%r17b
> > +imul   0x90909(%eax),%edx,%edx
> > +imul   0x909(%rax,%r31,8),%rdx,%rdx
> > +imul   %rdx,%rax,%rdx
> > +rol    %r31,%r31
> > +rolb   %r31b,%r31b
> > +rol    $0x2,%r12,%r12
> > +rolb   $0x2,%r12b,%r12b
> > +ror    %r31,%r31
> > +rorb   %r31b,%r31b
> > +ror    $0x2,%r12,%r12
> > +rorb   $0x2,%r12b,%r12b
> > +rcl    %r31,%r31
> > +rclb   %r31b,%r31b
> > +rcl    $0x2,%r12,%r12
> > +rclb   $0x2,%r12b,%r12b
> > +rcr    %r31,%r31
> > +rcrb   %r31b,%r31b
> > +rcr    $0x2,%r12,%r12
> > +rcrb   $0x2,%r12b,%r12b
> > +sal    %r31,%r31
> > +salb   %r31b,%r31b
> > +sal    $0x2,%r12,%r12
> > +salb   $0x2,%r12b,%r12b
> > +shl    %r31,%r31
> > +shlb   %r31b,%r31b
> > +shl    $0x2,%r12,%r12
> > +shlb   $0x2,%r12b,%r12b
> > +shr    %r31,%r31
> > +shrb   %r31b,%r31b
> > +shr    $0x2,%r12,%r12
> > +shrb   $0x2,%r12b,%r12b
> > +sar    %r31,%r31
> > +sarb   %r31b,%r31b
> > +sar    $0x2,%r12,%r12
> > +sarb   $0x2,%r12b,%r12b
> > +shld   $0x1,%r12,(%rax),%r12
> > +shld   $0x2,%r8,%r12,%r12
> > +shld   $0x2,%r8,%r12,%r8
> > +shld   %cl,%r9,(%rax),%r9
> > +shld   %cl,%r12,%r16,%r16
> > +shld   %cl,%r12,%r16,%r12
> > +shrd   $0x1,%r12,(%rax),%r12
> > +shrd   $0x1,%r13,%r12,%r12
> > +shrd   $0x1,%r13,%r12,%r13
> > +shrd   %cl,%r9,(%rax),%r9
> > +shrd   %cl,%r12,%r16,%r16
> > +shrd   %cl,%r12,%r16,%r12
> > +cmovo  0x90909090(%eax),%edx,%edx
> > +cmovno 0x90909090(%eax),%edx,%edx
> > +cmovb  0x90909090(%eax),%edx,%edx
> > +cmovae 0x90909090(%eax),%edx,%edx
> > +cmove  0x90909090(%eax),%edx,%edx
> > +cmovne 0x90909090(%eax),%edx,%edx
> > +cmovbe 0x90909090(%eax),%edx,%edx
> > +cmova  0x90909090(%eax),%edx,%edx
> > +cmovs  0x90909090(%eax),%edx,%edx
> > +cmovns 0x90909090(%eax),%edx,%edx
> > +cmovp  0x90909090(%eax),%edx,%edx
> > +cmovnp 0x90909090(%eax),%edx,%edx
> > +cmovl  0x90909090(%eax),%edx,%edx
> > +cmovge 0x90909090(%eax),%edx,%edx
> > +cmovle 0x90909090(%eax),%edx,%edx
> > +cmovg  0x90909090(%eax),%edx,%edx
> > +adcx   %ebx,%eax,%eax
> > +adcx   %eax,%ebx,%eax
> > +adcx   %rbx,%rax,%rax
> > +adcx   %r15,%r8,%r8
> 
> Might this better be
> 
> adcx   %r15d,%r8d,%r8d
> 
> to avoid having two exclusion criteria (REX register use and REX.W set)?
> Or maybe even split to further separate source and destination:
> 
> adcx   %eax,%r8d,%r8d
> adcx   %r15d,%eax,%eax
> 
> ?
>

OK, I have split adcx and adox.
 
BR,
Lin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 6/9] Support APX NDD
  2023-12-08 14:27   ` Jan Beulich
@ 2023-12-12  5:53     ` Cui, Lili
  2023-12-12  8:28       ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-12  5:53 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, Kong, Lingling, binutils

> On 24.11.2023 08:02, Cui, Lili wrote:
> > --- a/opcodes/i386-dis-evex-reg.h
> > +++ b/opcodes/i386-dis-evex-reg.h
> > @@ -56,3 +56,58 @@
> >      { "blsmskS",	{ VexGdq, Edq }, 0 },
> >      { "blsiS",	{ VexGdq, Edq }, 0 },
> >    },
> > +  /* REG_EVEX_MAP4_80 */
> > +  {
> > +    { "addA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> > +    { "orA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> > +    { "adcA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> > +    { "sbbA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> > +    { "andA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> > +    { "subA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> > +    { "xorA",	{ VexGb, Eb, Ib }, NO_PREFIX },
> 
> Don't these need to use PREFIX_NP_OR_DATA? The doc clearly says
> ".IGNORED" there. (Applies to other byte ops as well then, of course.)
> 

I'm confused here: "IGNORED" means the W bit in the EVEX payload is ignored. Why is 0x66 allowed?

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax
  2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
                   ` (9 preceding siblings ...)
  2023-12-12  2:57 ` Lu, Hongjiu
@ 2023-12-12  8:16 ` Cui, Lili
  10 siblings, 0 replies; 69+ messages in thread
From: Cui, Lili @ 2023-12-12  8:16 UTC (permalink / raw)
  To: binutils; +Cc: Beulich, Jan, Lu, Hongjiu

Hi H.J.,

Could you help review this patch? Thanks.

Lili.

> -----Original Message-----
> From: Cui, Lili <lili.cui@intel.com>
> Sent: Friday, November 24, 2023 3:02 PM
> To: binutils@sourceware.org
> Cc: Beulich, Jan <JBeulich@suse.com>; Lu, Hongjiu <hongjiu.lu@intel.com>
> Subject: [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax
> 
> Make const_1_mode print $1 in AT&T syntax, otherwise there will be
> correctness issues when it is extended to support APX NDD.
> 
> gas/ChangeLog:
> 
>         * testsuite/gas/i386/intel.d: Adjust testcase.
>         * testsuite/gas/i386/lfence-load.d: Ditto.
>         * testsuite/gas/i386/noreg16-data32.d: Ditto.
>         * testsuite/gas/i386/noreg16.d: Ditto.
>         * testsuite/gas/i386/noreg32-data16.d: Ditto.
>         * testsuite/gas/i386/noreg32.d: Ditto.
>         * testsuite/gas/i386/noreg64-data16.d: Ditto.
>         * testsuite/gas/i386/noreg64-rex64.d: Ditto.
>         * testsuite/gas/i386/noreg64.d: Ditto.
>         * testsuite/gas/i386/opcode-suffix.d: Ditto.
>         * testsuite/gas/i386/opcode.d: Ditto.
>         * testsuite/gas/i386/x86-64-lfence-load.d: Ditto.
>         * testsuite/gas/i386/x86-64-opcode.d: Ditto.
> 
> opcodes/ChangeLog:
> 
>         * i386-dis.c (OP_I): Make const_1_mode print $1 in AT&T syntax.
> ---
>  gas/testsuite/gas/i386/intel.d              |  6 ++--
>  gas/testsuite/gas/i386/lfence-load.d        |  2 +-
>  gas/testsuite/gas/i386/noreg16-data32.d     | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg16.d            | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg32-data16.d     | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg32.d            | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg64-data16.d     | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg64-rex64.d      | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/noreg64.d            | 32 ++++++++++-----------
>  gas/testsuite/gas/i386/opcode-suffix.d      |  6 ++--
>  gas/testsuite/gas/i386/opcode.d             | 10 +++----
>  gas/testsuite/gas/i386/x86-64-lfence-load.d |  2 +-
>  gas/testsuite/gas/i386/x86-64-opcode.d      |  6 ++--
>  opcodes/i386-dis.c                          |  2 ++
>  14 files changed, 130 insertions(+), 128 deletions(-)
> 
> diff --git a/gas/testsuite/gas/i386/intel.d b/gas/testsuite/gas/i386/intel.d
> index bc212893853..c3e45c2e38c 100644
> --- a/gas/testsuite/gas/i386/intel.d
> +++ b/gas/testsuite/gas/i386/intel.d
> @@ -208,8 +208,8 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:	cd 90 [ 	]*int    \$0x90
>  [ 	]*[a-f0-9]+:	ce [ 	]*into
>  [ 	]*[a-f0-9]+:	cf [ 	]*iret
> -[ 	]*[a-f0-9]+:	d0 90 90 90 90 90 [ 	]*rclb   -0x6f6f6f70\(%eax\)
> -[ 	]*[a-f0-9]+:	d1 90 90 90 90 90 [ 	]*rcll   -0x6f6f6f70\(%eax\)
> +[ 	]*[a-f0-9]+:	d0 90 90 90 90 90 [ 	]*rclb   \$1,-0x6f6f6f70\(%eax\)
> +[ 	]*[a-f0-9]+:	d1 90 90 90 90 90 [ 	]*rcll   \$1,-0x6f6f6f70\(%eax\)
>  [ 	]*[a-f0-9]+:	d2 90 90 90 90 90 [ 	]*rclb   %cl,-0x6f6f6f70\(%eax\)
>  [ 	]*[a-f0-9]+:	d3 90 90 90 90 90 [ 	]*rcll   %cl,-0x6f6f6f70\(%eax\)
>  [ 	]*[a-f0-9]+:	d4 90 [ 	]*aam    \$0x90
> @@ -527,7 +527,7 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:	66 ca 90 90 [ 	]*lretw  \$0x9090
>  [ 	]*[a-f0-9]+:	66 cb [ 	]*lretw
>  [ 	]*[a-f0-9]+:	66 cf [ 	]*iretw
> -[ 	]*[a-f0-9]+:	66 d1 90 90 90 90 90 [ 	]*rclw   -0x6f6f6f70\(%eax\)
> +[ 	]*[a-f0-9]+:	66 d1 90 90 90 90 90 [ 	]*rclw   \$1,-0x6f6f6f70\(%eax\)
>  [ 	]*[a-f0-9]+:	66 d3 90 90 90 90 90 [ 	]*rclw   %cl,-0x6f6f6f70\(%eax\)
>  [ 	]*[a-f0-9]+:	66 e5 90 [ 	]*in     \$0x90,%ax
>  [ 	]*[a-f0-9]+:	66 e7 90 [ 	]*out    %ax,\$0x90
> diff --git a/gas/testsuite/gas/i386/lfence-load.d b/gas/testsuite/gas/i386/lfence-load.d
> index 33ebef5432f..eb94bdcbb68 100644
> --- a/gas/testsuite/gas/i386/lfence-load.d
> +++ b/gas/testsuite/gas/i386/lfence-load.d
> @@ -83,7 +83,7 @@ Disassembly of section .text:
>   +[a-f0-9]+:	0f ae e8             	lfence
>   +[a-f0-9]+:	58                   	pop    %eax
>   +[a-f0-9]+:	0f ae e8             	lfence
> - +[a-f0-9]+:	66 d1 11             	rclw   \(%ecx\)
> + +[a-f0-9]+:	66 d1 11             	rclw   \$1,\(%ecx\)
>   +[a-f0-9]+:	0f ae e8             	lfence
>   +[a-f0-9]+:	f7 01 01 00 00 00    	testl  \$0x1,\(%ecx\)
>   +[a-f0-9]+:	0f ae e8             	lfence
> diff --git a/gas/testsuite/gas/i386/noreg16-data32.d b/gas/testsuite/gas/i386/noreg16-data32.d
> index 7561b549ebb..237e25dd0e1 100644
> --- a/gas/testsuite/gas/i386/noreg16-data32.d
> +++ b/gas/testsuite/gas/i386/noreg16-data32.d
> @@ -96,43 +96,43 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 0f ae 27          	ptwrite \(%bx\)
>   *[a-f0-9]+:	66 ff 37             	pushl  \(%bx\)
>   *[a-f0-9]+:	66 06                	pushl  %es
> - *[a-f0-9]+:	66 d1 17             	rcll   \(%bx\)
> + *[a-f0-9]+:	66 d1 17             	rcll   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 17 02          	rcll   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 17             	rcll   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 17             	rcll   \(%bx\)
> - *[a-f0-9]+:	66 d1 1f             	rcrl   \(%bx\)
> + *[a-f0-9]+:	66 d1 17             	rcll   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 1f             	rcrl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 1f 02          	rcrl   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 1f             	rcrl   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 1f             	rcrl   \(%bx\)
> - *[a-f0-9]+:	66 d1 07             	roll   \(%bx\)
> + *[a-f0-9]+:	66 d1 1f             	rcrl   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 07             	roll   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 07 02          	roll   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 07             	roll   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 07             	roll   \(%bx\)
> - *[a-f0-9]+:	66 d1 0f             	rorl   \(%bx\)
> + *[a-f0-9]+:	66 d1 07             	roll   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 0f             	rorl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 0f 02          	rorl   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 0f             	rorl   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 0f             	rorl   \(%bx\)
> + *[a-f0-9]+:	66 d1 0f             	rorl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 83 1f 01          	sbbl   \$0x1,\(%bx\)
>   *[a-f0-9]+:	66 81 1f 89 00 00 00 	sbbl   \$0x89,\(%bx\)
>   *[a-f0-9]+:	66 81 1f 34 12 00 00 	sbbl   \$0x1234,\(%bx\)
>   *[a-f0-9]+:	66 af                	scas   %es:\(%di\),%eax
>   *[a-f0-9]+:	66 af                	scas   %es:\(%di\),%eax
> - *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
> + *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 27 02          	shll   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 27             	shll   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
> - *[a-f0-9]+:	66 d1 3f             	sarl   \(%bx\)
> + *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 3f             	sarl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 3f 02          	sarl   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 3f             	sarl   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 3f             	sarl   \(%bx\)
> - *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
> + *[a-f0-9]+:	66 d1 3f             	sarl   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 27 02          	shll   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 27             	shll   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 27             	shll   \(%bx\)
> - *[a-f0-9]+:	66 d1 2f             	shrl   \(%bx\)
> + *[a-f0-9]+:	66 d1 27             	shll   \$1,\(%bx\)
> + *[a-f0-9]+:	66 d1 2f             	shrl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 c1 2f 02          	shrl   \$0x2,\(%bx\)
>   *[a-f0-9]+:	66 d3 2f             	shrl   %cl,\(%bx\)
> - *[a-f0-9]+:	66 d1 2f             	shrl   \(%bx\)
> + *[a-f0-9]+:	66 d1 2f             	shrl   \$1,\(%bx\)
>   *[a-f0-9]+:	66 ab                	stos   %eax,%es:\(%di\)
>   *[a-f0-9]+:	66 ab                	stos   %eax,%es:\(%di\)
>   *[a-f0-9]+:	66 83 2f 01          	subl   \$0x1,\(%bx\)
> diff --git a/gas/testsuite/gas/i386/noreg16.d b/gas/testsuite/gas/i386/noreg16.d
> index 86f852fb4ca..e4149b03a6e 100644
> --- a/gas/testsuite/gas/i386/noreg16.d
> +++ b/gas/testsuite/gas/i386/noreg16.d
> @@ -95,43 +95,43 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 0f ae 27          	ptwrite \(%bx\)
>   *[a-f0-9]+:	ff 37                	push   \(%bx\)
>   *[a-f0-9]+:	06                   	push   %es
> - *[a-f0-9]+:	d1 17                	rclw   \(%bx\)
> + *[a-f0-9]+:	d1 17                	rclw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 17 02             	rclw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 17                	rclw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 17                	rclw   \(%bx\)
> - *[a-f0-9]+:	d1 1f                	rcrw   \(%bx\)
> + *[a-f0-9]+:	d1 17                	rclw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 1f                	rcrw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 1f 02             	rcrw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 1f                	rcrw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 1f                	rcrw   \(%bx\)
> - *[a-f0-9]+:	d1 07                	rolw   \(%bx\)
> + *[a-f0-9]+:	d1 1f                	rcrw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 07                	rolw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 07 02             	rolw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 07                	rolw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 07                	rolw   \(%bx\)
> - *[a-f0-9]+:	d1 0f                	rorw   \(%bx\)
> + *[a-f0-9]+:	d1 07                	rolw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 0f                	rorw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 0f 02             	rorw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 0f                	rorw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 0f                	rorw   \(%bx\)
> + *[a-f0-9]+:	d1 0f                	rorw   \$1,\(%bx\)
>   *[a-f0-9]+:	83 1f 01             	sbbw   \$0x1,\(%bx\)
>   *[a-f0-9]+:	81 1f 89 00          	sbbw   \$0x89,\(%bx\)
>   *[a-f0-9]+:	81 1f 34 12          	sbbw   \$0x1234,\(%bx\)
>   *[a-f0-9]+:	af                   	scas   %es:\(%di\),%ax
>   *[a-f0-9]+:	af                   	scas   %es:\(%di\),%ax
> - *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
> + *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 27 02             	shlw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 27                	shlw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
> - *[a-f0-9]+:	d1 3f                	sarw   \(%bx\)
> + *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 3f                	sarw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 3f 02             	sarw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 3f                	sarw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 3f                	sarw   \(%bx\)
> - *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
> + *[a-f0-9]+:	d1 3f                	sarw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 27 02             	shlw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 27                	shlw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 27                	shlw   \(%bx\)
> - *[a-f0-9]+:	d1 2f                	shrw   \(%bx\)
> + *[a-f0-9]+:	d1 27                	shlw   \$1,\(%bx\)
> + *[a-f0-9]+:	d1 2f                	shrw   \$1,\(%bx\)
>   *[a-f0-9]+:	c1 2f 02             	shrw   \$0x2,\(%bx\)
>   *[a-f0-9]+:	d3 2f                	shrw   %cl,\(%bx\)
> - *[a-f0-9]+:	d1 2f                	shrw   \(%bx\)
> + *[a-f0-9]+:	d1 2f                	shrw   \$1,\(%bx\)
>   *[a-f0-9]+:	ab                   	stos   %ax,%es:\(%di\)
>   *[a-f0-9]+:	ab                   	stos   %ax,%es:\(%di\)
>   *[a-f0-9]+:	83 2f 01             	subw   \$0x1,\(%bx\)
> diff --git a/gas/testsuite/gas/i386/noreg32-data16.d b/gas/testsuite/gas/i386/noreg32-data16.d
> index 1ec6b9e8670..e3ae2116bb1 100644
> --- a/gas/testsuite/gas/i386/noreg32-data16.d
> +++ b/gas/testsuite/gas/i386/noreg32-data16.d
> @@ -103,44 +103,44 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 0f ae 20          	ptwrite \(%eax\)
>   *[a-f0-9]+:	66 ff 30             	pushw  \(%eax\)
>   *[a-f0-9]+:	66 06                	pushw  %es
> - *[a-f0-9]+:	66 d1 10             	rclw   \(%eax\)
> + *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 10 02          	rclw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 10             	rclw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 10             	rclw   \(%eax\)
> - *[a-f0-9]+:	66 d1 18             	rcrw   \(%eax\)
> + *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 18 02          	rcrw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 18             	rcrw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 18             	rcrw   \(%eax\)
> - *[a-f0-9]+:	66 d1 00             	rolw   \(%eax\)
> + *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 00 02          	rolw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 00             	rolw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 00             	rolw   \(%eax\)
> - *[a-f0-9]+:	66 d1 08             	rorw   \(%eax\)
> + *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 08 02          	rorw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 08             	rorw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 08             	rorw   \(%eax\)
> + *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 83 18 01          	sbbw   \$0x1,\(%eax\)
>   *[a-f0-9]+:	66 81 18 89 00       	sbbw   \$0x89,\(%eax\)
>   *[a-f0-9]+:	66 81 18 34 12       	sbbw   \$0x1234,\(%eax\)
>   *[a-f0-9]+:	66 81 18 78 56       	sbbw   \$0x5678,\(%eax\)
>   *[a-f0-9]+:	66 af                	scas   %es:\(%edi\),%ax
>   *[a-f0-9]+:	66 af                	scas   %es:\(%edi\),%ax
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
> - *[a-f0-9]+:	66 d1 38             	sarw   \(%eax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 38 02          	sarw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 38             	sarw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 38             	sarw   \(%eax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
> + *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%eax\)
> - *[a-f0-9]+:	66 d1 28             	shrw   \(%eax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%eax\)
> + *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 c1 28 02          	shrw   \$0x2,\(%eax\)
>   *[a-f0-9]+:	66 d3 28             	shrw   %cl,\(%eax\)
> - *[a-f0-9]+:	66 d1 28             	shrw   \(%eax\)
> + *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%eax\)
>   *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%edi\)
>   *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%edi\)
>   *[a-f0-9]+:	66 83 28 01          	subw   \$0x1,\(%eax\)
> diff --git a/gas/testsuite/gas/i386/noreg32.d
> b/gas/testsuite/gas/i386/noreg32.d
> index 9dbef908ce7..8bb08ca73c6 100644
> --- a/gas/testsuite/gas/i386/noreg32.d
> +++ b/gas/testsuite/gas/i386/noreg32.d
> @@ -101,44 +101,44 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 0f ae 20          	ptwrite \(%eax\)
>   *[a-f0-9]+:	ff 30                	push   \(%eax\)
>   *[a-f0-9]+:	06                   	push   %es
> - *[a-f0-9]+:	d1 10                	rcll   \(%eax\)
> + *[a-f0-9]+:	d1 10                	rcll   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 10 02             	rcll   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 10                	rcll   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 10                	rcll   \(%eax\)
> - *[a-f0-9]+:	d1 18                	rcrl   \(%eax\)
> + *[a-f0-9]+:	d1 10                	rcll   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 18 02             	rcrl   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 18                	rcrl   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 18                	rcrl   \(%eax\)
> - *[a-f0-9]+:	d1 00                	roll   \(%eax\)
> + *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 00                	roll   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 00 02             	roll   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 00                	roll   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 00                	roll   \(%eax\)
> - *[a-f0-9]+:	d1 08                	rorl   \(%eax\)
> + *[a-f0-9]+:	d1 00                	roll   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 08                	rorl   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 08 02             	rorl   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 08                	rorl   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 08                	rorl   \(%eax\)
> + *[a-f0-9]+:	d1 08                	rorl   \$1,\(%eax\)
>   *[a-f0-9]+:	83 18 01             	sbbl   \$0x1,\(%eax\)
>   *[a-f0-9]+:	81 18 89 00 00 00    	sbbl   \$0x89,\(%eax\)
>   *[a-f0-9]+:	81 18 34 12 00 00    	sbbl   \$0x1234,\(%eax\)
>   *[a-f0-9]+:	81 18 78 56 34 12    	sbbl   \$0x12345678,\(%eax\)
>   *[a-f0-9]+:	af                   	scas   %es:\(%edi\),%eax
>   *[a-f0-9]+:	af                   	scas   %es:\(%edi\),%eax
> - *[a-f0-9]+:	d1 20                	shll   \(%eax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 20                	shll   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%eax\)
> - *[a-f0-9]+:	d1 38                	sarl   \(%eax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 38                	sarl   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 38 02             	sarl   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 38                	sarl   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 38                	sarl   \(%eax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%eax\)
> + *[a-f0-9]+:	d1 38                	sarl   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 20                	shll   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%eax\)
> - *[a-f0-9]+:	d1 28                	shrl   \(%eax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%eax\)
> + *[a-f0-9]+:	d1 28                	shrl   \$1,\(%eax\)
>   *[a-f0-9]+:	c1 28 02             	shrl   \$0x2,\(%eax\)
>   *[a-f0-9]+:	d3 28                	shrl   %cl,\(%eax\)
> - *[a-f0-9]+:	d1 28                	shrl   \(%eax\)
> + *[a-f0-9]+:	d1 28                	shrl   \$1,\(%eax\)
>   *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%edi\)
>   *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%edi\)
>   *[a-f0-9]+:	83 28 01             	subl   \$0x1,\(%eax\)
> diff --git a/gas/testsuite/gas/i386/noreg64-data16.d
> b/gas/testsuite/gas/i386/noreg64-data16.d
> index f1e67096a58..802eb4053d3 100644
> --- a/gas/testsuite/gas/i386/noreg64-data16.d
> +++ b/gas/testsuite/gas/i386/noreg64-data16.d
> @@ -106,44 +106,44 @@ Disassembly of section .text:
>   *[a-f0-9]+:	66 0f a1             	popw   %fs
>   *[a-f0-9]+:	66 ff 30             	pushw  \(%rax\)
>   *[a-f0-9]+:	66 0f a0             	pushw  %fs
> - *[a-f0-9]+:	66 d1 10             	rclw   \(%rax\)
> + *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 10 02          	rclw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 10             	rclw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 10             	rclw   \(%rax\)
> - *[a-f0-9]+:	66 d1 18             	rcrw   \(%rax\)
> + *[a-f0-9]+:	66 d1 10             	rclw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 18 02          	rcrw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 18             	rcrw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 18             	rcrw   \(%rax\)
> - *[a-f0-9]+:	66 d1 00             	rolw   \(%rax\)
> + *[a-f0-9]+:	66 d1 18             	rcrw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 00 02          	rolw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 00             	rolw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 00             	rolw   \(%rax\)
> - *[a-f0-9]+:	66 d1 08             	rorw   \(%rax\)
> + *[a-f0-9]+:	66 d1 00             	rolw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 08 02          	rorw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 08             	rorw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 08             	rorw   \(%rax\)
> + *[a-f0-9]+:	66 d1 08             	rorw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 83 18 01          	sbbw   \$0x1,\(%rax\)
>   *[a-f0-9]+:	66 81 18 89 00       	sbbw   \$0x89,\(%rax\)
>   *[a-f0-9]+:	66 81 18 34 12       	sbbw   \$0x1234,\(%rax\)
>   *[a-f0-9]+:	66 81 18 78 56       	sbbw   \$0x5678,\(%rax\)
>   *[a-f0-9]+:	66 af                	scas   %es:\(%rdi\),%ax
>   *[a-f0-9]+:	66 af                	scas   %es:\(%rdi\),%ax
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
> - *[a-f0-9]+:	66 d1 38             	sarw   \(%rax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 38 02          	sarw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 38             	sarw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 38             	sarw   \(%rax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
> + *[a-f0-9]+:	66 d1 38             	sarw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 20 02          	shlw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 20             	shlw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 20             	shlw   \(%rax\)
> - *[a-f0-9]+:	66 d1 28             	shrw   \(%rax\)
> + *[a-f0-9]+:	66 d1 20             	shlw   \$1,\(%rax\)
> + *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 c1 28 02          	shrw   \$0x2,\(%rax\)
>   *[a-f0-9]+:	66 d3 28             	shrw   %cl,\(%rax\)
> - *[a-f0-9]+:	66 d1 28             	shrw   \(%rax\)
> + *[a-f0-9]+:	66 d1 28             	shrw   \$1,\(%rax\)
>   *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%rdi\)
>   *[a-f0-9]+:	66 ab                	stos   %ax,%es:\(%rdi\)
>   *[a-f0-9]+:	66 83 28 01          	subw   \$0x1,\(%rax\)
> diff --git a/gas/testsuite/gas/i386/noreg64-rex64.d
> b/gas/testsuite/gas/i386/noreg64-rex64.d
> index cd8679e626a..e33851d8093 100644
> --- a/gas/testsuite/gas/i386/noreg64-rex64.d
> +++ b/gas/testsuite/gas/i386/noreg64-rex64.d
> @@ -105,44 +105,44 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 48 0f ae 20       	ptwriteq \(%rax\)
>   *[a-f0-9]+:	48 ff 30             	rex\.W push \(%rax\)
>   *[a-f0-9]+:	48 0f a0             	rex\.W push %fs
> - *[a-f0-9]+:	48 d1 10             	rclq   \(%rax\)
> + *[a-f0-9]+:	48 d1 10             	rclq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 10 02          	rclq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 10             	rclq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 10             	rclq   \(%rax\)
> - *[a-f0-9]+:	48 d1 18             	rcrq   \(%rax\)
> + *[a-f0-9]+:	48 d1 10             	rclq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 18             	rcrq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 18 02          	rcrq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 18             	rcrq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 18             	rcrq   \(%rax\)
> - *[a-f0-9]+:	48 d1 00             	rolq   \(%rax\)
> + *[a-f0-9]+:	48 d1 18             	rcrq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 00             	rolq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 00 02          	rolq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 00             	rolq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 00             	rolq   \(%rax\)
> - *[a-f0-9]+:	48 d1 08             	rorq   \(%rax\)
> + *[a-f0-9]+:	48 d1 00             	rolq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 08             	rorq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 08 02          	rorq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 08             	rorq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 08             	rorq   \(%rax\)
> + *[a-f0-9]+:	48 d1 08             	rorq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 83 18 01          	sbbq   \$0x1,\(%rax\)
>   *[a-f0-9]+:	48 81 18 89 00 00 00 	sbbq   \$0x89,\(%rax\)
>   *[a-f0-9]+:	48 81 18 34 12 00 00 	sbbq   \$0x1234,\(%rax\)
>   *[a-f0-9]+:	48 81 18 78 56 34 12 	sbbq   \$0x12345678,\(%rax\)
>   *[a-f0-9]+:	48 af                	scas   %es:\(%rdi\),%rax
>   *[a-f0-9]+:	48 af                	scas   %es:\(%rdi\),%rax
> - *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
> + *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 20 02          	shlq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 20             	shlq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
> - *[a-f0-9]+:	48 d1 38             	sarq   \(%rax\)
> + *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 38             	sarq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 38 02          	sarq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 38             	sarq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 38             	sarq   \(%rax\)
> - *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
> + *[a-f0-9]+:	48 d1 38             	sarq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 20 02          	shlq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 20             	shlq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 20             	shlq   \(%rax\)
> - *[a-f0-9]+:	48 d1 28             	shrq   \(%rax\)
> + *[a-f0-9]+:	48 d1 20             	shlq   \$1,\(%rax\)
> + *[a-f0-9]+:	48 d1 28             	shrq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 c1 28 02          	shrq   \$0x2,\(%rax\)
>   *[a-f0-9]+:	48 d3 28             	shrq   %cl,\(%rax\)
> - *[a-f0-9]+:	48 d1 28             	shrq   \(%rax\)
> + *[a-f0-9]+:	48 d1 28             	shrq   \$1,\(%rax\)
>   *[a-f0-9]+:	48 ab                	stos   %rax,%es:\(%rdi\)
>   *[a-f0-9]+:	48 ab                	stos   %rax,%es:\(%rdi\)
>   *[a-f0-9]+:	48 83 28 01          	subq   \$0x1,\(%rax\)
> diff --git a/gas/testsuite/gas/i386/noreg64.d
> b/gas/testsuite/gas/i386/noreg64.d
> index 354d89069ae..2afdef38f92 100644
> --- a/gas/testsuite/gas/i386/noreg64.d
> +++ b/gas/testsuite/gas/i386/noreg64.d
> @@ -107,44 +107,44 @@ Disassembly of section .text:
>   *[a-f0-9]+:	f3 0f ae 20          	ptwritel \(%rax\)
>   *[a-f0-9]+:	ff 30                	push   \(%rax\)
>   *[a-f0-9]+:	0f a0                	push   %fs
> - *[a-f0-9]+:	d1 10                	rcll   \(%rax\)
> + *[a-f0-9]+:	d1 10                	rcll   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 10 02             	rcll   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 10                	rcll   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 10                	rcll   \(%rax\)
> - *[a-f0-9]+:	d1 18                	rcrl   \(%rax\)
> + *[a-f0-9]+:	d1 10                	rcll   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 18 02             	rcrl   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 18                	rcrl   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 18                	rcrl   \(%rax\)
> - *[a-f0-9]+:	d1 00                	roll   \(%rax\)
> + *[a-f0-9]+:	d1 18                	rcrl   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 00                	roll   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 00 02             	roll   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 00                	roll   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 00                	roll   \(%rax\)
> - *[a-f0-9]+:	d1 08                	rorl   \(%rax\)
> + *[a-f0-9]+:	d1 00                	roll   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 08                	rorl   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 08 02             	rorl   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 08                	rorl   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 08                	rorl   \(%rax\)
> + *[a-f0-9]+:	d1 08                	rorl   \$1,\(%rax\)
>   *[a-f0-9]+:	83 18 01             	sbbl   \$0x1,\(%rax\)
>   *[a-f0-9]+:	81 18 89 00 00 00    	sbbl   \$0x89,\(%rax\)
>   *[a-f0-9]+:	81 18 34 12 00 00    	sbbl   \$0x1234,\(%rax\)
>   *[a-f0-9]+:	81 18 78 56 34 12    	sbbl   \$0x12345678,\(%rax\)
>   *[a-f0-9]+:	af                   	scas   %es:\(%rdi\),%eax
>   *[a-f0-9]+:	af                   	scas   %es:\(%rdi\),%eax
> - *[a-f0-9]+:	d1 20                	shll   \(%rax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 20                	shll   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%rax\)
> - *[a-f0-9]+:	d1 38                	sarl   \(%rax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 38                	sarl   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 38 02             	sarl   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 38                	sarl   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 38                	sarl   \(%rax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%rax\)
> + *[a-f0-9]+:	d1 38                	sarl   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 20 02             	shll   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 20                	shll   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 20                	shll   \(%rax\)
> - *[a-f0-9]+:	d1 28                	shrl   \(%rax\)
> + *[a-f0-9]+:	d1 20                	shll   \$1,\(%rax\)
> + *[a-f0-9]+:	d1 28                	shrl   \$1,\(%rax\)
>   *[a-f0-9]+:	c1 28 02             	shrl   \$0x2,\(%rax\)
>   *[a-f0-9]+:	d3 28                	shrl   %cl,\(%rax\)
> - *[a-f0-9]+:	d1 28                	shrl   \(%rax\)
> + *[a-f0-9]+:	d1 28                	shrl   \$1,\(%rax\)
>   *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%rdi\)
>   *[a-f0-9]+:	ab                   	stos   %eax,%es:\(%rdi\)
>   *[a-f0-9]+:	83 28 01             	subl   \$0x1,\(%rax\)
> diff --git a/gas/testsuite/gas/i386/opcode-suffix.d
> b/gas/testsuite/gas/i386/opcode-suffix.d
> index 946a0a4d7a0..ca6af50c9cf 100644
> --- a/gas/testsuite/gas/i386/opcode-suffix.d
> +++ b/gas/testsuite/gas/i386/opcode-suffix.d
> @@ -206,8 +206,8 @@ Disassembly of section .text:
>   *[0-9a-f]+:	cd 90[ 	]+int[ 	]+\$0x90
>   *[0-9a-f]+:	ce[ 	]+into
>   *[0-9a-f]+:	cf[ 	]+iretl
> - *[0-9a-f]+:	d0 90 90 90 90 90[ 	]+rclb[ 	]+-0x6f6f6f70\(%eax\)
> - *[0-9a-f]+:	d1 90 90 90 90 90[ 	]+rcll[ 	]+-0x6f6f6f70\(%eax\)
> + *[0-9a-f]+:	d0 90 90 90 90 90[ 	]+rclb[ 	]+\$1,-0x6f6f6f70\(%eax\)
> + *[0-9a-f]+:	d1 90 90 90 90 90[ 	]+rcll[ 	]+\$1,-0x6f6f6f70\(%eax\)
>   *[0-9a-f]+:	d2 90 90 90 90 90[ 	]+rclb[ 	]+%cl,-0x6f6f6f70\(%eax\)
>   *[0-9a-f]+:	d3 90 90 90 90 90[ 	]+rcll[ 	]+%cl,-0x6f6f6f70\(%eax\)
>   *[0-9a-f]+:	d4 90[ 	]+aam[ 	]+\$0x90
> @@ -523,7 +523,7 @@ Disassembly of section .text:
>   *[0-9a-f]+:	66 ca 90 90[ 	]+lretw[ 	]+\$0x9090
>   *[0-9a-f]+:	66 cb[ 	]+lretw
>   *[0-9a-f]+:	66 cf[ 	]+iretw
> - *[0-9a-f]+:	66 d1 90 90 90 90 90[ 	]+rclw[ 	]+-0x6f6f6f70\(%eax\)
> + *[0-9a-f]+:	66 d1 90 90 90 90 90[ 	]+rclw[ 	]+\$1,-0x6f6f6f70\(%eax\)
>   *[0-9a-f]+:	66 d3 90 90 90 90 90[ 	]+rclw[ 	]+%cl,-0x6f6f6f70\(%eax\)
>   *[0-9a-f]+:	66 e5 90[ 	]+inw[ 	]+\$0x90,%ax
>   *[0-9a-f]+:	66 e7 90[ 	]+outw[ 	]+%ax,\$0x90
> diff --git a/gas/testsuite/gas/i386/opcode.d
> b/gas/testsuite/gas/i386/opcode.d
> index 7631195d8d4..f7af22518e2 100644
> --- a/gas/testsuite/gas/i386/opcode.d
> +++ b/gas/testsuite/gas/i386/opcode.d
> @@ -205,8 +205,8 @@ Disassembly of section .text:
>   279:	cd 90 [ 	]*int    \$0x90
>   27b:	ce [ 	]*into
>   27c:	cf [ 	]*iret
> - 27d:	d0 90 90 90 90 90 [ 	]*rclb   -0x6f6f6f70\(%eax\)
> - 283:	d1 90 90 90 90 90 [ 	]*rcll   -0x6f6f6f70\(%eax\)
> + 27d:	d0 90 90 90 90 90 [ 	]*rclb   \$1,-0x6f6f6f70\(%eax\)
> + 283:	d1 90 90 90 90 90 [ 	]*rcll   \$1,-0x6f6f6f70\(%eax\)
>   289:	d2 90 90 90 90 90 [ 	]*rclb   %cl,-0x6f6f6f70\(%eax\)
>   28f:	d3 90 90 90 90 90 [ 	]*rcll   %cl,-0x6f6f6f70\(%eax\)
>   295:	d4 90 [ 	]*aam    \$0x90
> @@ -522,7 +522,7 @@ Disassembly of section .text:
>   869:	66 ca 90 90 [ 	]*lretw  \$0x9090
>   86d:	66 cb [ 	]*lretw
>   86f:	66 cf [ 	]*iretw
> - 871:	66 d1 90 90 90 90 90 [ 	]*rclw   -0x6f6f6f70\(%eax\)
> + 871:	66 d1 90 90 90 90 90 [ 	]*rclw   \$1,-0x6f6f6f70\(%eax\)
>   878:	66 d3 90 90 90 90 90 [ 	]*rclw   %cl,-0x6f6f6f70\(%eax\)
>   87f:	66 e5 90 [ 	]*in     \$0x90,%ax
>   882:	66 e7 90 [ 	]*out    %ax,\$0x90
> @@ -610,8 +610,8 @@ Disassembly of section .text:
>   +[a-f0-9]+:	f7 c9 04 00 00 00    	test   \$(0x)?0*4,%ecx
>   +[a-f0-9]+:	c0 f0 02             	shl    \$0x2,%al
>   +[a-f0-9]+:	c1 f0 01             	shl    \$0x1,%eax
> - +[a-f0-9]+:	d0 f0                	shl    %al
> - +[a-f0-9]+:	d1 f0                	shl    %eax
> + +[a-f0-9]+:	d0 f0                	shl    \$1,%al
> + +[a-f0-9]+:	d1 f0                	shl    \$1,%eax
>   +[a-f0-9]+:	d2 f0                	shl    %cl,%al
>   +[a-f0-9]+:	d3 f0                	shl    %cl,%eax
>  #pass
> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d
> b/gas/testsuite/gas/i386/x86-64-lfence-load.d
> index b4a03db811d..726236826e8 100644
> --- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
> +++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
> @@ -90,7 +90,7 @@ Disassembly of section .text:
>   +[a-f0-9]+:	0f ae e8             	lfence
>   +[a-f0-9]+:	58                   	pop    %rax
>   +[a-f0-9]+:	0f ae e8             	lfence
> - +[a-f0-9]+:	66 d1 11             	rclw   \(%rcx\)
> + +[a-f0-9]+:	66 d1 11             	rclw   \$1,\(%rcx\)
>   +[a-f0-9]+:	0f ae e8             	lfence
>   +[a-f0-9]+:	f7 01 01 00 00 00    	testl  \$0x1,\(%rcx\)
>   +[a-f0-9]+:	0f ae e8             	lfence
> diff --git a/gas/testsuite/gas/i386/x86-64-opcode.d
> b/gas/testsuite/gas/i386/x86-64-opcode.d
> index ee6d0f5f4bd..1b8a9fa9014 100644
> --- a/gas/testsuite/gas/i386/x86-64-opcode.d
> +++ b/gas/testsuite/gas/i386/x86-64-opcode.d
> @@ -335,9 +335,9 @@ Disassembly of section .text:
>  [ 	]*[a-f0-9]+:	c0 f0 02             	shl    \$0x2,%al
>  [ 	]*[a-f0-9]+:	c1 f0 01             	shl    \$0x1,%eax
>  [ 	]*[a-f0-9]+:	48 c1 f0 01          	shl    \$0x1,%rax
> -[ 	]*[a-f0-9]+:	d0 f0                	shl    %al
> -[ 	]*[a-f0-9]+:	d1 f0                	shl    %eax
> -[ 	]*[a-f0-9]+:	48 d1 f0             	shl    %rax
> +[ 	]*[a-f0-9]+:	d0 f0                	shl    \$1,%al
> +[ 	]*[a-f0-9]+:	d1 f0                	shl    \$1,%eax
> +[ 	]*[a-f0-9]+:	48 d1 f0             	shl    \$1,%rax
>  [ 	]*[a-f0-9]+:	d2 f0                	shl    %cl,%al
>  [ 	]*[a-f0-9]+:	d3 f0                	shl    %cl,%eax
>  [ 	]*[a-f0-9]+:	48 d3 f0             	shl    %cl,%rax
> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
> index 2e2043d467b..e432b61a6cd 100644
> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -12090,6 +12090,8 @@ OP_I (instr_info *ins, int bytemode, int sizeflag)
>      case const_1_mode:
>        if (ins->intel_syntax)
>  	oappend (ins, "1");
> +      else
> +	oappend (ins, "$1");
>        return true;
>      default:
>        oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
> --
> 2.25.1
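
The effect of the two added lines in OP_I can be sketched outside of binutils. This is only an illustration under stated assumptions, not the disassembler's code: the `syntax` enum stands in for `ins->intel_syntax`, and `const_1_operand` and `format_shift_by_1` are hypothetical helper names. It shows the count operand being emitted as "$1" in AT&T syntax and as "1" in Intel syntax, matching the adjusted test expectations above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for ins->intel_syntax in opcodes/i386-dis.c.  */
enum syntax { SYNTAX_ATT, SYNTAX_INTEL };

/* Text of the implicit shift/rotate count of 1.  Intel syntax prints a
   bare "1"; AT&T syntax marks immediates with a leading '$'.  */
static const char *
const_1_operand (enum syntax syn)
{
  return syn == SYNTAX_INTEL ? "1" : "$1";
}

/* Format a shift-by-one instruction into BUF.  AT&T puts the count
   first ("shll $1,(%eax)"); Intel puts it last.  */
static int
format_shift_by_1 (char *buf, size_t len, enum syntax syn,
		   const char *mnemonic, const char *dest)
{
  assert (buf != NULL && len > 0);
  if (syn == SYNTAX_INTEL)
    return snprintf (buf, len, "%s %s,%s", mnemonic, dest,
		     const_1_operand (syn));
  return snprintf (buf, len, "%s %s,%s", mnemonic,
		   const_1_operand (syn), dest);
}
```

Before the patch the AT&T branch appended nothing, which is why every `d0`/`d1` line in the expected test output gains a `\$1,` operand.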


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 6/9] Support APX NDD
  2023-12-12  5:53     ` Cui, Lili
@ 2023-12-12  8:28       ` Jan Beulich
  0 siblings, 0 replies; 69+ messages in thread
From: Jan Beulich @ 2023-12-12  8:28 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, Kong, Lingling, binutils

On 12.12.2023 06:53, Cui, Lili wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> --- a/opcodes/i386-dis-evex-reg.h
>>> +++ b/opcodes/i386-dis-evex-reg.h
>>> @@ -56,3 +56,58 @@
>>>      { "blsmskS",	{ VexGdq, Edq }, 0 },
>>>      { "blsiS",	{ VexGdq, Edq }, 0 },
>>>    },
>>> +  /* REG_EVEX_MAP4_80 */
>>> +  {
>>> +    { "addA",	{ VexGb, Eb, Ib }, NO_PREFIX },
>>> +    { "orA",	{ VexGb, Eb, Ib }, NO_PREFIX },
>>> +    { "adcA",	{ VexGb, Eb, Ib }, NO_PREFIX },
>>> +    { "sbbA",	{ VexGb, Eb, Ib }, NO_PREFIX },
>>> +    { "andA",	{ VexGb, Eb, Ib }, NO_PREFIX },
>>> +    { "subA",	{ VexGb, Eb, Ib }, NO_PREFIX },
>>> +    { "xorA",	{ VexGb, Eb, Ib }, NO_PREFIX },
>>
>> Don't these need to use PREFIX_NP_OR_DATA? The doc clearly says
>> ".IGNORED" there. (Applies to other byte ops as well then, of course.)
>>
> 
> I'm confused here, "IGNORED" means the W bit in the EVEX payload is ignored. Why is 0x66 allowed?

Hmm, looks like I have been confused. Earlier communication had led me to
the impression that pp == 0b01 would be ignored here. In fact I had my
own disassembler library the other way originally, and I then changed it
to this model. Looks like I need to change it back. And I'm sorry for
causing confusion here.

Jan
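
The exchange above hinges on two different fields of the EVEX payload: W and pp. As a minimal sketch, assuming the standard EVEX layout in which payload bits P[15:8] carry W in bit 7 and pp in bits 1:0, the two fields can be pulled apart as below. The function and enum names are hypothetical and are not taken from i386-dis.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Legacy prefix implied by the 2-bit EVEX.pp field.  */
enum implied_prefix { PFX_NONE = 0, PFX_66 = 1, PFX_F3 = 2, PFX_F2 = 3 };

/* P1 is the payload byte holding P[15:8], i.e. the third byte of the
   prefix counting the leading 0x62.  pp lives in its low two bits.  */
static enum implied_prefix
evex_pp (uint8_t p1)
{
  return (enum implied_prefix) (p1 & 0x3);
}

/* W is the top bit of the same byte.  */
static bool
evex_w (uint8_t p1)
{
  return (p1 & 0x80) != 0;
}

/* Sanity check used by callers that want pp == 0b01 to mean "0x66".  */
static bool
evex_has_data16 (uint8_t p1)
{
  assert (evex_pp (p1) <= PFX_F2);
  return evex_pp (p1) == PFX_66;
}
```

With this split, "W ignored" and "pp == 0b01 ignored" are visibly separate statements about separate bits, which is the distinction the reply clears up.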

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 8/9] Support APX NDD optimized encoding.
  2023-12-12  3:18     ` Hu, Lin1
@ 2023-12-12  8:41       ` Jan Beulich
  2023-12-13  5:31         ` Hu, Lin1
  2023-12-12  8:45       ` Jan Beulich
  1 sibling, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-12  8:41 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, binutils, Cui, Lili

On 12.12.2023 04:18, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, December 11, 2023 8:28 PM
>>
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> @@ -7675,6 +7727,61 @@ match_template (char mnem_suffix)
>>>  	  i.memshift = memshift;
>>>  	}
>>>
>>> +      /* If we can optimize a NDD insn to legacy insn, like
>>> +	 add %r16, %r8, %r8 -> add %r16, %r8,
>>> +	 add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
>>> +	 Note that the semantics have not been changed.  */
>>> +      if (optimize
>>> +	  && !i.no_optimize
>>> +	  && i.vec_encoding != vex_encoding_evex
>>> +	  && t + 1 < current_templates->end
>>> +	  && !t[1].opcode_modifier.evex
>>> +	  && t[1].opcode_space <= SPACE_0F38
>>> +	  && t->opcode_modifier.vexvvvv == VexVVVV_DST)
>>> +	{
>>> +	  unsigned int match_dest_op = can_convert_NDD_to_legacy (t);
>>> +	  size_match = true;
>>
>> This would perhaps better ...
>>
>>> +	  if (match_dest_op != (unsigned int) ~0)
>>> +	    {
>>
>> ... live here
>>
> 
> OK.
>  
>>
>>> +	      /* We ensure that the next template has the same input
>>> +		 operands as the original matching template by the first
>>> +		 opernd (ATT), thus avoiding the error caused by the wrong order
>>> +		 of insns in i386.tbl.  */
>>
>> I'm sorry, but I (still) can't make sense of this last part of the comment, after the
>> comma.
>>
> 
> I mean that if someone adds support for new NDD insns and puts them in the wrong position, this part will avoid optimizing the insn.

If this is about hypothetical new templates, that would want saying so in
the comment. Thus clarifying that there's no functional effect right now.
I wonder what H.J.'s view on such effectively dead code is.

However, there's a bigger problem with this patch as I realized only a
few minutes ago when looking into Lili's reply on the NDD patch thread:
NDD insns are implicitly zero-upper. Hence converting NDD to legacy insns
needs to be limited to 32- and 64-bit operand size. For 8- and 16-bit
operand size the results would differ, which isn't acceptable under any
-O<n>. It may be okay to do such a conversion even for the smaller sizes,
but then under a separate option explicitly permitting such a functional
difference.

Jan
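
The zero-upper concern can be made concrete with a small register model. This is a sketch under assumptions: registers are modelled as 64-bit integers, flags are ignored, and all function names are hypothetical. It relies on the usual x86-64 rule that 8- and 16-bit legacy writes preserve the upper register bits while 32-bit writes zero-extend, and on the statement above that NDD forms always zero the upper bits.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low WIDTH bits; WIDTH must be 8, 16, 32 or 64.  */
static uint64_t
low_mask (unsigned int width)
{
  assert (width == 8 || width == 16 || width == 32 || width == 64);
  return width == 64 ? ~(uint64_t) 0 : (((uint64_t) 1 << width) - 1);
}

/* Legacy two-operand add: DST += SRC at WIDTH bits.  8- and 16-bit
   forms leave the upper bits of the full register untouched; 32-bit
   forms zero-extend into the upper half.  */
static uint64_t
legacy_add (uint64_t dst, uint64_t src, unsigned int width)
{
  uint64_t m = low_mask (width);
  uint64_t sum = (dst + src) & m;

  if (width >= 32)
    return sum;
  return (dst & ~m) | sum;
}

/* NDD three-operand add: the destination's upper bits are always
   zeroed, whatever the operand size.  */
static uint64_t
ndd_add (uint64_t src1, uint64_t src2, unsigned int width)
{
  return (src1 + src2) & low_mask (width);
}

/* Nonzero if rewriting "add src, dst, dst" as legacy "add src, dst"
   leaves the same value in the full 64-bit destination register.  */
static int
conversion_preserves_result (uint64_t dst, uint64_t src,
			     unsigned int width)
{
  return legacy_add (dst, src, width) == ndd_add (src, dst, width);
}
```

For a destination holding 0x1122334455667788, the 32- and 64-bit cases agree, while the 8- and 16-bit cases differ in the upper bits, which is the functional difference described above.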

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 8/9] Support APX NDD optimized encoding.
  2023-12-12  3:18     ` Hu, Lin1
  2023-12-12  8:41       ` Jan Beulich
@ 2023-12-12  8:45       ` Jan Beulich
  2023-12-13  6:06         ` Hu, Lin1
  1 sibling, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-12  8:45 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, binutils, Cui, Lili

On 12.12.2023 04:18, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, December 11, 2023 8:28 PM
>>
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> --- a/gas/config/tc-i386.c
>>> +++ b/gas/config/tc-i386.c
>>> @@ -7148,6 +7148,58 @@ check_APX_operands (const insn_template *t)
>>>    return 0;
>>>  }
>>>
>>> +/* Check if the instruction use the REX registers.  */
>>> +static bool
>>> +check_RexOperands ()
>>> +{
>>> +  for (unsigned int op = 0; op < i.operands; op++)
>>> +    {
>>> +      if (i.types[op].bitfield.class != Reg)
>>> +	continue;
>>> +
>>> +      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
>>> +	return true;
>>> +    }
>>> +
>>> +  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
>>> +      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
>>> +    return true;
>>> +
>>> +  /* Check pseudo prefix {rex} are valid.  */
>>> +  return i.rex_encoding;
>>
>> Can this actually happen, when we're converting from EVEX to legacy?
>> (Initially I wanted to ask about "rex" and alike prefixes, i.e. the
>> non-pseudo ones.)
>>
> 
> This is to align with check_EgprOperands. I hope the function can be more general, not just for this optimization problem.

But then the comment shouldn't say "REX registers", and "Operands" in
its name isn't quite right either.

Also you want to make the function be a proper modern declaration, by
adding "void" between the parentheses.

Jan
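
The point about "void" between the parentheses can be illustrated with a tiny stand-alone function. In C before C23, an empty parameter list leaves the parameters unspecified rather than declaring "no parameters", so `(void)` is what makes the declaration a real prototype. The names below are hypothetical; `rex_encoding_requested` merely stands in for `i.rex_encoding`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the assembler's i.rex_encoding flag.  */
static bool rex_encoding_requested;

/* Prototype-style definition: "(void)" states that the function takes
   no arguments, so a call with arguments is rejected at compile time.
   With "()" a pre-C23 compiler would not check the argument list.  */
static bool
uses_rex_encoding (void)
{
  return rex_encoding_requested;
}

/* Small self-check exercising both states of the flag.  */
static void
self_check (void)
{
  rex_encoding_requested = false;
  assert (!uses_rex_encoding ());
  rex_encoding_requested = true;
  assert (uses_rex_encoding ());
}
```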

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-11  8:34       ` Jan Beulich
@ 2023-12-12 10:44         ` Cui, Lili
  2023-12-12 11:16           ` Jan Beulich
  2023-12-12 12:58         ` Cui, Lili
  1 sibling, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-12 10:44 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> >> On 24.11.2023 08:02, Cui, Lili wrote:
> >>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
> >>>
> >>>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
> >>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>> -  {
> >>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >>> -	   || maybe_cpu (t, CpuFMA))
> >>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> >>> +    {
> >>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> >>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> >>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >>>  	{
> >>>  	  if (need_evex_encoding ())
> >>
> >> There are several issues here:
> >> - Why did you need to change (to the worse) the original code?
> >> - Why did you not model the addition after that original code?
> >> - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512
> >>   insn can be VEX-encoded?
> >
> >  I don't understand what you mean, we have this combination.
> >
> > kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
> > Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
> > RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> 
> Oh, I'm sorry: I forgot about the mask register insns.
> 
> >> - If these new macros are really needed for whatever reason, they shouldn't
> >>   be added to opcodes/i386-opc.h when they're useful only in the assembler.
> >> - Style requires a blank before the opening parenthesis in function
> >>   invocations (which also covers function-like macro invocations).
> >>
> >> I think I asked before: How is it that you get away without altering
> >> cpu_flags_match(), containing related and quite similar logic?
> >>
> >
> > For the original logic ( ... || ... ) && ( ... || ... ), the content in the first bracket
> > and the content in the following bracket can be combined arbitrarily. I think
> > it is inaccurate.
> 
> In which way? If there are issues with the existing code, these issues want
> taking care of in separate (prereq) patches. Of course there are assumptions
> made here about the CPU combinations that can (and cannot) occur in any of
> our templates. Similar assumptions are imo fine to make in the APX additions.
> 
> Note how I used two nested if()s despite that not having been necessary at
> that time. I did so in anticipation that for APX you'd want to add another
> (separate) inner if(), rather than altering the one that's there.
> 
Hi Jan, 

Could we remove the CPU check here? It's a bit ugly and has limited effectiveness.

  if (t->opcode_modifier.vex && t->opcode_modifier.evex)
    {
      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
          || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
          || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
          || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))


> > So I give examples one by one for each identified combination.
> 
> Which examples are you talking about? I see none given in your reply.
> 

Sorry, I meant to say "I've listed every possible combination".

> > Just found cpu_flags_match() has similar logic, I think the following is the
> only code related to CPUID alerts, but none of our combinations are related
> to cpuavx.
> >
> >           if (all.bitfield.cpuavx)
> >             {
> >               /* We need to check SSE2AVX with AVX.  */
> >               if (!t->opcode_modifier.sse2avx
> >                   || (sse2avx && !i.prefix[DATA_PREFIX]))
> >                 match |= CPU_FLAGS_ARCH_MATCH;
> >             }
> 
> Not sure why you pick out this one. This special case is needed for sse2avx; I
> don't see how it's related here. What I've been pointing you at is the code in
> that function which follows a similar "Dual VEX/EVEX templates ..."
> comment.
> 

I know you're talking about this code; I'm just guessing at what it does, and I don't know what I missed.

For example:

.arch .nobmi
andn    (%eax), %eax, %eax

---------------------------------------------------------------------------------------------
  if (flag_code != CODE_64BIT)
    active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
  else
    active = cpu_arch_flags;              ---> cpubmi = 0
  cpu = cpu_flags_and (all, active);      ---> cpuapx = 1; cpubmi = 0
  if (cpu_flags_equal (&cpu, &all))       ---> cpu and all are not the same
    {
    ...
    }
The function then returns CPU_FLAGS_64BIT_MATCH.
---------------------------------------------------------------------------------------------
Then we will report an arch error:

          if (supported != CPU_FLAGS_PERFECT_MATCH)
            {
              as_bad (_("`%s' is not supported on `%s%s'"),
                      insn_name (current_templates.start),
                      cpu_arch_name ? cpu_arch_name : default_arch,
                      cpu_sub_arch_name ? cpu_sub_arch_name : "");
              return NULL;
            }
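
The trace above can be reproduced with a small standalone model. This is only a sketch under assumptions: it represents CPU feature sets as bits in an unsigned int, and FLAG_BMI, FLAG_APX_F and all_features_enabled() are names made up for illustration. The real code works on i386_cpu_flags through cpu_flags_and () and cpu_flags_equal (). Following the annotations in the trace, the template is assumed to require both BMI and APX_F.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits; the real assembler uses i386_cpu_flags.  */
enum { FLAG_BMI = 1 << 0, FLAG_APX_F = 1 << 1 };

/* Return true when every feature in ALL is also enabled in ACTIVE,
   mirroring the cpu_flags_and () / cpu_flags_equal () pair above.  */
bool
all_features_enabled (unsigned int all, unsigned int active)
{
  unsigned int cpu = all & active;

  /* Masking can only drop bits, never add any.  */
  assert ((cpu & ~all) == 0);
  return cpu == all;
}
```

With ".arch .nobmi" the active set lacks the BMI bit, so the masked set differs from the required one and the template cannot be a perfect match.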

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-12 10:44         ` Cui, Lili
@ 2023-12-12 11:16           ` Jan Beulich
  2023-12-12 12:32             ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-12 11:16 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 12.12.2023 11:44, Cui, Lili wrote:
>>>> On 24.11.2023 08:02, Cui, Lili wrote:
>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>>>>>
>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
>> variants.  */
>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>> -  {
>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>>>> -	   || maybe_cpu (t, CpuFMA))
>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>>>>> +    {
>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>> APX_F(CpuCMPCCXADD)
>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>> APX_F(CpuAVX512DQ)
>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>>>  	{
>>>>>  	  if (need_evex_encoding ())
>>>>
>>>> There are several issues here:
>>>> - Why did you need to change (to the worse) the original code?
>>>> - Why did you not model the addition after that original code?
>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
>> AVX512
>>>> insn can be VEX-encoded?
>>>
>>>  I don't understand what you mean, we have this combination.
>>>
>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
>>
>> Oh, I'm sorry: I forgot about the mask register insns.
>>
>>>> - If these new macros are really needed for whatever reason, they
>> shouldn't
>>>>   be added to opcodes/i386-opc.h when they're useful only in the
>> assembler.
>>>> - Style requires a blank before the opening parenthesis in function
>>>>   invocations (which also covers function-like macro invocations).
>>>>
>>>> I think I asked before: How is it that you get away without altering
>>>> cpu_flags_match(), containing related and quite similar logic?
>>>>
>>>
>>> For the original logic ( ... || ... ) && ( ... || ...), the content in the first bracket
>> and the content in the following brackets can be combined arbitrarily. I think
>> it is Inaccurate.
>>
>> In which way? If there are issues with the existing code, these issues want
>> taking care of in separate (prereq) patches. Of course there are assumptions
>> made here about the CPU combinations that can (and cannot) occur in any of
>> our templates. Similar assumptions are imo fine to make in the APX additions.
>>
>> Note how I used two nested if()s despite that not having been necessary at
>> that time. I did so in anticipation that for APX you'd want to add another
>> (separate) inner if(), rather than altering the one that's there.
> 
> Could we remove the CPU check here? it's a bit ugly and has limited effectiveness.
> 
>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>     {
>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))

I agree on the "a bit ugly" part, but taking what's there right now I
don't understand "has limited effectiveness". Of course you can remove
any code you want, provided you can prove nothing breaks.

>>> So I give examples one by one for each identified combination.
>>
>> Which examples are you talking about? I see none given in your reply.
>>
> 
> Sorry, I want to say "I've listed every possible combination".
> 
>>> Just found cpu_flags_match() has similar logic, I think the following is the
>> only code related to CPUID alerts, but none of our combinations are related
>> to cpuavx.
>>>
>>>           if (all.bitfield.cpuavx)
>>>             {
>>>               /* We need to check SSE2AVX with AVX.  */
>>>               if (!t->opcode_modifier.sse2avx
>>>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>>>                 match |= CPU_FLAGS_ARCH_MATCH;
>>>             }
>>
>> Not sure why you pick out this one. This special case is needed for sse2avx; I
>> don't see how it's related here. What I've been pointing you at is the code in
>> that function which follows a similar "Dual VEX/EVEX templates ..."
>> comment.
>>
> 
> I know you're talking about this code, I'm just guessing what it does? Don't know what I missed.

You pulled out this sse2avx code. Hence I was expecting you to tell me
why you consider it relevant here.

> For example 
> 
> .arch .nobmi
> andn    (%eax), %eax, %eax
> 
> ---------------------------------------------------------------------------------------------
>   if (flag_code != CODE_64BIT)
>     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
>   else
>     active = cpu_arch_flags;                   ---> cpubmi = 0;
>   cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
>   if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
>     {
>     ...
>     }    
> Return  CPU_FLAGS_64BIT_MATCH
> ----------------------------------------------------------------------------------------------
> Then we will report an arch error.
> 
>           if (supported != CPU_FLAGS_PERFECT_MATCH)
>             {
>               as_bad (_("`%s' is not supported on `%s%s'"),
>                       insn_name (current_templates.start),
>                       cpu_arch_name ? cpu_arch_name : default_arch,
>                       cpu_sub_arch_name ? cpu_sub_arch_name : "");
>               return NULL;
>             }

Which is what we want, I think (for the particular example you picked)? Yet
again, I don't think I can see what you're trying to tell me. I also have
to confess I've lost track of whether we're discussing install_template(),
cpu_flags_match(), or both. For example in install_template() you may
indeed be able to get away with little or no changes, as long as there's
no used features tracking for APX (see the early ELF-specific part of
output_insn()). Things would be somewhat inconsistent then, but that may be
tolerable (as long as properly justified in the patch description). Not
getting this into proper shape right with the introduction of APX may bite
us later, though.

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-12 11:16           ` Jan Beulich
@ 2023-12-12 12:32             ` Cui, Lili
  2023-12-12 12:39               ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-12 12:32 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> >>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
> >>>>>
> >>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
> >> variants.  */
> >>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>>> -  {
> >>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >>>>> -	   || maybe_cpu (t, CpuFMA))
> >>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> >>>>> +    {
> >>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >>>> APX_F(CpuCMPCCXADD)
> >>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >>>> APX_F(CpuAVX512DQ)
> >>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
> APX_F(CpuBMI2))
> >>>>>  	{
> >>>>>  	  if (need_evex_encoding ())
> >>>>
> >>>> There are several issues here:
> >>>> - Why did you need to change (to the worse) the original code?
> >>>> - Why did you not model the addition after that original code?
> >>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
> >> AVX512
> >>>> insn can be VEX-encoded?
> >>>
> >>>  I don't understand what you mean, we have this combination.
> >>>
> >>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
> >>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
> >>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> >>
> >> Oh, I'm sorry: I forgot about the mask register insns.
> >>
> >>>> - If these new macros are really needed for whatever reason, they
> >> shouldn't
> >>>>   be added to opcodes/i386-opc.h when they're useful only in the
> >> assembler.
> >>>> - Style requires a blank before the opening parenthesis in function
> >>>>   invocations (which also covers function-like macro invocations).
> >>>>
> >>>> I think I asked before: How is it that you get away without
> >>>> altering cpu_flags_match(), containing related and quite similar logic?
> >>>>
> >>>
> >>> For the original logic ( ... || ... ) && ( ... || ...), the content
> >>> in the first bracket
> >> and the content in the following brackets can be combined
> >> arbitrarily. I think it is Inaccurate.
> >>
> >> In which way? If there are issues with the existing code, these
> >> issues want taking care of in separate (prereq) patches. Of course
> >> there are assumptions made here about the CPU combinations that can
> >> (and cannot) occur in any of our templates. Similar assumptions are imo
> fine to make in the APX additions.
> >>
> >> Note how I used two nested if()s despite that not having been
> >> necessary at that time. I did so in anticipation that for APX you'd
> >> want to add another
> >> (separate) inner if(), rather than altering the one that's there.
> >
> > Could we remove the CPU check here? it's a bit ugly and has limited
> effectiveness.
> >
> >   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >     {
> >       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> APX_F(CpuCMPCCXADD)
> >           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> APX_F(CpuAVX512DQ)
> >           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> 
> I agree on the "a bit ugly" part, but taking what's there right now I don't
> understand "has limited effectiveness". Of course you can remove any code
> you want, provided you can prove nothing breaks.
> 

This part is about install_template().
All I can say is that no test cases failed after I removed the CPU check. I know it's hard to convince you to delete it; if not, what do you suggest doing with it? APX requires a change here, otherwise the test cases will fail.

-      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
-         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
-         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
-         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
-       {
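
For reference, the effect being discussed (a dual VEX/EVEX template losing one of its two encodings, with no CPU check in front) can be sketched as follows. This is a reduced model, not the gas code: struct tmpl_sketch and strip_dual_encoding() are hypothetical names, and the real install_template() also adjusts CPU flags that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much reduced stand-in for insn_template.  */
struct tmpl_sketch
{
  bool vex;
  bool evex;
};

/* Return a copy of T with one encoding dropped if T allows both.  */
struct tmpl_sketch
strip_dual_encoding (struct tmpl_sketch t, bool need_evex)
{
  if (t.vex && t.evex)
    {
      if (need_evex)
        t.vex = false;
      else
        t.evex = false;
    }

  /* At most one encoding may remain.  */
  assert (!(t.vex && t.evex));
  return t;
}
```

Templates that carry only one of the two encodings pass through unchanged, which is why dropping the inner CPU test does not affect them in this model.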

> >>> So I give examples one by one for each identified combination.
> >>
> >> Which examples are you talking about? I see none given in your reply.
> >>
> >
> > Sorry, I want to say "I've listed every possible combination".
> >
> >>> Just found cpu_flags_match() has similar logic, I think the
> >>> following is the
> >> only code related to CPUID alerts, but none of our combinations are
> >> related to cpuavx.
> >>>
> >>>           if (all.bitfield.cpuavx)
> >>>             {
> >>>               /* We need to check SSE2AVX with AVX.  */
> >>>               if (!t->opcode_modifier.sse2avx
> >>>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
> >>>                 match |= CPU_FLAGS_ARCH_MATCH;
> >>>             }
> >>
> >> Not sure why you pick out this one. This special case is needed for
> >> sse2avx; I don't see how it's related here. What I've been pointing
> >> you at is the code in that function which follows a similar "Dual VEX/EVEX
> templates ..."
> >> comment.
> >>
> >
> > I know you're talking about this code, I'm just guessing what it does? Don't
> know what I missed.
> 
> You pulled out this sse2avx code. Hence I was expecting you to tell me why
> you consider it relevant here.
> 
This part is about cpu_flags_match().

I rechecked the code; maybe you mean that I missed the outer if().

      cpu = cpu_flags_and (any, active);
      if (cpu_flags_all_zero (&any) || !cpu_flags_all_zero (&cpu))
        {
          if (all.bitfield.cpuavx)
            {
              /* We need to check SSE2AVX with AVX.  */
              if (!t->opcode_modifier.sse2avx
                  || (sse2avx && !i.prefix[DATA_PREFIX]))
                match |= CPU_FLAGS_ARCH_MATCH;
            }
          else
            match |= CPU_FLAGS_ARCH_MATCH;
        }
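
The outer condition of that block can be modelled in isolation. The following is a sketch only: feature sets are plain bit masks, FEAT_AVX512BW, FEAT_APX_F and any_feature_enabled() are invented names, and the SSE2AVX special case from the inner if() is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits, for illustration only.  */
enum { FEAT_AVX512BW = 1 << 0, FEAT_APX_F = 1 << 1 };

/* Return true when the "any of" requirement is met: either the template
   names no alternatives at all, or at least one of them is enabled.  */
bool
any_feature_enabled (unsigned int any, unsigned int active)
{
  unsigned int cpu = any & active;

  /* The masked set never contains a feature that is not active.  */
  assert ((cpu & ~active) == 0);
  return any == 0 || cpu != 0;
}
```

In this model a template listing AVX512BW or APX_F as alternatives is accepted as soon as either one is enabled.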

> > For example
> >
> > .arch .nobmi
> > andn    (%eax), %eax, %eax
> >
> > ---------------------------------------------------------------------------------------------
> >   if (flag_code != CODE_64BIT)
> >     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
> >   else
> >     active = cpu_arch_flags;                   ---> cpubmi = 0;
> >   cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
> >   if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
> >     {
> >     ...
> >     }
> > Return  CPU_FLAGS_64BIT_MATCH
> > ----------------------------------------------------------------------
> > ------------------------
> > Then we will report an arch error.
> >
> >           if (supported != CPU_FLAGS_PERFECT_MATCH)
> >             {
> >               as_bad (_("`%s' is not supported on `%s%s'"),
> >                       insn_name (current_templates.start),
> >                       cpu_arch_name ? cpu_arch_name : default_arch,
> >                       cpu_sub_arch_name ? cpu_sub_arch_name : "");
> >               return NULL;
> >             }
> 
> Which is what we want, I think (for the particular example you picked)? Yet
> again, I don't think I can see what you're trying to tell me. I also have to
> confess I've lost track of whether we're discussing install_template(),
> cpu_flag_match(), or both. For example in install_template() you may indeed
> be able to get away with little or no changes, as long as there's no used
> features tracking for APX (see the early ELF-specific part of output_insn()).
> Things would be somewhat inconsistent then, but that may be tolerable (as
> long as properly justified in the patch description). Not getting this into
> proper shape right with the introduction of APX may bite us later, though.
> 

This part is about cpu_flags_match().
I just want to say that the APX part doesn't need to be handled in the "Dual VEX/EVEX templates ..." code.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-12 12:32             ` Cui, Lili
@ 2023-12-12 12:39               ` Jan Beulich
  2023-12-12 13:15                 ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-12 12:39 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 12.12.2023 13:32, Cui, Lili wrote:
>>>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>>>>>>>
>>>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
>>>> variants.  */
>>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>>> -  {
>>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>>>>>> -	   || maybe_cpu (t, CpuFMA))
>>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>>>>>>> +    {
>>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>>>> APX_F(CpuCMPCCXADD)
>>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>>>> APX_F(CpuAVX512DQ)
>>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
>> APX_F(CpuBMI2))
>>>>>>>  	{
>>>>>>>  	  if (need_evex_encoding ())
>>>>>>
>>>>>> There are several issues here:
>>>>>> - Why did you need to change (to the worse) the original code?
>>>>>> - Why did you not model the addition after that original code?
>>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
>>>> AVX512
>>>>>> insn can be VEX-encoded?
>>>>>
>>>>>  I don't understand what you mean, we have this combination.
>>>>>
>>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
>>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
>>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
>>>>
>>>> Oh, I'm sorry: I forgot about the mask register insns.
>>>>
>>>>>> - If these new macros are really needed for whatever reason, they
>>>> shouldn't
>>>>>>   be added to opcodes/i386-opc.h when they're useful only in the
>>>> assembler.
>>>>>> - Style requires a blank before the opening parenthesis in function
>>>>>>   invocations (which also covers function-like macro invocations).
>>>>>>
>>>>>> I think I asked before: How is it that you get away without
>>>>>> altering cpu_flags_match(), containing related and quite similar logic?
>>>>>>
>>>>>
>>>>> For the original logic ( ... || ... ) && ( ... || ...), the content
>>>>> in the first bracket
>>>> and the content in the following brackets can be combined
>>>> arbitrarily. I think it is Inaccurate.
>>>>
>>>> In which way? If there are issues with the existing code, these
>>>> issues want taking care of in separate (prereq) patches. Of course
>>>> there are assumptions made here about the CPU combinations that can
>>>> (and cannot) occur in any of our templates. Similar assumptions are imo
>> fine to make in the APX additions.
>>>>
>>>> Note how I used two nested if()s despite that not having been
>>>> necessary at that time. I did so in anticipation that for APX you'd
>>>> want to add another
>>>> (separate) inner if(), rather than altering the one that's there.
>>>
>>> Could we remove the CPU check here? it's a bit ugly and has limited
>> effectiveness.
>>>
>>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>     {
>>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>> APX_F(CpuCMPCCXADD)
>>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>> APX_F(CpuAVX512DQ)
>>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>
>> I agree on the "a bit ugly" part, but taking what's there right now I don't
>> understand "has limited effectiveness". Of course you can remove any code
>> you want, provided you can prove nothing breaks.
>>
> 
> Here is install_template().
> All I can say is that after removing the CPU check, no test cases failed. I know it's hard to convince you to delete this place, or what do you suggest to do with this? APX requires this, otherwise the test cases will fail.
> 
> -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> -       {

So be it then (assuming you don't delete any pre-existing code there). As
said, I expect this will bite us later.

>>>>> Just found cpu_flags_match() has similar logic, I think the
>>>>> following is the
>>>> only code related to CPUID alerts, but none of our combinations are
>>>> related to cpuavx.
>>>>>
>>>>>           if (all.bitfield.cpuavx)
>>>>>             {
>>>>>               /* We need to check SSE2AVX with AVX.  */
>>>>>               if (!t->opcode_modifier.sse2avx
>>>>>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>>>>>                 match |= CPU_FLAGS_ARCH_MATCH;
>>>>>             }
>>>>
>>>> Not sure why you pick out this one. This special case is needed for
>>>> sse2avx; I don't see how it's related here. What I've been pointing
>>>> you at is the code in that function which follows a similar "Dual VEX/EVEX
>> templates ..."
>>>> comment.
>>>>
>>>
>>> I know you're talking about this code, I'm just guessing what it does? Don't
>> know what I missed.
>>
>> You pulled out this sse2avx code. Hence I was expecting you to tell me why
>> you consider it relevant here.
>>
> Here is cpu_flag_match().
> 
> I rechecked the code, maybe you want to say I missed the outer loop.
> 
>       cpu = cpu_flags_and (any, active);
>       if (cpu_flags_all_zero (&any) || !cpu_flags_all_zero (&cpu))
>         {
>           if (all.bitfield.cpuavx)
>             {
>               /* We need to check SSE2AVX with AVX.  */
>               if (!t->opcode_modifier.sse2avx
>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>                 match |= CPU_FLAGS_ARCH_MATCH;
>             }
>           else
>             match |= CPU_FLAGS_ARCH_MATCH;
>         }

No, ...

>>> For example
>>>
>>> .arch .nobmi
>>> andn    (%eax), %eax, %eax
>>>
>>> ---------------------------------------------------------------------------------------------
>>>   if (flag_code != CODE_64BIT)
>>>     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
>>>   else
>>>     active = cpu_arch_flags;                   ---> cpubmi = 0;
>>>   cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
>>>   if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
>>>     {
>>>     ...
>>>     }
>>> Return  CPU_FLAGS_64BIT_MATCH
>>> ----------------------------------------------------------------------
>>> ------------------------
>>> Then we will report an arch error.
>>>
>>>           if (supported != CPU_FLAGS_PERFECT_MATCH)
>>>             {
>>>               as_bad (_("`%s' is not supported on `%s%s'"),
>>>                       insn_name (current_templates.start),
>>>                       cpu_arch_name ? cpu_arch_name : default_arch,
>>>                       cpu_sub_arch_name ? cpu_sub_arch_name : "");
>>>               return NULL;
>>>             }
>>
>> Which is what we want, I think (for the particular example you picked)? Yet
>> again, I don't think I can see what you're trying to tell me. I also have to
>> confess I've lost track of whether we're discussing install_template(),
>> cpu_flag_match(), or both. For example in install_template() you may indeed
>> be able to get away with little or no changes, as long as there's no used
>> features tracking for APX (see the early ELF-specific part of output_insn()).
>> Things would be somewhat inconsistent then, but that may be tolerable (as
>> long as properly justified in the patch description). Not getting this into
>> proper shape right with the introduction of APX may bite us later, though.
>>
> 
> Here is cpu_flag_match().
> I just want to say that for the APX part we don't need to handle it in the "Double VEX/EVEX Template...".

... I was referring to the dual VEX/EVEX logic. I have to admit I still don't
understand how you get away without touching that, but if everything works,
all is fine of course.

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-11  8:34       ` Jan Beulich
  2023-12-12 10:44         ` Cui, Lili
@ 2023-12-12 12:58         ` Cui, Lili
  2023-12-12 14:04           ` Jan Beulich
  1 sibling, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-12 12:58 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> 
> >>> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry
> >> *r)
> >>>        if (!cpu_arch_flags.bitfield.cpuapx_f
> >>>  	  || flag_code != CODE_64BIT)
> >>>  	return false;
> >>> +
> >>> +      /* When using RegRex2, dual VEX/EVEX templates need to be
> >>> + marked as
> >> EVEX.
> >>> +	 For the later install_template function.  */
> >>> +      if (current_templates->start->opcode_modifier.vex
> >>> +	  && current_templates->start->opcode_modifier.evex)
> >>> +	i.vec_encoding = vex_encoding_evex;
> >>
> >> I'm afraid I don't understand the 2nd sentence of the comment. This
> >> may be related to my question regarding cpu_flags_match() further up.
> >>
> >> The first sentence isn't quite correct either - you don't mark any
> >> template here (and you can't, because we don't even know yet which
> >> template we're going to use).
> >>
> >> Finally - do you really need the .evex check here? (I won't exclude
> >> that this yields a better diagnostic in certain cases, but this wants
> >> clarifying if so.)
> >>
> >
> > If you look at install_template(), you'll see that before this function we
> need to know if the current encoding is evex.
> 
> "This function" being check_register()? If so, then no, we can't know up front
> whether EVEX encoding is going to be needed, as operand parsing happens
> ahead of template selection. If instead you mean "that function" and hence
> install_template(), then yes, we need to know whether to use EVEX there.
> Yet how does that result in a need for the .evex check here? (Or maybe your
> reply was really to the first of the three parts of my earlier one?)
>

I agree with you; putting them here is unreasonable.

For example:

vtestps (%r27),%ymm6

Here we should report that the EGPR is unsupported. But without the .evex check, the assembler reports "Error: no EVEX encoding for `vtestps'" instead.

> But anyway - as said earlier on, using current_templates here looks wrong in
> the first place. check_register() deals with only a register, without regard to
> the context it is used in (with the sole exception of allow_pseudo_reg).
> May I remind you that earlier on I already indicated that I suspect you'll need
> a new enumerator to put in i.vec_encoding for this new purpose?
> 

If we don't put it in check_register(), we need to add a for loop at the beginning of install_template() to check for RegRex2. Do you think that is okay? Or should I create a separate function for it?

  for (unsigned int op = 0; op < i.operands; op++)
    {
      if (i.types[op].bitfield.class != Reg)
        continue;

      if (i.op[op].regs->reg_flags & RegRex2)
        i.vec_encoding = vex_encoding_evex;
    }

  if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
      || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
    i.vec_encoding = vex_encoding_evex; 
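
If it became a separate function, it might look roughly like the sketch below. Everything here is a stand-in chosen for illustration: struct reg_sketch replaces reg_entry, SK_REG_REX2 replaces RegRex2 (its value is arbitrary), and uses_egpr() is a hypothetical name. A NULL entry stands for an operand that is not a general register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register flag; the value is chosen for the sketch.  */
#define SK_REG_REX2 0x2

/* Reduced stand-in for reg_entry.  */
struct reg_sketch
{
  unsigned int reg_flags;
};

/* Return true if any register operand, or the base or index register of
   a memory operand, is an extended GPR.  NULL entries are skipped.  */
bool
uses_egpr (const struct reg_sketch *const *regs, size_t nregs,
           const struct reg_sketch *base, const struct reg_sketch *idx)
{
  assert (regs != NULL || nregs == 0);

  for (size_t op = 0; op < nregs; op++)
    if (regs[op] != NULL && (regs[op]->reg_flags & SK_REG_REX2))
      return true;

  return (base != NULL && (base->reg_flags & SK_REG_REX2))
         || (idx != NULL && (idx->reg_flags & SK_REG_REX2));
}
```

The caller would then set the EVEX encoding request once, based on the return value, instead of doing so from within check_register().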


> > We need to check opcode_modifier.evex here, it is a fix for issues caused by
> the merge of VEX and EVEX.
> >   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >     {
> >       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> APX_F(CpuCMPCCXADD)
> >           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> APX_F(CpuAVX512DQ)
> >           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >         {
> >           if (need_evex_encoding ())
> >             {
> >[...]
> >>> @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
> >>>
> >>>  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, {
> >>> Oword|Unspecified|BaseIndex, Reg32 }  invept, 0x660f3880, EPT&x64,
> >>> Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> >>> +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
> >>> +Oword|Unspecified|BaseIndex, Reg64 }
> >>>  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, {
> >>> Oword|Unspecified|BaseIndex, Reg32 }  invvpid, 0x660f3881, EPT&x64,
> >>> Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> >>> +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
> >>> +Oword|Unspecified|BaseIndex, Reg64 }
> >>
> >> Seeing these: Are there any Map4 encodings which aren't EVex128? If
> >> not (and if you're also not hiddenly aware of some appearing in the
> >> near future), please consider making EVexMap4 include this right
> >> away. Even if in the longer run other encodings appear, it'll then be
> >> easy to simply replace all the
> >> EVexMap4 uses in a purely mechanical way. Until then shorter template
> >> lines are preferable.
> >>
> >
> > Would you mind defining it this way? Since #define EVex128 is behind it.
> Considering that you don't like unnecessary changes.
> >
> > +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128
> 
> The order of #define-s doesn't matter. There's no reason not to use EVex128
> here even if it's #define-d only a few lines later.
> 

OK

#define EVex128 EVex=EVEX128
#define EVex256 EVex=EVEX256
#define EVex512 EVex=EVEX512
#define EVexLIG EVex=EVEXLIG
#define EVexDYN EVex=EVEXDYN

+#define Space0F    OpcodeSpace=SPACE_0F
+#define Space0F38  OpcodeSpace=SPACE_0F38
+#define Space0F3A  OpcodeSpace=SPACE_0F3A
+#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
+#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
+#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
+
+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex128
+#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
+#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6

> >>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf,
> >>> { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> >>> +bzhi, 0xf5, BMI2&(BMI2|APX_F),
> >>> +Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF,
> >>> +{ Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> >>
> >> Hmm, I had specifically suggested a pre-processor macro to use in
> >> place of the open-coded BMI2&(BMI2|APX_F). Is there a reason you
> >> didn't use that (here and below)?
> >
> > There are many different types of combinations, and each combination
> appears relatively few times, so I think adding a #define for each combination
> feels a bit wasteful.
> 
> I never suggested using multiple #define-s. I suggested a single APX_F()
> macro which would be used uniformly here and elsewhere (here:
> APX_F(BMI2)).
> And that macro would come with a comment explaining why the expression
> is the (seemingly strange) way it is. Right now there's no such explanation
> anywhere, and it would also be hard to find a good (central) place where to
> put it.
> 

Oh, I see what you mean now.
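
A minimal sketch of why the expression looks strange, assuming it is read as plain Boolean algebra (the table generator may well treat the two halves differently): X&(X|APX_F) reduces to X by absorption. The function name below is hypothetical and exists only for this illustration.

```c
#include <stdbool.h>

/* Hypothetical model: evaluates X&(X|APX_F) as ordinary Boolean logic.
   This is not how the opcode table is processed; it only shows that,
   read logically, the APX_F term never changes the result.  */
static bool
apx_f_expr (bool x, bool apx_f)
{
  return x && (x || apx_f);
}
```

For every input, apx_f_expr (x, apx_f) equals x, so the intent of the expression cannot be recovered from the logic alone; a comment next to a single APX_F() macro would be the place to record it.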

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-12 12:39               ` Jan Beulich
@ 2023-12-12 13:15                 ` Cui, Lili
  2023-12-12 14:13                   ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-12 13:15 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 12.12.2023 13:32, Cui, Lili wrote:
> >>>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template
> >>>>>>> *t)
> >>>>>>>
> >>>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
> >>>> variants.  */
> >>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>>>>> -  {
> >>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >>>>>>> -	   || maybe_cpu (t, CpuFMA))
> >>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t,
> CpuAVX512VL)))
> >>>>>>> +    {
> >>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) ||
> AVX512F(CpuFMA)
> >>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >>>>>> APX_F(CpuCMPCCXADD)
> >>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >>>>>> APX_F(CpuAVX512DQ)
> >>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
> >> APX_F(CpuBMI2))
> >>>>>>>  	{
> >>>>>>>  	  if (need_evex_encoding ())
> >>>>>>
> >>>>>> There are several issues here:
> >>>>>> - Why did you need to change (to the worse) the original code?
> >>>>>> - Why did you not model the addition after that original code?
> >>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
> >>>> AVX512
> >>>>>> insn can be VEX-encoded?
> >>>>>
> >>>>>  I don't understand what you mean, we have this combination.
> >>>>>
> >>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
> >>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
> >>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> >>>>
> >>>> Oh, I'm sorry: I forgot about the mask register insns.
> >>>>
> >>>>>> - If these new macros are really needed for whatever reason, they
> >>>> shouldn't
> >>>>>>   be added to opcodes/i386-opc.h when they're useful only in the
> >>>> assembler.
> >>>>>> - Style requires a blank before the opening parenthesis in function
> >>>>>>   invocations (which also covers function-like macro invocations).
> >>>>>>
> >>>>>> I think I asked before: How is it that you get away without
> >>>>>> altering cpu_flags_match(), containing related and quite similar
> logic?
> >>>>>>
> >>>>>
> >>>>> For the original logic ( ... || ... ) && ( ... || ...), the
> >>>>> content in the first bracket
> >>>> and the content in the following brackets can be combined
> >>>> arbitrarily. I think it is Inaccurate.
> >>>>
> >>>> In which way? If there are issues with the existing code, these
> >>>> issues want taking care of in separate (prereq) patches. Of course
> >>>> there are assumptions made here about the CPU combinations that can
> >>>> (and cannot) occur in any of our templates. Similar assumptions are
> >>>> imo
> >> fine to make in the APX additions.
> >>>>
> >>>> Note how I used two nested if()s despite that not having been
> >>>> necessary at that time. I did so in anticipation that for APX you'd
> >>>> want to add another
> >>>> (separate) inner if(), rather than altering the one that's there.
> >>>
> >>> Could we remove the CPU check here? it's a bit ugly and has limited
> >> effectiveness.
> >>>
> >>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>     {
> >>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >> APX_F(CpuCMPCCXADD)
> >>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >> APX_F(CpuAVX512DQ)
> >>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >>
> >> I agree on the "a bit ugly" part, but taking what's there right now I
> >> don't understand "has limited effectiveness". Of course you can
> >> remove any code you want, provided you can prove nothing breaks.
> >>
> >
> > Here is install_template().
> > All I can say is that after removing the CPU check, no test cases failed. I
> know it's hard to convince you to delete this place, or what do you suggest to
> do with this? APX requires this, otherwise the test cases will fail.
> >
> > -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> > -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> APX_F(CpuCMPCCXADD)
> > -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> APX_F(CpuAVX512DQ)
> > -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> > -       {
> 
> So be it then (assuming you don't delete any pre-existing code there). As said,
> I expect this will bite us later.
> 

Done.

+      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
+          || maybe_cpu (t, CpuFMA))
+         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))
+         || APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
+         || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
+         || APX_F(CpuBMI2))

> >>>>> Just found cpu_flags_match() has similar logic, I think the
> >>>>> following is the
> >>>> only code related to CPUID alerts, but none of our combinations are
> >>>> related to cpuavx.
> >>>>>
> >>>>>           if (all.bitfield.cpuavx)
> >>>>>             {
> >>>>>               /* We need to check SSE2AVX with AVX.  */
> >>>>>               if (!t->opcode_modifier.sse2avx
> >>>>>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
> >>>>>                 match |= CPU_FLAGS_ARCH_MATCH;
> >>>>>             }
> >>>>
> >>>> Not sure why you pick out this one. This special case is needed for
> >>>> sse2avx; I don't see how it's related here. What I've been pointing
> >>>> you at is the code in that function which follows a similar "Dual
> >>>> VEX/EVEX
> >> templates ..."
> >>>> comment.
> >>>>
> >>>
> >>> I know you're talking about this code, I'm just guessing what it
> >>> does? Don't
> >> know what I missed.
> >>
> >> You pulled out this sse2avx code. Hence I was expecting you to tell
> >> me why you consider it relevant here.
> >>
> > Here is cpu_flag_match().
> >
> > I rechecked the code, maybe you want to say I missed the outer loop.
> >
> >       cpu = cpu_flags_and (any, active);
> >       if (cpu_flags_all_zero (&any) || !cpu_flags_all_zero (&cpu))
> >         {
> >           if (all.bitfield.cpuavx)
> >             {
> >               /* We need to check SSE2AVX with AVX.  */
> >               if (!t->opcode_modifier.sse2avx
> >                   || (sse2avx && !i.prefix[DATA_PREFIX]))
> >                 match |= CPU_FLAGS_ARCH_MATCH;
> >             }
> >           else
> >             match |= CPU_FLAGS_ARCH_MATCH;
> >         }
> 
> No, ...
> 
> >>> For example
> >>>
> >>> .arch .nobmi
> >>> andn    (%eax), %eax, %eax
> >>>
> >>> ---------------------------------------------------------------------------------------------
> >>>   if (flag_code != CODE_64BIT)
> >>>     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
> >>>   else
> >>>     active = cpu_arch_flags;                   ---> cpubmi = 0;
> >>>   cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
> >>>   if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
> >>>     {
> >>>     ...
> >>>     }
> >>> Return  CPU_FLAGS_64BIT_MATCH
> >>> --------------------------------------------------------------------
> >>> --
> >>> ------------------------
> >>> Then we will report an arch error.
> >>>
> >>>           if (supported != CPU_FLAGS_PERFECT_MATCH)
> >>>             {
> >>>               as_bad (_("`%s' is not supported on `%s%s'"),
> >>>                       insn_name (current_templates.start),
> >>>                       cpu_arch_name ? cpu_arch_name : default_arch,
> >>>                       cpu_sub_arch_name ? cpu_sub_arch_name : "");
> >>>               return NULL;
> >>>             }
> >>
> >> Which is what we want, I think (for the particular example you
> >> picked)? Yet again, I don't think I can see what you're trying to
> >> tell me. I also have to confess I've lost track of whether we're
> >> discussing install_template(), cpu_flag_match(), or both. For example
> >> in install_template() you may indeed be able to get away with little
> >> or no changes, as long as there's no used features tracking for APX (see the
> early ELF-specific part of output_insn()).
> >> Things would be somewhat inconsistent then, but that may be tolerable
> >> (as long as properly justified in the patch description). Not getting
> >> this into proper shape right with the introduction of APX may bite us later,
> though.
> >>
> >
> > Here is cpu_flag_match().
> > I just want to say that for the APX part we don't need to handle it in the
> "Double VEX/EVEX Template...".
> 
> ... I was referring to the dual VEX/EVEX logic. I have to admit I still don't
> understand how you get away without touching that, but if everything
> works, all is fine of course.
> 

Maybe the reasoning is this: when FMA is combined with AVX512F and FMA is disabled, an instruction that is also available through AVX512F needs no CPU error. APX is different: for the combination of APX and BMI, both features are indispensable.
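
A toy model of the distinction I mean, under the assumption that each encoding can be described by a simple predicate. The struct and function names are hypothetical and do not correspond to gas's cpu flag handling.

```c
#include <stdbool.h>

/* Toy feature set; field names are illustrative, not gas's cpu flags.  */
struct features
{
  bool fma, avx512f, bmi, apx_f;
};

/* Dual VEX/EVEX template such as an FMA insn: each encoding has its own
   feature, so having either one enabled is enough to accept the insn.  */
static bool
fma_dual_usable (struct features f)
{
  return f.fma || f.avx512f;
}

/* APX-extended BMI insn, VEX form: needs the base feature only.  */
static bool
bmi_vex_usable (struct features f)
{
  return f.bmi;
}

/* APX-extended BMI insn, EVEX form: needs the base feature and APX_F.  */
static bool
bmi_evex_usable (struct features f)
{
  return f.bmi && f.apx_f;
}
```

With FMA disabled and AVX512F enabled, fma_dual_usable still returns true. With BMI disabled, neither BMI predicate holds even if APX_F is enabled, which matches the .arch .nobmi example above.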

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-12 12:58         ` Cui, Lili
@ 2023-12-12 14:04           ` Jan Beulich
  2023-12-13  8:35             ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-12 14:04 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 12.12.2023 13:58, Cui, Lili wrote:
>>
>>>>> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry *r)
>>>>>        if (!cpu_arch_flags.bitfield.cpuapx_f
>>>>>  	  || flag_code != CODE_64BIT)
>>>>>  	return false;
>>>>> +
>>>>> +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX.
>>>>> +	 For the later install_template function.  */
>>>>> +      if (current_templates->start->opcode_modifier.vex
>>>>> +	  && current_templates->start->opcode_modifier.evex)
>>>>> +	i.vec_encoding = vex_encoding_evex;
>>>>
>>>> I'm afraid I don't understand the 2nd sentence of the comment. This
>>>> may be related to my question regarding cpu_flags_match() further up.
>>>>
>>>> The first sentence isn't quite correct either - you don't mark any
>>>> template here (and you can't, because we don't even know yet which
>>>> template we're going to use).
>>>>
>>>> Finally - do you really need the .evex check here? (I won't exclude
>>>> that this yields a better diagnostic in certain cases, but this wants
>>>> clarifying if so.)
>>>>
>>>
>>> If you look at install_template(), you'll see that before this function we
>> need to know if the current encoding is evex.
>>
>> "This function" being check_register()? If so, then no, we can't know up front
>> whether EVEX encoding is going to be needed, as operand parsing happens
>> ahead of template selection. If instead you mean "that function" and hence
>> install_template(), then yes, we need to know whether to use EVEX there.
>> Yet how does that result in a need for the .evex check here? (Or maybe your
>> reply was really to the first of the three parts of my earlier one?)
>>
> 
> Agree with you, put them here is unreasonable. 
> 
> For example 
> 
> vtestps (%r27),%ymm6
> 
> we should report unsupported  Egpr. But without .evex check, it will report "Error: no EVEX encoding for `vtestps'"
> 
>> But anyway - as said earlier on, using current_templates here looks wrong in
>> the first place. check_register() deals with only a register, without regard to
>> the context it is used in (with the sole exception of allow_pseudo_reg).
>> May I remind you that earlier on I already indicated that I suspect you'll need
>> a new enumerator to put in i.vec_encoding for this new purpose?
>>
> 
> If we don't put it in check_register(), we need to add a for loop at the beginning of the install_template() to check RegRex2. Do you think it is okay? Or create a function for it.
> 
> for (unsigned int op = 0; op < i.operands; op++)
>     {
>       if (i.types[op].bitfield.class != Reg)
>         continue;
> 
>       if (i.op[op].regs->reg_flags & RegRex2)
>         i.vec_encoding = vex_encoding_evex;
>     }
> 
>   if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
>       || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
>     i.vec_encoding = vex_encoding_evex; 

As a last resort this may be an option. But unless my suggestion has at
least been tried and demonstrated to be worse, I don't think the above would
be acceptable.

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-12 13:15                 ` Cui, Lili
@ 2023-12-12 14:13                   ` Jan Beulich
  2023-12-13  7:36                     ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-12 14:13 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 12.12.2023 14:15, Cui, Lili wrote:
>> On 12.12.2023 13:32, Cui, Lili wrote:
>>>>>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template
>>>>>>>>> *t)
>>>>>>>>>
>>>>>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
>>>>>> variants.  */
>>>>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>>>>> -  {
>>>>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>>>>>>>> -	   || maybe_cpu (t, CpuFMA))
>>>>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t,
>> CpuAVX512VL)))
>>>>>>>>> +    {
>>>>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) ||
>> AVX512F(CpuFMA)
>>>>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>>>>>> APX_F(CpuCMPCCXADD)
>>>>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>>>>>> APX_F(CpuAVX512DQ)
>>>>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
>>>> APX_F(CpuBMI2))
>>>>>>>>>  	{
>>>>>>>>>  	  if (need_evex_encoding ())
>>>>>>>>
>>>>>>>> There are several issues here:
>>>>>>>> - Why did you need to change (to the worse) the original code?
>>>>>>>> - Why did you not model the addition after that original code?
>>>>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
>>>>>> AVX512
>>>>>>>> insn can be VEX-encoded?
>>>>>>>
>>>>>>>  I don't understand what you mean, we have this combination.
>>>>>>>
>>>>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
>>>>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
>>>>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
>>>>>>
>>>>>> Oh, I'm sorry: I forgot about the mask register insns.
>>>>>>
>>>>>>>> - If these new macros are really needed for whatever reason, they
>>>>>> shouldn't
>>>>>>>>   be added to opcodes/i386-opc.h when they're useful only in the
>>>>>> assembler.
>>>>>>>> - Style requires a blank before the opening parenthesis in function
>>>>>>>>   invocations (which also covers function-like macro invocations).
>>>>>>>>
>>>>>>>> I think I asked before: How is it that you get away without
>>>>>>>> altering cpu_flags_match(), containing related and quite similar
>> logic?
>>>>>>>>
>>>>>>>
>>>>>>> For the original logic ( ... || ... ) && ( ... || ...), the
>>>>>>> content in the first bracket
>>>>>> and the content in the following brackets can be combined
>>>>>> arbitrarily. I think it is Inaccurate.
>>>>>>
>>>>>> In which way? If there are issues with the existing code, these
>>>>>> issues want taking care of in separate (prereq) patches. Of course
>>>>>> there are assumptions made here about the CPU combinations that can
>>>>>> (and cannot) occur in any of our templates. Similar assumptions are
>>>>>> imo
>>>> fine to make in the APX additions.
>>>>>>
>>>>>> Note how I used two nested if()s despite that not having been
>>>>>> necessary at that time. I did so in anticipation that for APX you'd
>>>>>> want to add another
>>>>>> (separate) inner if(), rather than altering the one that's there.
>>>>>
>>>>> Could we remove the CPU check here? it's a bit ugly and has limited
>>>> effectiveness.
>>>>>
>>>>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>     {
>>>>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>> APX_F(CpuCMPCCXADD)
>>>>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>> APX_F(CpuAVX512DQ)
>>>>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>>
>>>> I agree on the "a bit ugly" part, but taking what's there right now I
>>>> don't understand "has limited effectiveness". Of course you can
>>>> remove any code you want, provided you can prove nothing breaks.
>>>>
>>>
>>> Here is install_template().
>>> All I can say is that after removing the CPU check, no test cases failed. I
>> know it's hard to convince you to delete this place, or what do you suggest to
>> do with this? APX requires this, otherwise the test cases will fail.
>>>
>>> -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>> -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>> APX_F(CpuCMPCCXADD)
>>> -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>> APX_F(CpuAVX512DQ)
>>> -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>> -       {
>>
>> So be it then (assuming you don't delete any pre-existing code there). As said,
>> I expect this will bite us later.
> 
> Done.

I can't connect this with ...

> +      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> +          || maybe_cpu (t, CpuFMA))
> +         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))
> +         || APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
> +         || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
> +         || APX_F(CpuBMI2))

... this: You said you want to remove all the new checks. And now you say
"done" with the checks all still there? And even if I misunderstood you,
I still don't see why you'd modify the existing condition: The adjustments
made in the body of the if() aren't applicable to APX afaict. Plus there
are still the odd APX_F() uses; I'm sure I commented on that before. If
any adjustments need making for APX, you want to add a 2nd inner if()
inside the enclosing one.

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 8/9] Support APX NDD optimized encoding.
  2023-12-12  8:41       ` Jan Beulich
@ 2023-12-13  5:31         ` Hu, Lin1
  0 siblings, 0 replies; 69+ messages in thread
From: Hu, Lin1 @ 2023-12-13  5:31 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils, Cui, Lili

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, December 12, 2023 4:42 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org; Cui, Lili
> <lili.cui@intel.com>
> Subject: Re: [PATCH v3 8/9] Support APX NDD optimized encoding.
> 
> On 12.12.2023 04:18, Hu, Lin1 wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Monday, December 11, 2023 8:28 PM
> >>
> >> On 24.11.2023 08:02, Cui, Lili wrote:
> >>> @@ -7675,6 +7727,61 @@ match_template (char mnem_suffix)
> >>>  	  i.memshift = memshift;
> >>>  	}
> >>>
> >>> +      /* If we can optimize a NDD insn to legacy insn, like
> >>> +	 add %r16, %r8, %r8 -> add %r16, %r8,
> >>> +	 add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> >>> +	 Note that the semantics have not been changed.  */
> >>> +      if (optimize
> >>> +	  && !i.no_optimize
> >>> +	  && i.vec_encoding != vex_encoding_evex
> >>> +	  && t + 1 < current_templates->end
> >>> +	  && !t[1].opcode_modifier.evex
> >>> +	  && t[1].opcode_space <= SPACE_0F38
> >>> +	  && t->opcode_modifier.vexvvvv == VexVVVV_DST)
> >>> +	{
> >>> +	  unsigned int match_dest_op = can_convert_NDD_to_legacy (t);
> >>> +	  size_match = true;
> >>
> >> This would perhaps better ...
> >>
> >>> +	  if (match_dest_op != (unsigned int) ~0)
> >>> +	    {
> >>
> >> ... live here
> >>
> >
> > OK.
> >
> >>
> >>> +	      /* We ensure that the next template has the same input
> >>> +		 operands as the original matching template by the first
> >>> +		 opernd (ATT), thus avoiding the error caused by the wrong
> >> order
> >>> +		 of insns in i386.tbl.  */
> >>
> >> I'm sorry, but I (still) can't make sense of this last part of the
> >> comment, after the comma.
> >>
> >
> > I mean if someone support new NDD insns and put it in the wrong position, so
> the part will try to avoid to optimize the insn.
> 
> If this is about hypothetical new templates, that would want saying so in the
> comment. Thus clarifying that there's no functional effect right now.
> I wonder what H.J.'s view on such effectively dead code is.
> 
> However, there's a bigger problem with this patch as I realized only a few
> minutes ago when looking into Lili's reply on the NDD patch thread:
> NDD insns are implicitly zero-upper. Hence converting NDD to legacy insns needs
> to be limited to 32- and 64-bit operand size. For 8- and 16-bit operand size the
> results would differ, which isn't acceptable under any -O<n>. It may be okay to
> do such a conversion even for the smaller sizes, but then under a separate
> option explicitly permitting such a functional difference.
> 

We can constrain the optimization by adding a condition such as i.types[i.operands - 1].bitfield.dword || i.types[i.operands - 1].bitfield.qword.
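
To make the zero-upper concern concrete, here is a small sketch that models a 64-bit destination register in plain C. The function names are hypothetical, and the NDD behaviour is modelled only as described above (upper bits of the destination are zeroed).

```c
#include <stdint.h>

/* Legacy 16-bit add: only the low 16 bits of the destination are
   written, the upper 48 bits keep their old value.  */
static uint64_t
legacy_add16 (uint64_t dst, uint16_t src)
{
  uint16_t sum = (uint16_t) ((uint16_t) dst + src);
  return (dst & ~UINT64_C (0xffff)) | sum;
}

/* NDD 16-bit add as modelled here: the result is zero-extended into
   the full destination register.  */
static uint64_t
ndd_add16 (uint64_t src1, uint16_t src2)
{
  return (uint16_t) ((uint16_t) src1 + src2);
}

/* Legacy 32-bit add in 64-bit mode: a 32-bit register write already
   zero-extends, so there is no difference to the NDD form.  */
static uint64_t
legacy_add32 (uint64_t dst, uint32_t src)
{
  return (uint32_t) ((uint32_t) dst + src);
}

static uint64_t
ndd_add32 (uint64_t src1, uint32_t src2)
{
  return (uint32_t) ((uint32_t) src1 + src2);
}
```

For a destination with non-zero upper bits, legacy_add16 and ndd_add16 return different values, while legacy_add32 and ndd_add32 agree. That is why the conversion would be limited to dword and qword operands.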

BRs,
Lin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 8/9] Support APX NDD optimized encoding.
  2023-12-12  8:45       ` Jan Beulich
@ 2023-12-13  6:06         ` Hu, Lin1
  2023-12-13  8:19           ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Hu, Lin1 @ 2023-12-13  6:06 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils, Cui, Lili

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, December 12, 2023 4:46 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org; Cui, Lili
> <lili.cui@intel.com>
> Subject: Re: [PATCH v3 8/9] Support APX NDD optimized encoding.
> 
> On 12.12.2023 04:18, Hu, Lin1 wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Monday, December 11, 2023 8:28 PM
> >>
> >> On 24.11.2023 08:02, Cui, Lili wrote:
> >>> --- a/gas/config/tc-i386.c
> >>> +++ b/gas/config/tc-i386.c
> >>> @@ -7148,6 +7148,58 @@ check_APX_operands (const insn_template *t)
> >>>    return 0;
> >>>  }
> >>>
> >>> +/* Check if the instruction use the REX registers.  */
> >>> +static bool
> >>> +check_RexOperands ()
> >>> +{
> >>> +  for (unsigned int op = 0; op < i.operands; op++)
> >>> +    {
> >>> +      if (i.types[op].bitfield.class != Reg)
> >>> +	continue;
> >>> +
> >>> +      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
> >>> +	return true;
> >>> +    }
> >>> +
> >>> +  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
> >>> +      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
> >>> +    return true;
> >>> +
> >>> +  /* Check pseudo prefix {rex} are valid.  */
> >>> +  return i.rex_encoding;
> >>
> >> Can this actually happen, when we're converting from EVEX to legacy?
> >> (Initially I wanted to ask about "rex" and alike prefixes, i.e. the
> >> non- pseudo
> >> ones.)
> >>
> >
> > This is to align with check_EgprOperands. I hope the function be more general.
> Not just for this optimization problem.
> 
> But then the comment shouldn't say "REX registers", and "Operands" in its name
> isn't quite right either.
> 
> Also you want to make the function be a proper modern declaration, by adding
> "void" between the parentheses.
> 

I have changed the comment to "Check if the instruction uses the REX registers or a REX prefix." and renamed the function to check_RexOperands_or_RexPrefix.
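
A self-contained sketch of the renamed helper with "void" between the parentheses. The struct, the flag values and the regs[] array are stand-ins invented for this example; the real function works on the assembler's global state instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag values are illustrative only; gas defines its own.  */
#define RegRex   0x1
#define RegRex64 0x2

struct toy_reg
{
  unsigned int reg_flags;
};

/* Stand-in for the assembler's global instruction state.  */
static struct
{
  unsigned int operands;
  const struct toy_reg *regs[5];	/* NULL for non-register operands.  */
  const struct toy_reg *base_reg;
  const struct toy_reg *index_reg;
  bool rex_encoding;			/* {rex} pseudo prefix seen.  */
} i;

/* Check if the instruction uses the REX registers or a REX prefix.  */
static bool
check_RexOperands_or_RexPrefix (void)
{
  for (unsigned int op = 0; op < i.operands; op++)
    {
      if (i.regs[op] == NULL)
	continue;

      if (i.regs[op]->reg_flags & (RegRex | RegRex64))
	return true;
    }

  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
    return true;

  return i.rex_encoding;
}
```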

BRs,
Lin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-12 14:13                   ` Jan Beulich
@ 2023-12-13  7:36                     ` Cui, Lili
  2023-12-13  7:48                       ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-13  7:36 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> >>>>>>>>> @@ -3670,10 +3673,11 @@ install_template (const
> insn_template
> >>>>>>>>> *t)
> >>>>>>>>>
> >>>>>>>>>    /* Dual VEX/EVEX templates need stripping one of the
> >>>>>>>>> possible
> >>>>>> variants.  */
> >>>>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>>>>>>> -  {
> >>>>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >>>>>>>>> -	   || maybe_cpu (t, CpuFMA))
> >>>>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t,
> >> CpuAVX512VL)))
> >>>>>>>>> +    {
> >>>>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) ||
> >> AVX512F(CpuFMA)
> >>>>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >>>>>>>> APX_F(CpuCMPCCXADD)
> >>>>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >>>>>>>> APX_F(CpuAVX512DQ)
> >>>>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
> >>>> APX_F(CpuBMI2))
> >>>>>>>>>  	{
> >>>>>>>>>  	  if (need_evex_encoding ())
> >>>>>>>>
> >>>>>>>> There are several issues here:
> >>>>>>>> - Why did you need to change (to the worse) the original code?
> >>>>>>>> - Why did you not model the addition after that original code?
> >>>>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
> >>>>>> AVX512
> >>>>>>>> insn can be VEX-encoded?
> >>>>>>>
> >>>>>>>  I don't understand what you mean, we have this combination.
> >>>>>>>
> >>>>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
> >>>>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
> >>>>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> >>>>>>
> >>>>>> Oh, I'm sorry: I forgot about the mask register insns.
> >>>>>>
> >>>>>>>> - If these new macros are really needed for whatever reason,
> >>>>>>>> they
> >>>>>> shouldn't
> >>>>>>>>   be added to opcodes/i386-opc.h when they're useful only in
> >>>>>>>> the
> >>>>>> assembler.
> >>>>>>>> - Style requires a blank before the opening parenthesis in function
> >>>>>>>>   invocations (which also covers function-like macro invocations).
> >>>>>>>>
> >>>>>>>> I think I asked before: How is it that you get away without
> >>>>>>>> altering cpu_flags_match(), containing related and quite
> >>>>>>>> similar
> >> logic?
> >>>>>>>>
> >>>>>>>
> >>>>>>> For the original logic ( ... || ... ) && ( ... || ...), the
> >>>>>>> content in the first bracket
> >>>>>> and the content in the following brackets can be combined
> >>>>>> arbitrarily. I think it is Inaccurate.
> >>>>>>
> >>>>>> In which way? If there are issues with the existing code, these
> >>>>>> issues want taking care of in separate (prereq) patches. Of
> >>>>>> course there are assumptions made here about the CPU combinations
> >>>>>> that can (and cannot) occur in any of our templates. Similar
> >>>>>> assumptions are imo
> >>>> fine to make in the APX additions.
> >>>>>>
> >>>>>> Note how I used two nested if()s despite that not having been
> >>>>>> necessary at that time. I did so in anticipation that for APX
> >>>>>> you'd want to add another
> >>>>>> (separate) inner if(), rather than altering the one that's there.
> >>>>>
> >>>>> Could we remove the CPU check here? it's a bit ugly and has
> >>>>> limited
> >>>> effectiveness.
> >>>>>
> >>>>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>>>     {
> >>>>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>>>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >>>> APX_F(CpuCMPCCXADD)
> >>>>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >>>> APX_F(CpuAVX512DQ)
> >>>>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
> >>>>> APX_F(CpuBMI2))
> >>>>
> >>>> I agree on the "a bit ugly" part, but taking what's there right now
> >>>> I don't understand "has limited effectiveness". Of course you can
> >>>> remove any code you want, provided you can prove nothing breaks.
> >>>>
> >>>
> >>> Here is install_template().
> >>> All I can say is that after removing the CPU check, no test cases
> >>> failed. I
> >> know it's hard to convince you to delete this place, or what do you
> >> suggest to do with this? APX requires this, otherwise the test cases will fail.
> >>>
> >>> -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>> -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >> APX_F(CpuCMPCCXADD)
> >>> -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >> APX_F(CpuAVX512DQ)
> >>> -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >>> -       {
> >>
> >> So be it then (assuming you don't delete any pre-existing code
> >> there). As said, I expect this will bite us later.
> >
> > Done.
> 
> I can't connect this with ...
> 
> > +      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> > +          || maybe_cpu (t, CpuFMA))
> > +         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))
> > +         || APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) ||
> APX_F(CpuAVX512F)
> > +         || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) ||
> APX_F(CpuBMI)
> > +         || APX_F(CpuBMI2))
> 
> ... this: You said you want to remove all the new checks. And now you say
> "done" with the checks all still there? And even if I misunderstood you, I still
> don't see why you'd modify the existing condition: The adjustments made in
> the body of the if() aren't applicable to APX afaict. Plus there are still the odd
> APX_F() uses; I'm sure I commented on that before. If any adjustments need
> making for APX, you want to add a 2nd inner if() inside the enclosing one.
> 

I wanted to remove all of it, including your pre-existing code, but then test cases fail because the unused variant is not cleared: an EVEX test case fails without i.tm.opcode_modifier.vex = 0, and a VEX test case fails without i.tm.opcode_modifier.evex = 0. Since you asked me not to delete any pre-existing code, I still need to add handling for my new combinations.

How about this?

  /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
  if (t->opcode_modifier.vex && t->opcode_modifier.evex)
    {
      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
           || maybe_cpu (t, CpuFMA))
          && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
        {
          if (need_evex_encoding ())
            {
              i.tm.opcode_modifier.vex = 0;
              i.tm.cpu.bitfield.cpuavx512f = i.tm.cpu_any.bitfield.cpuavx512f;
              i.tm.cpu.bitfield.cpuavx512vl = i.tm.cpu_any.bitfield.cpuavx512vl;
            }
          else
            {
              i.tm.opcode_modifier.evex = 0;
              if (i.tm.cpu_any.bitfield.cpuavx)
                i.tm.cpu.bitfield.cpuavx = 1;
              else if (!i.tm.cpu.bitfield.isa)
                i.tm.cpu.bitfield.isa = i.tm.cpu_any.bitfield.isa;
              else
                gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
            }
        }

      if (APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
          || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
          || APX_F(CpuBMI2))
        if (need_evex_encoding ())
          i.tm.opcode_modifier.vex = 0;
        else
          i.tm.opcode_modifier.evex = 0;
    }

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-13  7:36                     ` Cui, Lili
@ 2023-12-13  7:48                       ` Jan Beulich
  0 siblings, 0 replies; 69+ messages in thread
From: Jan Beulich @ 2023-12-13  7:48 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 13.12.2023 08:36, Cui, Lili wrote:
>>>>>>>>>>> @@ -3670,10 +3673,11 @@ install_template (const
>> insn_template
>>>>>>>>>>> *t)
>>>>>>>>>>>
>>>>>>>>>>>    /* Dual VEX/EVEX templates need stripping one of the
>>>>>>>>>>> possible
>>>>>>>> variants.  */
>>>>>>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>>>>>>> -  {
>>>>>>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>>>>>>>>>> -	   || maybe_cpu (t, CpuFMA))
>>>>>>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t,
>>>> CpuAVX512VL)))
>>>>>>>>>>> +    {
>>>>>>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) ||
>>>> AVX512F(CpuFMA)
>>>>>>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>>>>>>>> APX_F(CpuCMPCCXADD)
>>>>>>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>>>>>>>> APX_F(CpuAVX512DQ)
>>>>>>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
>>>>>> APX_F(CpuBMI2))
>>>>>>>>>>>  	{
>>>>>>>>>>>  	  if (need_evex_encoding ())
>>>>>>>>>>
>>>>>>>>>> There are several issues here:
>>>>>>>>>> - Why did you need to change (to the worse) the original code?
>>>>>>>>>> - Why did you not model the addition after that original code?
>>>>>>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
>>>>>>>> AVX512
>>>>>>>>>> insn can be VEX-encoded?
>>>>>>>>>
>>>>>>>>>  I don't understand what you mean, we have this combination.
>>>>>>>>>
>>>>>>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
>>>>>>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
>>>>>>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
>>>>>>>>
>>>>>>>> Oh, I'm sorry: I forgot about the mask register insns.
>>>>>>>>
>>>>>>>>>> - If these new macros are really needed for whatever reason,
>>>>>>>>>> they
>>>>>>>> shouldn't
>>>>>>>>>>   be added to opcodes/i386-opc.h when they're useful only in
>>>>>>>>>> the
>>>>>>>> assembler.
>>>>>>>>>> - Style requires a blank before the opening parenthesis in function
>>>>>>>>>>   invocations (which also covers function-like macro invocations).
>>>>>>>>>>
>>>>>>>>>> I think I asked before: How is it that you get away without
>>>>>>>>>> altering cpu_flags_match(), containing related and quite
>>>>>>>>>> similar
>>>> logic?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For the original logic ( ... || ... ) && ( ... || ...), the
>>>>>>>>> content in the first bracket
>>>>>>>> and the content in the following brackets can be combined
>>>>>>>> arbitrarily. I think it is Inaccurate.
>>>>>>>>
>>>>>>>> In which way? If there are issues with the existing code, these
>>>>>>>> issues want taking care of in separate (prereq) patches. Of
>>>>>>>> course there are assumptions made here about the CPU combinations
>>>>>>>> that can (and cannot) occur in any of our templates. Similar
>>>>>>>> assumptions are imo
>>>>>> fine to make in the APX additions.
>>>>>>>>
>>>>>>>> Note how I used two nested if()s despite that not having been
>>>>>>>> necessary at that time. I did so in anticipation that for APX
>>>>>>>> you'd want to add another
>>>>>>>> (separate) inner if(), rather than altering the one that's there.
>>>>>>>
>>>>>>> Could we remove the CPU check here? it's a bit ugly and has
>>>>>>> limited
>>>>>> effectiveness.
>>>>>>>
>>>>>>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>>>     {
>>>>>>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>>>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>>>> APX_F(CpuCMPCCXADD)
>>>>>>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>>>> APX_F(CpuAVX512DQ)
>>>>>>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
>>>>>>> APX_F(CpuBMI2))
>>>>>>
>>>>>> I agree on the "a bit ugly" part, but taking what's there right now
>>>>>> I don't understand "has limited effectiveness". Of course you can
>>>>>> remove any code you want, provided you can prove nothing breaks.
>>>>>>
>>>>>
>>>>> Here is install_template().
>>>>> All I can say is that after removing the CPU check, no test cases
>>>>> failed. I
>>>> know it's hard to convince you to delete this place, or what do you
>>>> suggest to do with this? APX requires this, otherwise the test cases will fail.
>>>>>
>>>>> -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>> -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>> APX_F(CpuCMPCCXADD)
>>>>> -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>> APX_F(CpuAVX512DQ)
>>>>> -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>>> -       {
>>>>
>>>> So be it then (assuming you don't delete any pre-existing code
>>>> there). As said, I expect this will bite us later.
>>>
>>> Done.
>>
>> I can't connect this with ...
>>
>>> +      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>> +          || maybe_cpu (t, CpuFMA))
>>> +         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))
>>> +         || APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
>>> +         || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
>>> +         || APX_F(CpuBMI2))
>>
>> ... this: You said you want to remove all the new checks. And now you say
>> "done" with the checks all still there? And even if I misunderstood you, I still
>> don't see why you'd modify the existing condition: The adjustments made in
>> the body of the if() aren't applicable to APX afaict. Plus there are still the odd
>> APX_F() uses; I'm sure I commented on that before. If any adjustments need
>> making for APX, you want to add a 2nd inner if() inside the enclosing one.
>>
> 
> I wanted to remove all of the checks, including your pre-existing code, but then a VEX test case fails because i.tm.opcode_modifier.evex is never cleared (i.tm.opcode_modifier.evex = 0). Since you asked me not to remove any pre-existing code, I still need to add my new combinations.
> 
> How about this?
> 
>   /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>     {
>       if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>            || maybe_cpu (t, CpuFMA))
>           && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>         {
>           if (need_evex_encoding ())
>             {
>               i.tm.opcode_modifier.vex = 0;
>               i.tm.cpu.bitfield.cpuavx512f = i.tm.cpu_any.bitfield.cpuavx512f;
>               i.tm.cpu.bitfield.cpuavx512vl = i.tm.cpu_any.bitfield.cpuavx512vl;
>             }
>           else
>             {
>               i.tm.opcode_modifier.evex = 0;
>               if (i.tm.cpu_any.bitfield.cpuavx)
>                 i.tm.cpu.bitfield.cpuavx = 1;
>               else if (!i.tm.cpu.bitfield.isa)
>                 i.tm.cpu.bitfield.isa = i.tm.cpu_any.bitfield.isa;
>               else
>                 gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
>             }
>         }
> 
>       if (APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
>           || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
>           || APX_F(CpuBMI2))
>         if (need_evex_encoding ())
>           i.tm.opcode_modifier.vex = 0;
>         else
>           i.tm.opcode_modifier.evex = 0;
>     }

Something along these lines, indeed. But without APX_F(). I've just looked
it up again:

#define APX_F(cpuid) (maybe_cpu (t, CpuAPX_F) && maybe_cpu (t, cpuid))

Why would you test CpuAPX_F over and over again in the conditional? See
how the code that has been there for a little while checks each CpuXYZ
exactly once.

Plus, simply as a style remark, you want to add braces around the if/else,
to make entirely clear that the else belongs to the inner if() (iirc some
compiler versions warn about code as you have it above).
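
To illustrate both remarks, here is a minimal standalone sketch. It is
not the gas code: struct tmpl, the CPU_* bit values and the need_evex
parameter are simplified stand-ins invented for the example (the real
code works on insn_template, the CpuXYZ enumerators and
need_evex_encoding ()). The condition is the factored form of the
APX_F() chain, (A && B1) || (A && B2) || ... == A && (B1 || B2 || ...),
so CpuAPX_F is tested once, and the inner if/else is braced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for gas's CPU feature bits; not the real layout.  */
enum cpu_bit
{
  CPU_APX_F     = 1 << 0,
  CPU_CMPCCXADD = 1 << 1,
  CPU_AMX_TILE  = 1 << 2,
  CPU_AVX512F   = 1 << 3,
  CPU_AVX512DQ  = 1 << 4,
  CPU_AVX512BW  = 1 << 5,
  CPU_BMI       = 1 << 6,
  CPU_BMI2      = 1 << 7
};

/* Simplified template: the CPU bits it mentions and the encodings it
   still permits.  */
struct tmpl
{
  unsigned int cpu;
  bool vex;
  bool evex;
};

static bool
maybe_cpu (const struct tmpl *t, unsigned int bit)
{
  return (t->cpu & bit) != 0;
}

/* Strip one variant of a dual VEX/EVEX template.  NEED_EVEX stands in
   for the result of need_evex_encoding ().  */
static void
strip_apx_variant (struct tmpl *t, bool need_evex)
{
  assert (t != 0);

  if (!t->vex || !t->evex)
    return;

  /* CpuAPX_F is checked exactly once, then each partner feature once.  */
  if (maybe_cpu (t, CPU_APX_F)
      && (maybe_cpu (t, CPU_CMPCCXADD) || maybe_cpu (t, CPU_AMX_TILE)
          || maybe_cpu (t, CPU_AVX512F) || maybe_cpu (t, CPU_AVX512DQ)
          || maybe_cpu (t, CPU_AVX512BW) || maybe_cpu (t, CPU_BMI)
          || maybe_cpu (t, CPU_BMI2)))
    {
      /* The braces make explicit that the else pairs with this if.  */
      if (need_evex)
        t->vex = false;
      else
        t->evex = false;
    }
}
```

In the real function this would be the second inner if() inside the
existing "dual VEX/EVEX templates" block, leaving the first one
untouched.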

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 8/9] Support APX NDD optimized encoding.
  2023-12-13  6:06         ` Hu, Lin1
@ 2023-12-13  8:19           ` Jan Beulich
  2023-12-13  8:34             ` Hu, Lin1
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2023-12-13  8:19 UTC (permalink / raw)
  To: Hu, Lin1; +Cc: Lu, Hongjiu, binutils, Cui, Lili

On 13.12.2023 07:06, Hu, Lin1 wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Tuesday, December 12, 2023 4:46 PM
>> To: Hu, Lin1 <lin1.hu@intel.com>
>> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org; Cui, Lili
>> <lili.cui@intel.com>
>> Subject: Re: [PATCH v3 8/9] Support APX NDD optimized encoding.
>>
>> On 12.12.2023 04:18, Hu, Lin1 wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Monday, December 11, 2023 8:28 PM
>>>>
>>>> On 24.11.2023 08:02, Cui, Lili wrote:
>>>>> --- a/gas/config/tc-i386.c
>>>>> +++ b/gas/config/tc-i386.c
>>>>> @@ -7148,6 +7148,58 @@ check_APX_operands (const insn_template *t)
>>>>>    return 0;
>>>>>  }
>>>>>
>>>>> +/* Check if the instruction use the REX registers.  */
>>>>> +static bool
>>>>> +check_RexOperands ()
>>>>> +{
>>>>> +  for (unsigned int op = 0; op < i.operands; op++)
>>>>> +    {
>>>>> +      if (i.types[op].bitfield.class != Reg)
>>>>> +	continue;
>>>>> +
>>>>> +      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
>>>>> +	return true;
>>>>> +    }
>>>>> +
>>>>> +  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
>>>>> +      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
>>>>> +    return true;
>>>>> +
>>>>> +  /* Check pseudo prefix {rex} are valid.  */
>>>>> +  return i.rex_encoding;
>>>>
>>>> Can this actually happen, when we're converting from EVEX to legacy?
>>>> (Initially I wanted to ask about "rex" and alike prefixes, i.e. the
>>>> non- pseudo
>>>> ones.)
>>>>
>>>
>>> This is to align with check_EgprOperands. I hope the function be more general.
>> Not just for this optimization problem.
>>
>> But then the comment shouldn't say "REX registers", and "Operands" in its name
>> isn't quite right either.
>>
>> Also you want to make the function be a proper modern declaration, by adding
>> "void" between the parentheses.
>>
> 
> I have modified the comment like " Check if the instruction use the REX registers or REX prefix." And function name is check_RexOperands_or_RexPrefix.

How about check_Rex_required() or is_rex_required() or some such? No need
to have two "Rex" in a single name.
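
For reference, a rough standalone sketch of what the renamed helper
could look like with a proper "(void)" prototype. Everything around the
function is an assumption made so the example compiles on its own:
struct reg, struct insn, MAX_OPS, the REG_* values and uses_rex () are
simplified stand-ins, not the definitions from opcodes/i386-opc.h or
tc-i386.c, and non-register operands are modelled as null pointers
instead of the class check in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values and limits, for illustration only.  */
#define REG_REX    0x1
#define REG_REX64  0x2
#define MAX_OPS    5

struct reg
{
  unsigned int reg_flags;
};

struct insn
{
  unsigned int operands;
  const struct reg *regs[MAX_OPS];  /* Null for non-register operands.  */
  const struct reg *base_reg;
  const struct reg *index_reg;
  bool rex_encoding;                /* Set by the {rex} pseudo prefix.  */
};

/* Current instruction, mirroring the global "i" used in the patch.  */
static struct insn i;

static bool
uses_rex (const struct reg *r)
{
  return r && (r->reg_flags & (REG_REX | REG_REX64)) != 0;
}

/* Return true if the current insn needs a REX prefix, either because of
   a register it uses or because {rex} was requested.  */
static bool
is_rex_required (void)
{
  assert (i.operands <= MAX_OPS);

  for (unsigned int op = 0; op < i.operands; op++)
    if (uses_rex (i.regs[op]))
      return true;

  if (uses_rex (i.base_reg) || uses_rex (i.index_reg))
    return true;

  return i.rex_encoding;
}
```

The name then describes the question being answered rather than which
parts of the insn get inspected.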

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 8/9] Support APX NDD optimized encoding.
  2023-12-13  8:19           ` Jan Beulich
@ 2023-12-13  8:34             ` Hu, Lin1
  0 siblings, 0 replies; 69+ messages in thread
From: Hu, Lin1 @ 2023-12-13  8:34 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils, Cui, Lili

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, December 13, 2023 4:19 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org; Cui, Lili
> <lili.cui@intel.com>
> Subject: Re: [PATCH v3 8/9] Support APX NDD optimized encoding.
> 
> On 13.12.2023 07:06, Hu, Lin1 wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Tuesday, December 12, 2023 4:46 PM
> >> To: Hu, Lin1 <lin1.hu@intel.com>
> >> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org; Cui,
> >> Lili <lili.cui@intel.com>
> >> Subject: Re: [PATCH v3 8/9] Support APX NDD optimized encoding.
> >>
> >> On 12.12.2023 04:18, Hu, Lin1 wrote:
> >>>> -----Original Message-----
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: Monday, December 11, 2023 8:28 PM
> >>>>
> >>>> On 24.11.2023 08:02, Cui, Lili wrote:
> >>>>> --- a/gas/config/tc-i386.c
> >>>>> +++ b/gas/config/tc-i386.c
> >>>>> @@ -7148,6 +7148,58 @@ check_APX_operands (const insn_template *t)
> >>>>>    return 0;
> >>>>>  }
> >>>>>
> >>>>> +/* Check if the instruction use the REX registers.  */
> >>>>> +static bool
> >>>>> +check_RexOperands ()
> >>>>> +{
> >>>>> +  for (unsigned int op = 0; op < i.operands; op++)
> >>>>> +    {
> >>>>> +      if (i.types[op].bitfield.class != Reg)
> >>>>> +	continue;
> >>>>> +
> >>>>> +      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
> >>>>> +	return true;
> >>>>> +    }
> >>>>> +
> >>>>> +  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
> >>>>> +      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
> >>>>> +    return true;
> >>>>> +
> >>>>> +  /* Check pseudo prefix {rex} are valid.  */
> >>>>> +  return i.rex_encoding;
> >>>>
> >>>> Can this actually happen, when we're converting from EVEX to legacy?
> >>>> (Initially I wanted to ask about "rex" and alike prefixes, i.e. the
> >>>> non- pseudo
> >>>> ones.)
> >>>>
> >>>
> >>> This is to align with check_EgprOperands. I hope the function be more
> general.
> >> Not just for this optimization problem.
> >>
> >> But then the comment shouldn't say "REX registers", and "Operands" in
> >> its name isn't quite right either.
> >>
> >> Also you want to make the function be a proper modern declaration, by
> >> adding "void" between the parentheses.
> >>
> >
> > I have modified the comment like " Check if the instruction use the REX
> registers or REX prefix." And function name is check_RexOperands_or_RexPrefix.
> 
> How about check_Rex_required() or is_rex_required() or some such? No need to
> have two "Rex" in a single name.
> 

OK, thanks for the advice. I have renamed it to check_Rex_required().

BRs,
Lin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-12 14:04           ` Jan Beulich
@ 2023-12-13  8:35             ` Cui, Lili
  2023-12-13  9:13               ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-13  8:35 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> >>>>> @@ -14233,6 +14276,12 @@ static bool check_register (const
> >>>>> reg_entry
> >>>> *r)
> >>>>>        if (!cpu_arch_flags.bitfield.cpuapx_f
> >>>>>  	  || flag_code != CODE_64BIT)
> >>>>>  	return false;
> >>>>> +
> >>>>> +      /* When using RegRex2, dual VEX/EVEX templates need to be
> >>>>> + marked as
> >>>> EVEX.
> >>>>> +	 For the later install_template function.  */
> >>>>> +      if (current_templates->start->opcode_modifier.vex
> >>>>> +	  && current_templates->start->opcode_modifier.evex)
> >>>>> +	i.vec_encoding = vex_encoding_evex;
> >>>>
> >>>> I'm afraid I don't understand the 2nd sentence of the comment. This
> >>>> may be related to my question regarding cpu_flags_match() further up.
> >>>>
> >>>> The first sentence isn't quite correct either - you don't mark any
> >>>> template here (and you can't, because we don't even know yet which
> >>>> template we're going to use).
> >>>>
> >>>> Finally - do you really need the .evex check here? (I won't exclude
> >>>> that this yields a better diagnostic in certain cases, but this
> >>>> wants clarifying if so.)
> >>>>
> >>>
> >>> If you look at install_template(), you'll see that before this
> >>> function we
> >> need to know if the current encoding is evex.
> >>
> >> "This function" being check_register()? If so, then no, we can't know
> >> up front whether EVEX encoding is going to be needed, as operand
> >> parsing happens ahead of template selection. If instead you mean
> >> "that function" and hence install_template(), then yes, we need to know
> whether to use EVEX there.
> >> Yet how does that result in a need for the .evex check here? (Or
> >> maybe your reply was really to the first of the three parts of my
> >> earlier one?)
> >>
> >
> > Agree with you, put them here is unreasonable.
> >
> > For example
> >
> > vtestps (%r27),%ymm6
> >
> > we should report unsupported  Egpr. But without .evex check, it will report
> "Error: no EVEX encoding for `vtestps'"
> >
> >> But anyway - as said earlier on, using current_templates here looks
> >> wrong in the first place. check_register() deals with only a
> >> register, without regard to the context it is used in (with the sole exception
> of allow_pseudo_reg).
> >> May I remind you that earlier on I already indicated that I suspect
> >> you'll need a new enumerator to put in i.vec_encoding for this new
> purpose?
> >>
> >
> > If we don't put it in check_register(), we need to add a for loop at the
> beginning of the install_template() to check RegRex2. Do you think it is okay?
> Or create a function for it.
> >
> > for (unsigned int op = 0; op < i.operands; op++)
> >     {
> >       if (i.types[op].bitfield.class != Reg)
> >         continue;
> >
> >       if (i.op[op].regs->reg_flags & RegRex2)
> >         i.vec_encoding = vex_encoding_evex;
> >     }
> >
> >   if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> >       || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> >     i.vec_encoding = vex_encoding_evex;
> 
> As a last resort this may be an option. But until my suggestion wasn't at least
> tried or demonstrated to be worse, I don't think the above would be
> acceptable.
> 

> >> May I remind you that earlier on I already indicated that I suspect
> >> you'll need a new enumerator to put in i.vec_encoding for this new
> purpose?

Jan, I didn't get your point. I think the existing enumerator vex_encoding_evex works well. The real question is how to detect whether an eGPR is used in the current instruction: we need to decide before install_template() whether to use the EVEX or the VEX form of the template.

Thanks,
Lili



^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
  2023-12-13  8:35             ` Cui, Lili
@ 2023-12-13  9:13               ` Jan Beulich
  0 siblings, 0 replies; 69+ messages in thread
From: Jan Beulich @ 2023-12-13  9:13 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 13.12.2023 09:35, Cui, Lili wrote:
>>>>>>> @@ -14233,6 +14276,12 @@ static bool check_register (const
>>>>>>> reg_entry
>>>>>> *r)
>>>>>>>        if (!cpu_arch_flags.bitfield.cpuapx_f
>>>>>>>  	  || flag_code != CODE_64BIT)
>>>>>>>  	return false;
>>>>>>> +
>>>>>>> +      /* When using RegRex2, dual VEX/EVEX templates need to be
>>>>>>> + marked as
>>>>>> EVEX.
>>>>>>> +	 For the later install_template function.  */
>>>>>>> +      if (current_templates->start->opcode_modifier.vex
>>>>>>> +	  && current_templates->start->opcode_modifier.evex)
>>>>>>> +	i.vec_encoding = vex_encoding_evex;
>>>>>>
>>>>>> I'm afraid I don't understand the 2nd sentence of the comment. This
>>>>>> may be related to my question regarding cpu_flags_match() further up.
>>>>>>
>>>>>> The first sentence isn't quite correct either - you don't mark any
>>>>>> template here (and you can't, because we don't even know yet which
>>>>>> template we're going to use).
>>>>>>
>>>>>> Finally - do you really need the .evex check here? (I won't exclude
>>>>>> that this yields a better diagnostic in certain cases, but this
>>>>>> wants clarifying if so.)
>>>>>>
>>>>>
>>>>> If you look at install_template(), you'll see that before this
>>>>> function we
>>>> need to know if the current encoding is evex.
>>>>
>>>> "This function" being check_register()? If so, then no, we can't know
>>>> up front whether EVEX encoding is going to be needed, as operand
>>>> parsing happens ahead of template selection. If instead you mean
>>>> "that function" and hence install_template(), then yes, we need to know
>> whether to use EVEX there.
>>>> Yet how does that result in a need for the .evex check here? (Or
>>>> maybe your reply was really to the first of the three parts of my
>>>> earlier one?)
>>>>
>>>
>>> Agree with you, put them here is unreasonable.
>>>
>>> For example
>>>
>>> vtestps (%r27),%ymm6
>>>
>>> we should report unsupported  Egpr. But without .evex check, it will report
>> "Error: no EVEX encoding for `vtestps'"
>>>
>>>> But anyway - as said earlier on, using current_templates here looks
>>>> wrong in the first place. check_register() deals with only a
>>>> register, without regard to the context it is used in (with the sole exception
>> of allow_pseudo_reg).
>>>> May I remind you that earlier on I already indicated that I suspect
>>>> you'll need a new enumerator to put in i.vec_encoding for this new
>> purpose?
>>>>
>>>
>>> If we don't put it in check_register(), we need to add a for loop at the
>> beginning of the install_template() to check RegRex2. Do you think it is okay?
>> Or create a function for it.
>>>
>>> for (unsigned int op = 0; op < i.operands; op++)
>>>     {
>>>       if (i.types[op].bitfield.class != Reg)
>>>         continue;
>>>
>>>       if (i.op[op].regs->reg_flags & RegRex2)
>>>         i.vec_encoding = vex_encoding_evex;
>>>     }
>>>
>>>   if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
>>>       || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
>>>     i.vec_encoding = vex_encoding_evex;
>>
>> As a last resort this may be an option. But until my suggestion wasn't at least
>> tried or demonstrated to be worse, I don't think the above would be
>> acceptable.
>>
> 
>>>> May I remind you that earlier on I already indicated that I suspect
>>>> you'll need a new enumerator to put in i.vec_encoding for this new
>> purpose?
> 
> Jan, I didn't get your point, I think the enumerator vex_encoding_evex works well, the question is how to filter if Egpr is used in the current instruction, we should make a choice before install_template, whether it's an evex template or a vex template.

Well, of course you can make the existing enumerator work, by - see above -
adding yet another loop over all operands. In a similar way it may have
been possible to avoid the introduction of vex_encoding_evex512. But it is
generally better to record the precise reason for requiring a certain
encoding / putting in place a certain restriction, i.e. going even beyond
the desire to avoid introducing new loops over all operands if at all
possible.

Here as soon as an eGPR is used, various encodings cannot be used anymore.
That's best expressed explicitly. (It might even be that such a new
enumerator would help with the REX2 encodings. Recall that I said earlier
already that both field and enumerator names aren't fully appropriate
anymore. Yet changing them isn't a priority, so we can defer that until
after all the APX work has landed.)
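
A small standalone sketch of the idea, under stated assumptions: the
enumerator name vex_encoding_egpr, the REG_REX2 value, struct reg and
note_register () are all made up for the example, and the enum is a
cut-down version of what i.vec_encoding can hold. The point is only
that the reason gets recorded at the time the register is seen, so no
later pass over all operands is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down model of the values i.vec_encoding can take.  The last
   enumerator is hypothetical; this mail does not name it.  */
enum vec_encoding
{
  vex_encoding_default = 0,
  vex_encoding_vex,
  vex_encoding_evex,
  vex_encoding_egpr   /* Hypothetical: an eGPR (r16-r31) was used.  */
};

#define REG_REX2 0x4  /* Hypothetical flag value.  */

struct reg
{
  unsigned int reg_flags;
};

static enum vec_encoding vec_encoding;

/* Called once per parsed register.  An explicit {vex} / {evex} request
   is left alone here; diagnosing conflicting combinations is outside
   the scope of this sketch.  */
static void
note_register (const struct reg *r)
{
  assert (r != 0);

  if ((r->reg_flags & REG_REX2) && vec_encoding == vex_encoding_default)
    vec_encoding = vex_encoding_egpr;
}

/* Simplified stand-in for need_evex_encoding (): both an explicit EVEX
   request and the use of an eGPR rule out the VEX variant.  */
static bool
need_evex_encoding (void)
{
  return vec_encoding == vex_encoding_evex
         || vec_encoding == vex_encoding_egpr;
}

/* Because the reason is recorded, a precise diagnostic stays possible.  */
static bool
egpr_used (void)
{
  return vec_encoding == vex_encoding_egpr;
}
```

With something like this, install_template () could keep asking a
single question, while error reporting can still tell "EVEX was
requested" apart from "an eGPR forces EVEX".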

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 6/9] Support APX NDD
  2023-12-11 16:50       ` Jan Beulich
@ 2023-12-13 10:42         ` Cui, Lili
  0 siblings, 0 replies; 69+ messages in thread
From: Cui, Lili @ 2023-12-13 10:42 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, Kong, Lingling, binutils

> >>> +add, 0x0, APX_F,
> +D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|N
> >> F, {
> >>> +Reg8|Reg16|Reg32|Reg64,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> >> x,
> >>> +Reg8|Reg16|Reg32|Reg64 }
> >>
> >> There is _still_ Byte|Word|Dword|Qword in here (and below), when I
> >> think I pointed out more than once before that in new templates such
> >> redundancy wants omitting.
> >>
> >> Since this isn't the first instance of earlier review comments not
> >> taken care of, may I please ask that you make reasonably sure that
> >> new versions aren't sent out like this?
> >>
> >
> > This part could indeed be omitted, but I really don't remember you
> mentioning it on the APX patches.
> 
> Already in e.g.
> https://sourceware.org/pipermail/binutils/2023-November/130422.html
> I pointed out that such earlier comments in e.g.
> https://sourceware.org/pipermail/binutils/2023-September/129590.html
> were not addressed.
> 

Sorry, movbe was indeed caused by the reg I added; I didn't notice that the legacy templates have this issue as well. When you said there was something I still needed to change, I didn't realize it was this.

> > There are still a lot of redundant Byte|Word|Dword|Qword in the opcode
> table, APX just added some flags on top of the old ones. Do you mind if I
> create a patch first to remove the redundant parts of master?
> 
> I don't mind you cleaning up first. It's just that normally I wouldn't do so in a
> separate patch (one of the reasons being that such non-functional changes get
> in the way of using "git blame" or alike when trying to find the most recent
> real change to a line), unless it was only a handful of instances left. Instead I
> typically do such tidying as lines are touched anyway. Thing here simply is that
> new templates shouldn't have such anomalies anymore.
> 

I would still like to clean them up, since the redundant flags are misleading.

> +D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Opti
> >> mize|
> >>> +NF, { Reg8|Reg16|Reg32|Reg64,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> >> x,
> >>> +Reg8|Reg16|Reg32|Reg64, }
> >>
> >> Here and elsewhere, what's Optimize for? It not being there on other
> >> templates, it can't be for the EVEX->REX2 optimization? If there are
> >> further optimization plans, that's (again) something to mention in
> >> the description. Yet better would be if such attributes were added
> >> only when respective optimizations are actually introduced. Unlike
> >> e.g. NF, which would mean another bulk update if not added right
> >> away, new optimizations typically affect only a few templates at a time.
> >>
> >
> > Optimize is not new.
> >
> > sub, 0x28, APX_F,
> >
> D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Opti
> mize|N
> > F, { Reg8|Reg16|Reg32|Reg64,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex,
> > Reg8|Reg16|Reg32|Reg64, } sub, 0x28, 0,
> > D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, {
> > Reg8|Reg16|Reg32|Reg64,
> >
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> }
> 
> Optimize is legitimately there for the legacy template. If the new template also
> wants it, there needs to be some reason. Otherwise it is part of the
> transformation to APX/EVEX to drop it.
> 

Dropped Optimize, thanks.

> >>>  sub, 0x28, 0,
> >>> D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, {
> >>> Reg8|Reg16|Reg32|Reg64,
> >>>
> >>
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> >> }
> >>> +sub, 0x83/5, APX_F,
> >>> +Modrm|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S,
> >>> +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> >>> +Reg16|Reg32|Reg64 }
> >>>  sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> >>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  sub,
> >> 0x2c,
> >>> 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> >> Acc|Byte|Word|Dword|Qword }
> >>> +sub, 0x80/5, APX_F,
> >>>
> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
> >>> +Imm8|Imm16|Imm32|Imm32S,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> >> x,
> >>> +Reg8|Reg16|Reg32|Reg64 }
> >>>  sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> >>> Imm8|Imm16|Imm32|Imm32S,
> >>>
> >>
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> >> }
> >>
> >> There are still only 3 new templates here (and also above for add,
> >> plus for other similar insns), when ...
> >>
> >>>  dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
> >>> +dec, 0xfe/1, APX_F,
> >>>
> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> >> x,
> >>> +Reg8|Reg16|Reg32|Reg64 }
> >>>  dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> >>>
> >>
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> >> }
> >>>
> >>> +sbb, 0x18, APX_F,
> >>> +D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4,
> {
> >>> +Reg8|Reg16|Reg32|Reg64,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> >> x,
> >>> +Reg8|Reg16|Reg32|Reg64 }
> >>>  sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, {
> >>> Reg8|Reg16|Reg32|Reg64,
> >>>
> >>
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> >> }
> >>> +sbb, 0x18, APX_F,
> >>> +D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, {
> >>> +Reg8|Reg16|Reg32|Reg64,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> >> x }
> >>> +sbb, 0x83/3, APX_F,
> >>>
> >>
> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4,
> >> {
> >>> +Imm8S,
> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
> >>> +Reg16|Reg32|Reg64 }
> >>>  sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
> >>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> >>> +sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf,
> >> { Imm8S,
> >>> +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> >>>  sbb, 0x1c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
> >>> Acc|Byte|Word|Dword|Qword }
> >>> +sbb, 0x80/3, APX_F,
> >>> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, {
> >>> +Imm8|Imm16|Imm32|Imm32S,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> >> x,
> >>> +Reg8|Reg16|Reg32|Reg64 }
> >>>  sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
> >>> Imm8|Imm16|Imm32|Imm32S,
> >>>
> >>
> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
> >> }
> >>> +sbb, 0x80/3, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, {
> >>> +Imm8|Imm16|Imm32|Imm32S,
> >>>
> >>
> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
> >> x }
> >>
> >> ... there are 6 new templates here. This is again an aspect I had
> >> pointed out before. You cannot defer the addition of the other 3
> >> until the NF patch, as you want to make sure that with just this
> >> patch in place something both
> >>
> >>     {evex} sbb %eax, %eax
> >>
> >> and
> >>
> >>     {evex} sub %eax, %eax
> >>
> >> actually assemble, and to EVEX encodings. I can't see how that would
> >> work in the latter case without those further templates.
> >>
> >> The alternative is to also defer adding the 2-operand SBB templates
> >> (and any others you add here which don't use DstVVVV).
> >>
> >
> > I'm having a headache with this, some instructions like sbb don't support NF,
> originally they were in the 4/9 patch, but their disassemblers are in the NDD
> patch, and you agreed to put them in the NDD patch.
> 
> Right, yet still the overall result wants to be consistent. Hence why I'm not
> demanding that you move these templates yet later (which is one option).
> Instead I've indicated that moving the others ahead would also be okay.
> 
 
I'd prefer to move them into the NF patch, since that only requires moving the templates. The second option is more cumbersome: it requires moving the encoder, decoder, and test cases into the EVEX eGPR patch.

> Like with any series, you want it to be in a shape where it can be committed
> piecemeal. Which is even more important with a release around the corner.
> If we end up with just partial APX support in 2.42, that partial support should
> be in a shape that's predictable to users.
> 
> > Now I really don't know where to move. Moving encoding, decoding, and
> > especially test cases for instructions between patches is cumbersome and I
> > really don't think it makes much sense.
> 
> I can see your point, and I'm sorry for the hassle. Part of the problem of the
> moving being troublesome is (imo) that many of the patches simply were
> (are) doing too many things at a time anyway.
> 
> >>>  xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>> +xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> >>>  xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> >>>  xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> >>> +xor, 0x80/6, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>  xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>>
> >>>  // clr with 1 operand is really xor with 2 operands.
> >>>  clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
> >>
> >> Btw., for consistency this may also want accompanying with an EVEX
> >> counterpart.
> >>
> >
> > Do you mean to add an entry like this? It should belong to the previous
> > patch.
> >
> > // clr with 1 operand is really xor with 2 operands.
> > clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
> > clr, 0x30, APX_F, W|Modrm|No_sSuf|RegKludge|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64 }
> 
> Yes, something like this. And possibly indeed not the patch here; the template
> simply happened to be in context. Where exactly it wants to go depends - see
> above - on where other similar templates are introduced. Note however that
> the corresponding XOR templates are introduced here, just above and still in
> context.
> 

For clr's evex format template, I think it should be in the NF patch, since xor's evex format template is also in that patch.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 7/9] Support APX Push2/Pop2
  2023-12-11 11:17   ` Jan Beulich
@ 2023-12-15  8:38     ` Cui, Lili
  2023-12-15  8:44       ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2023-12-15  8:38 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, binutils

> On 24.11.2023 08:02, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -248,6 +248,7 @@ enum i386_error
> >      invalid_vector_register_set,
> >      invalid_tmm_register_set,
> >      invalid_dest_and_src_register_set,
> > +    invalid_src_register_set,
> >      invalid_pseudo_prefix,
> >      unsupported_vector_index_register,
> >      unsupported_broadcast,
> > @@ -256,6 +257,7 @@ enum i386_error
> >      mask_not_on_destination,
> >      no_default_mask,
> >      unsupported_rc_sae,
> > +    unsupported_rsp_register,
> >      invalid_register_operand,
> >      internal_error,
> >    };
> > @@ -5398,6 +5400,9 @@ md_assemble (char *line)
> >  	case invalid_dest_and_src_register_set:
> >  	  err_msg = _("destination and source registers must be distinct");
> >  	  break;
> > +	case invalid_src_register_set:
> 
> Did you mean invalid_dest_register_set and ...
> 
> > +	  err_msg = _("two source registers must be distinct");
> 
> ... "two destination ..."? This is for POP2, after all, which has no source register
> at all.
> 

Done.

> > @@ -5422,6 +5427,9 @@ md_assemble (char *line)
> >  	case unsupported_rc_sae:
> >  	  err_msg = _("unsupported static rounding/sae");
> >  	  break;
> > +	case unsupported_rsp_register:
> > +	  err_msg = _("cannot be used with %rsp register");
> > +	  break;
> 
> While this wording looks okay as visible here, please consider it in the context
> it is used in: "cannot be used with %rsp register for `push2'"
> is, I'm sorry to say that, clumsy at best. If you want to stick to setting err_msg,
> how about "%rsp register cannot be used"? Personally I'd prefer a resulting
> output of "%rsp register cannot be used with `push2'", but I wouldn't insist on
> you going that route if you don't like that.
> 

"%rsp register cannot be used" is much better, thanks.

> > @@ -7113,6 +7121,33 @@ check_EgprOperands (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Check if APX operands are valid for the instruction.  */
> > +static int
> 
> Please can functions returning boolean indicators have a return type of "bool"
> (and perhaps use "true" as the success indicator, not "false")?
> 

Done.

> > +check_APX_operands (const insn_template *t)
> > +{
> > +  /* Push2* and Pop2* cannot use RSP and Pop2* cannot pop two same registers.
> > +   */
> > +  if (t->mnem_off == MN_push2 || t->mnem_off == MN_push2p
> > +      || t->mnem_off == MN_pop2 || t->mnem_off == MN_pop2p)
> 
> Considering (perhaps just theoretical) further additions here, did you consider
> using switch()? Even without further additions this would imo be more legible
> (due to there being slightly less redundancy).
> 

Done.

  /* Push2* and Pop2* cannot use RSP and Pop2* cannot pop two same registers.
   */
  switch (t->mnem_off)
    {
    case MN_pop2:
    case MN_pop2p:
      if (register_number (i.op[0].regs) == register_number (i.op[1].regs))
        {
          i.error = invalid_dest_register_set;
          return 1;
        }
    case MN_push2:
    case MN_push2p:
      if (register_number (i.op[0].regs) == 4
          || register_number (i.op[1].regs) == 4)
        {
          i.error = unsupported_rsp_register;
          return 1;
        }
    }

> > --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> > @@ -28,3 +28,9 @@ _start:
> >  	.byte 0xff
> >  	#{evex} inc %rax %rbx EVEX.vvvv' != 1111 && EVEX.ND = 0.
> >  	.insn EVEX.L0.NP.M4.W1 0xff, %rax, %rbx
> > +	.byte 0xff
> > +	# pop2 %rax, %rbx set EVEX.ND=0.
> > +	.byte 0x62,0xf4,0x64,0x08,0x8f,0xc0
> > +	.byte 0xff, 0xff, 0xff
> > +	# pop2 %rax set EVEX.vvvv' = 1111.
> 
> Another instance of the unclear EVEX.vvvv' (i.e. the questionable nature if '
> here). Yet then - what is the test below checking? EVEX.vvvv encodes one of
> the two operands, so all values are valid? Isn't this about both operands being
> the same? That would better be said then explicitly, e.g.
> simply
> 
> 	# pop2 %rax, %rax (twice same destination)
> 
> > +	.byte 0x62,0xf4,0x7c,0x18,0x8f,0xc0
> 
> Also again both new tests use .byte instead of .insn: Is there a particular
> reason? Here are a couple of examples that I have readily available (Intel
> syntax again, ftaod):
> 
> 	.insn EVEX.L0.M4.W0 0x8f/0, r8, rax{sae}	; pop2 r8, rax
> 	.insn EVEX.L0.M4.W0 0x8f/0, xmm16, rax{sae}	; pop2 r16, rax
> 	.insn EVEX.L0.M4.W0 0x8f/0, rax, r8{sae}	; pop2 rax, r8
> 	.insn EVEX.L0.M12.W0 0x8f/0, rax, rax{sae}	; pop2 rax, r16
> 	.insn EVEX.L0.M4.W1 0x8f/0, rax, rcx{sae}	; pop2.x rax, rcx
> 
> I'm sure you can derive from them what you're actually after.
> 

Thanks!

        # pop2 %rax, %r8 set EVEX.ND=0.
        .insn EVEX.L0.M4.W0 0x8f/0,  %rax, %r8
        .byte 0xff, 0xff, 0xff
        # pop2 %rax, %r8 set EVEX.vvvv = 1111.
        .insn EVEX.L0.M4.W0 0x8f,  %rax, {rn-sae},%r8
        # pop2 %r8, %r8.
        .insn EVEX.L0.M4.W0 0x8f/0,  %r8,{rn-sae}, %r8


> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-push2pop2.s
> > @@ -0,0 +1,39 @@
> > +# Check 64bit APX-Push2Pop2 instructions
> > +
> > +	.allow_index_reg
> > +	.text
> > +_start:
> > +	push2 %rbx, %rax
> > +	push2 %r17, %r8
> > +	push2 %r9, %r31
> > +	push2 %r31, %r24
> > +	push2p %rbx, %rax
> > +	push2p %r17, %r8
> > +	push2p %r9, %r31
> > +	push2p %r31, %r24
> > +	pop2 %rax, %rbx
> > +	pop2 %r8, %r17
> > +	pop2 %r31, %r9
> > +	pop2 %r24, %r31
> > +	pop2p %rax, %rbx
> > +	pop2p %r8, %r17
> > +	pop2p %r31, %r9
> > +	pop2p %r24, %r31
> > +
> > +.intel_syntax noprefix
> 
> Nit: Un-indented directive again.

Done.

> 
> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> > @@ -105,6 +105,7 @@ static bool FXSAVE_Fixup (instr_info *, int, int);
> >  static bool MOVSXD_Fixup (instr_info *, int, int);
> >  static bool DistinctDest_Fixup (instr_info *, int, int);
> >  static bool PREFETCHI_Fixup (instr_info *, int, int);
> > +static bool PUSH2_POP2_Fixup (instr_info *, int, int);
> >
> >  static void ATTRIBUTE_PRINTF_3 i386_dis_printf (const disassemble_info *,
> >  						enum disassembler_style,
> > @@ -225,6 +226,9 @@ struct instr_info
> >    }
> >    vex;
> >
> > +/* For APX EVEX-promoted prefix, EVEX.ND shares the same bit as vex.b.  */
> > +#define nd b
> 
> Can this be moved ahead to patch 4, such that it can be used there (instead of
> vex.b) as well? IOW ...
> 
> > @@ -9125,7 +9133,7 @@ get_valid_dis386 (const struct dis386 *dp,
> > instr_info *ins)
> >
> >        /* EVEX from legacy instructions, when the EVEX.ND bit is 0,
> >  	 all bits of EVEX.vvvv and EVEX.V' must be 1.  */
> > -      if (ins->evex_type == evex_from_legacy && !ins->vex.b
> > +      if (ins->evex_type == evex_from_legacy && !ins->vex.nd
> >  	  && (ins->vex.register_specifier || !ins->vex.v))
> >  	return &bad_opcode;
> 
> ... neither this nor ...
> 
> > @@ -13388,11 +13396,10 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
> >    if (!ins->need_vex)
> >      return true;
> >
> > -  /* Here vex.b is treated as "EVEX.ND".  */
> >    if (ins->evex_type == evex_from_legacy)
> >      {
> >        ins->evex_used |= EVEX_b_used;
> > -      if (!ins->vex.b)
> > +      if (!ins->vex.nd)
> >  	return true;
> >      }       
> 
> ... this should require touching here.
> 

I moved them into the NDD patch, which adds these checks.
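
For readers unfamiliar with the idiom, the `#define nd b` alias discussed above can be illustrated with a minimal, self-contained sketch. The structure below is a hypothetical, cut-down stand-in (the name `vex_bits` and the helper `has_ndd` are assumptions for illustration only, not taken from i386-dis.c); it merely shows that an object-like macro lets `vex.nd` read the same storage as `vex.b`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the "vex" member of struct instr_info;
   only the one field that matters for this illustration is kept.  */
struct vex_bits
{
  bool b;
};

/* EVEX.ND shares its bit position with EVEX.b, so "nd" is made another
   spelling of the field name "b".  */
#define nd b

/* Hypothetical helper: returns true when the ND bit is set.  */
static bool
has_ndd (const struct vex_bits *vex)
{
  return vex->nd;	/* The preprocessor turns this into vex->b.  */
}
```

The point of such an alias is only readability at the use sites: checks on promoted legacy insns can say `vex.nd` while vector insns keep saying `vex.b`, without growing the structure.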

> > @@ -13884,3 +13894,26 @@ PREFETCHI_Fixup (instr_info *ins, int bytemode, int sizeflag)
> >
> >    return OP_M (ins, bytemode, sizeflag);
> >  }
> > +
> > +static bool
> > +PUSH2_POP2_Fixup (instr_info *ins, int bytemode, int sizeflag)
> > +{
> > +  if (ins->modrm.mod != 3 || !ins->vex.b)
> 
> Did you mean vex.nd? Plus, considering the vex.nd check further down, why is
> this checked both here and there?
> 

Dropped.

> > +    return true;
> 
> Doesn't this result in silently bogus/wrong output? Shouldn't you print
> "(bad)" like you do further down? At which point it may make sense to simply
> fold both if()s?
> 
> > --- a/opcodes/i386-opc.h
> > +++ b/opcodes/i386-opc.h
> > @@ -807,6 +807,7 @@ typedef struct i386_opcode_modifier
> >    unsigned int isa64:2;
> >    unsigned int noegpr:1;
> >    unsigned int nf:1;
> > +  unsigned int push2pop2:1;
> >  } i386_opcode_modifier;
> 
> Still a new modifier despite my earlier request to avoid adding one when you
> easily can? Here OperandConstraint is actually fully applicable to use, as what
> you want to enforce is a constraint on operands.
> 

I also found this issue and removed it locally.

Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 7/9] Support APX Push2/Pop2
  2023-12-15  8:38     ` Cui, Lili
@ 2023-12-15  8:44       ` Jan Beulich
  0 siblings, 0 replies; 69+ messages in thread
From: Jan Beulich @ 2023-12-15  8:44 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, binutils

On 15.12.2023 09:38, Cui, Lili wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> @@ -5422,6 +5427,9 @@ md_assemble (char *line)
>>>  	case unsupported_rc_sae:
>>>  	  err_msg = _("unsupported static rounding/sae");
>>>  	  break;
>>> +	case unsupported_rsp_register:
>>> +	  err_msg = _("cannot be used with %rsp register");
>>> +	  break;
>>
>> While this wording looks okay as visible here, please consider it in the context
>> it is used in: "cannot be used with %rsp register for `push2'"
>> is, I'm sorry to say that, clumsy at best. If you want to stick to setting err_msg,
>> how about "%rsp register cannot be used"? Personally I'd prefer a resulting
>> output of "%rsp register cannot be used with `push2'", but I wouldn't insist on
>> you going that route if you don't like that.
> 
> "%rsp register cannot be used" ,this is much better, thanks.

It occurs to me only now (sorry) that %rsp is inappropriate to use
when assembling Intel syntax insns. In that case the % may not be there
(and as a result the entire register name then wants putting in single
quotes, as we do elsewhere).
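
To make the point concrete, here is a rough sketch of a syntax-aware diagnostic. Everything in it is an assumption for illustration: the helper name `format_rsp_error`, its parameters, and the exact wording are hypothetical and not what gas does today; it only shows the register name being quoted and the `%` being emitted for AT&T syntax only.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: build the diagnostic text for the given syntax mode.
   AT&T register names carry a '%' prefix, Intel syntax ones do not, so the
   prefix is chosen at run time and the whole name is put in quotes.  */
static int
format_rsp_error (char *buf, size_t len, bool intel_syntax, const char *mnem)
{
  const char *prefix = intel_syntax ? "" : "%";

  return snprintf (buf, len, "`%srsp' register cannot be used with `%s'",
		   prefix, mnem);
}
```

Called with `intel_syntax` false and mnemonic `push2`, this yields a message naming `%rsp`; with `intel_syntax` true the same message names plain `rsp`.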

>>> +check_APX_operands (const insn_template *t) {
>>> +  /* Push2* and Pop2* cannot use RSP and Pop2* cannot pop two same
>> registers.
>>> +   */
>>> +  if (t->mnem_off == MN_push2 || t->mnem_off == MN_push2p
>>> +      || t->mnem_off == MN_pop2 || t->mnem_off == MN_pop2p)
>>
>> Considering (perhaps just theoretical) further additions here, did you consider
>> using switch()? Even without further additions this would imo be more legible
>> (due to there being slightly less redundancy).
>>
> 
> Done.
> 
>   /* Push2* and Pop2* cannot use RSP and Pop2* cannot pop two same registers.
>    */
>   switch (t->mnem_off)
>     {
>     case MN_pop2:
>     case MN_pop2p:
>       if (register_number (i.op[0].regs) == register_number (i.op[1].regs))
>         {
>           i.error = invalid_dest_register_set;
>           return 1;
>         }

Fall-through comment here please, ...

>     case MN_push2:
>     case MN_push2p:
>       if (register_number (i.op[0].regs) == 4
>           || register_number (i.op[1].regs) == 4)
>         {
>           i.error = unsupported_rsp_register;
>           return 1;
>         }

... and break here please.
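
A minimal sketch of the shape being asked for, with the fall-through annotated and the final case terminated. The enums and the function name `check_push2_pop2` are hypothetical stand-ins (in the real code the checks work on `t->mnem_off`, `i.op[]` and `i.error`); register number 4 is assumed to denote RSP, as in the quoted code.

```c
/* Hypothetical stand-ins for the mnemonic and error enumerators.  */
enum mnem { MN_push2, MN_push2p, MN_pop2, MN_pop2p, MN_other };
enum apx_error
{
  no_error,
  invalid_dest_register_set,
  unsupported_rsp_register
};

/* Push2* and Pop2* cannot use RSP, and Pop2* cannot pop into the same
   register twice.  Returns the error to report, or no_error.  */
static enum apx_error
check_push2_pop2 (enum mnem m, unsigned int reg0, unsigned int reg1)
{
  switch (m)
    {
    case MN_pop2:
    case MN_pop2p:
      if (reg0 == reg1)
	return invalid_dest_register_set;
      /* Fall through.  */
    case MN_push2:
    case MN_push2p:
      if (reg0 == 4 || reg1 == 4)
	return unsupported_rsp_register;
      break;

    default:
      break;
    }

  return no_error;
}
```

The explicit comment documents that POP2 deliberately shares the RSP check with PUSH2, and the trailing `break` keeps the switch safe should further cases be appended later.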

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 6/9] Support APX NDD
  2023-12-08 14:12   ` Jan Beulich
  2023-12-11 13:36     ` Cui, Lili
@ 2024-03-22 10:02     ` Jan Beulich
  2024-03-22 10:31       ` Jan Beulich
  2024-03-22 10:59       ` Jan Beulich
  1 sibling, 2 replies; 69+ messages in thread
From: Jan Beulich @ 2024-03-22 10:02 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, konglin1, binutils

On 08.12.2023 15:12, Jan Beulich wrote:
> On 24.11.2023 08:02, Cui, Lili wrote:
>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>  rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> 
> Didn't we agree to avoid adding this (and its sibling) template, for the omitted
> shift count being ambiguous? Consider
> 
>     rol %cl, %al
> 
> Is this a rotate by %cl, or a 1-bit NDD rotate?

Btw, while this comment was taken into account for the "normal" shifts and
rotates, SHLD / SHRD still have this odd extra form. There's not as much
of an ambiguity there, but I think we should demand %cl to be specified
consistently across all respective APX insn forms.

I'm noticing this in the context of templatizing insn groups, and I could
certainly fold the dropping of those two templates into the respective
patch (suitably mentioning it in the description).

One further question regarding that work of mine: While ideally this
templatization would go ahead of the NF work, I can see that this would
cause re-basing troubles to you. Could you please indicate your preference?

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 6/9] Support APX NDD
  2024-03-22 10:02     ` Jan Beulich
@ 2024-03-22 10:31       ` Jan Beulich
  2024-03-26  2:04         ` Cui, Lili
  2024-03-22 10:59       ` Jan Beulich
  1 sibling, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2024-03-22 10:31 UTC (permalink / raw)
  To: Cui, Lili, hongjiu.lu; +Cc: konglin1, binutils

On 22.03.2024 11:02, Jan Beulich wrote:
> On 08.12.2023 15:12, Jan Beulich wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>  rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>
>> Didn't we agree to avoid adding this (and its sibling) template, for the omitted
>> shift count being ambiguous? Consider
>>
>>     rol %cl, %al
>>
>> Is this a rotate by %cl, or a 1-bit NDD rotate?
> 
> Btw, while this comment was taken into account for the "normal" shifts and
> rotates, SHLD / SHRD still have this odd extra form. There's not as much
> of an ambiguity there, but I think we should demand %cl to be specified
> consistently across all respective APX insn forms.

Actually the overall situation (for legacy shift insns) is even worse: For
"normal" shifts / rotates, omitting the shift count means "$1", whereas for
SHLD/SHRD it means "%cl". Prior to Lili's recent disassembler change it was
also the case that only the "$1" would be omitted from output, but not the
"%cl" (at least the disassembler is consistent now).

Jan
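
The inconsistency described above can be summarised in a few lines of C. This is purely an illustration of the legacy convention as stated in this message; the function `implied_shift_count` is a hypothetical name and not part of gas or the disassembler.

```c
#include <string.h>

/* Hypothetical helper: which operand a legacy shift/rotate mnemonic implies
   when the count is omitted.  Per the description above, SHLD/SHRD imply
   %cl, while the "normal" shifts and rotates imply $1.  */
static const char *
implied_shift_count (const char *mnem)
{
  if (strcmp (mnem, "shld") == 0 || strcmp (mnem, "shrd") == 0)
    return "%cl";

  return "$1";
}
```

Having two different defaults for what is syntactically the same omission is the reason for asking that the count always be written out in the APX forms.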

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 6/9] Support APX NDD
  2024-03-22 10:02     ` Jan Beulich
  2024-03-22 10:31       ` Jan Beulich
@ 2024-03-22 10:59       ` Jan Beulich
  2024-03-26  8:22         ` Cui, Lili
  1 sibling, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2024-03-22 10:59 UTC (permalink / raw)
  To: Cui, Lili; +Cc: hongjiu.lu, konglin1, binutils

On 22.03.2024 11:02, Jan Beulich wrote:
> On 08.12.2023 15:12, Jan Beulich wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>  rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>
>> Didn't we agree to avoid adding this (and its sibling) template, for the omitted
>> shift count being ambiguous? Consider
>>
>>     rol %cl, %al
>>
>> Is this a rotate by %cl, or a 1-bit NDD rotate?
> 
> Btw, while this comment was taken into account for the "normal" shifts and
> rotates, SHLD / SHRD still have this odd extra form.

I have to correct myself here: RCL and RCR had such an odd form retained, too
(as, perhaps, a side effect of prematurely adding the non-NDD forms there).

Jan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 6/9] Support APX NDD
  2024-03-22 10:31       ` Jan Beulich
@ 2024-03-26  2:04         ` Cui, Lili
  2024-03-26  7:06           ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2024-03-26  2:04 UTC (permalink / raw)
  To: Beulich, Jan, Lu, Hongjiu; +Cc: Kong, Lingling, binutils



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, March 22, 2024 6:31 PM
> To: Cui, Lili <lili.cui@intel.com>; Lu, Hongjiu <hongjiu.lu@intel.com>
> Cc: Kong, Lingling <lingling.kong@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH v3 6/9] Support APX NDD
> 
> On 22.03.2024 11:02, Jan Beulich wrote:
> > On 08.12.2023 15:12, Jan Beulich wrote:
> >> On 24.11.2023 08:02, Cui, Lili wrote:
> >>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>  rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>
> >> Didn't we agree to avoid adding this (and its sibling) template, for
> >> the omitted shift count being ambiguous? Consider
> >>
> >>     rol %cl, %al
> >>
> >> Is this a rotate by %cl, or a 1-bit NDD rotate?
> >
> > Btw, while this comment was taken into account for the "normal" shifts
> > and rotates, SHLD / SHRD still have this odd extra form. There's not
> > as much of an ambiguity there, but I think we should demand %cl to be
> > specified consistently across all respective APX insn forms.
> 
> Actually the overall situation (for legacy shift insns) is even worse: For "normal"
> shifts / rotates, omitting the shift count means "$1", whereas for SHLD/SHRD it
> means "%cl". Prior to Lili's recent disassembler change it was also the case that
> only the "$1" would be omitted from output, but not the "%cl" (at least the
> disassembler is consistent now).
> 

I'm really confused about SHLD / SHRD; I think there is a problem with the legacy templates. Normally we omit $1, but we do not omit %cl. Should the opcode of the third template below be 0xfac?

shrd, 0xfac, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }


Thanks,
Lili.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v3 6/9] Support APX NDD
  2024-03-26  2:04         ` Cui, Lili
@ 2024-03-26  7:06           ` Jan Beulich
  2024-03-26  7:18             ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2024-03-26  7:06 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Kong, Lingling, binutils, Lu, Hongjiu

On 26.03.2024 03:04, Cui, Lili wrote:
> 
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Friday, March 22, 2024 6:31 PM
>> To: Cui, Lili <lili.cui@intel.com>; Lu, Hongjiu <hongjiu.lu@intel.com>
>> Cc: Kong, Lingling <lingling.kong@intel.com>; binutils@sourceware.org
>> Subject: Re: [PATCH v3 6/9] Support APX NDD
>>
>> On 22.03.2024 11:02, Jan Beulich wrote:
>>> On 08.12.2023 15:12, Jan Beulich wrote:
>>>> On 24.11.2023 08:02, Cui, Lili wrote:
>>>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>>>  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>>>> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>>>  rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>>>> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>>>  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>>
>>>> Didn't we agree to avoid adding this (and its sibling) template, for
>>>> the omitted shift count being ambiguous? Consider
>>>>
>>>>     rol %cl, %al
>>>>
>>>> Is this a rotate by %cl, or a 1-bit NDD rotate?
>>>
>>> Btw, while this comment was taken into account for the "normal" shifts
>>> and rotates, SHLD / SHRD still have this odd extra form. There's not
>>> as much of an ambiguity there, but I think we should demand %cl to be
>>> specified consistently across all respective APX insn forms.
>>
>> Actually the overall situation (for legacy shift insns) is even worse: For "normal"
>> shifts / rotates, omitting the shift count means "$1", whereas for SHLD/SHRD it
>> means "%cl". Prior to Lili's recent disassembler change it was also the case that
>> only the "$1" would be omitted from output, but not the "%cl" (at least the
>> disassembler is consistent now).
>>
> 
> I'm really confused about SHLD / SHRD, I think there is some problem with the legacy format. Normally we will omit $1, but we will not omit %cl. Should the opcode of the third item be 0fac?

I don't think so, but I have no idea what the origin of this omitted
operand form is. At least if you look at the APX spec, it (imo wrongly)
omits %cl as an operand, too. "Wrongly" not the least because that's
not in line with the SDM.

Jan

> shrd, 0xfac, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> 
> 
> Thanks,
> Lili.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* RE: [PATCH v3 6/9] Support APX NDD
  2024-03-26  7:06           ` Jan Beulich
@ 2024-03-26  7:18             ` Cui, Lili
  0 siblings, 0 replies; 69+ messages in thread
From: Cui, Lili @ 2024-03-26  7:18 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Kong, Lingling, binutils, Lu, Hongjiu



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Tuesday, March 26, 2024 3:06 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Kong, Lingling <lingling.kong@intel.com>; binutils@sourceware.org; Lu,
> Hongjiu <hongjiu.lu@intel.com>
> Subject: Re: [PATCH v3 6/9] Support APX NDD
> 
> On 26.03.2024 03:04, Cui, Lili wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Friday, March 22, 2024 6:31 PM
> >> To: Cui, Lili <lili.cui@intel.com>; Lu, Hongjiu
> >> <hongjiu.lu@intel.com>
> >> Cc: Kong, Lingling <lingling.kong@intel.com>; binutils@sourceware.org
> >> Subject: Re: [PATCH v3 6/9] Support APX NDD
> >>
> >> On 22.03.2024 11:02, Jan Beulich wrote:
> >>> On 08.12.2023 15:12, Jan Beulich wrote:
> >>>> On 24.11.2023 08:02, Cui, Lili wrote:
> >>>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>>>  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>>>> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>>>  rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>>>> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>>>  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>>
> >>>> Didn't we agree to avoid adding this (and its sibling) template,
> >>>> for the omitted shift count being ambiguous? Consider
> >>>>
> >>>>     rol %cl, %al
> >>>>
> >>>> Is this a rotate by %cl, or a 1-bit NDD rotate?
> >>>
> >>> Btw, while this comment was taken into account for the "normal"
> >>> shifts and rotates, SHLD / SHRD still have this odd extra form.
> >>> There's not as much of an ambiguity there, but I think we should
> >>> demand %cl to be specified consistently across all respective APX insn
> >>> forms.
> >>
> >> Actually the overall situation (for legacy shift insns) is even worse: For
> >> "normal" shifts / rotates, omitting the shift count means "$1", whereas for
> >> SHLD/SHRD it means "%cl". Prior to Lili's recent disassembler change
> >> it was also the case that only the "$1" would be omitted from output,
> >> but not the "%cl" (at least the disassembler is consistent now).
> >>
> >
> > I'm really confused about SHLD / SHRD, I think there is some problem with
> > the legacy format. Normally we will omit $1, but we will not omit %cl. Should
> > the opcode of the third item be 0fac?
> 
> I don't think so, but I have no idea what the origin of this omitted operand form
> is. At least if you look at the APX spec, it (imo wrongly) omits %cl as an operand,
> too. "Wrongly" not the least because that's not in line with the SDM.
> 
> Jan

Yes, I also noticed this morning that the APX spec omits %cl, and I will confirm it with the doc.

>
> > shrd, 0xfac, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> > shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { ShiftCount, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> > shrd, 0xfad, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> >
> >
> > Thanks,
> > Lili.



* RE: [PATCH v3 6/9] Support APX NDD
  2024-03-22 10:59       ` Jan Beulich
@ 2024-03-26  8:22         ` Cui, Lili
  2024-03-26  9:30           ` Jan Beulich
  0 siblings, 1 reply; 69+ messages in thread
From: Cui, Lili @ 2024-03-26  8:22 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, Kong, Lingling, binutils

> On 22.03.2024 11:02, Jan Beulich wrote:
> > On 08.12.2023 15:12, Jan Beulich wrote:
> >> On 24.11.2023 08:02, Cui, Lili wrote:
> >>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>  rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>
> >> Didn't we agree to avoid adding this (and its sibling) template, for
> >> the omitted shift count being ambiguous? Consider
> >>
> >>     rol %cl, %al
> >>
> >> Is this a rotate by %cl, or a 1-bit NDD rotate?
> >
> > Btw, while this comment was taken into account for the "normal" shifts
> > and rotates, SHLD / SHRD still have this odd extra form.
> 
> I have to correct myself here: RCL and RCR had such an odd form retained, too
> (as, perhaps, a side effect of prematurely adding the non-NDD forms there).
> 

For RCL/RCR, we dropped the format of omitting $1.
rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }

For example, when the register is %rcx, it will conflict with the following template.

rcl, 0xd2/2, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }

I'm confused if we really want to omit the %cl case. I'll confirm it later.
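
To make the conflict concrete, here is a minimal sketch in Python. It is not gas code: the operand classes, the two template entries, and the helper names (operand_classes, matching_templates) are simplified assumptions for illustration only. It shows that a two-operand form with %cl first fits both the legacy "count in %cl" template and an NDD template without an explicit count.

```python
# Hypothetical, heavily simplified model of operand-template matching.
# This is NOT how gas implements matching; it only illustrates the overlap.

REG8 = {"%al", "%bl", "%cl", "%dl"}


def operand_classes(op):
    """Return the (simplified) operand classes an AT&T operand belongs to."""
    classes = set()
    if op in REG8:
        classes.add("Reg8")
    if op == "%cl":
        # %cl is an ordinary 8-bit register and also the shift-count register.
        classes.add("ShiftCount")
    if op == "$1":
        classes.add("Imm1")
    return classes


# Two of the templates discussed above, reduced to 8-bit register operands.
TEMPLATES = {
    "legacy, count in %cl (0xd2/2)": [{"ShiftCount"}, {"Reg8"}],
    "NDD, count omitted (0xd0/2)": [{"Reg8"}, {"Reg8"}],
}


def matching_templates(operands):
    """List every template whose operand slots accept the given operands."""
    hits = []
    for name, slots in TEMPLATES.items():
        if len(slots) == len(operands) and all(
            operand_classes(op) & slot for op, slot in zip(operands, slots)
        ):
            hits.append(name)
    return hits


if __name__ == "__main__":
    print(matching_templates(["%cl", "%al"]))  # both templates match
    print(matching_templates(["%dl", "%al"]))  # only the NDD template matches
```

In this model "rcl %cl, %al" matches both entries, while "rcl %dl, %al" matches only the NDD one; requiring an explicit count in the NDD forms removes the overlap.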


Thanks,
Lili.


* Re: [PATCH v3 6/9] Support APX NDD
  2024-03-26  8:22         ` Cui, Lili
@ 2024-03-26  9:30           ` Jan Beulich
  2024-03-27  2:41             ` Cui, Lili
  0 siblings, 1 reply; 69+ messages in thread
From: Jan Beulich @ 2024-03-26  9:30 UTC (permalink / raw)
  To: Cui, Lili; +Cc: Lu, Hongjiu, Kong, Lingling, binutils

On 26.03.2024 09:22, Cui, Lili wrote:
>> On 22.03.2024 11:02, Jan Beulich wrote:
>>> On 08.12.2023 15:12, Jan Beulich wrote:
>>>> On 24.11.2023 08:02, Cui, Lili wrote:
>>>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>>>  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>>>> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>>>  rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>>>> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>>>  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
>>>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>>>>
>>>> Didn't we agree to avoid adding this (and its sibling) template, for
>>>> the omitted shift count being ambiguous? Consider
>>>>
>>>>     rol %cl, %al
>>>>
>>>> Is this a rotate by %cl, or a 1-bit NDD rotate?
>>>
>>> Btw, while this comment was taken into account for the "normal" shifts
>>> and rotates, SHLD / SHRD still have this odd extra form.
>>
>> I have to correct myself here: RCL and RCR had such an odd form retained, too
>> (as, perhaps, a side effect of prematurely adding the non-NDD forms there).
>>
> 
> For RCL/RCR, we dropped the format of omitting $1.
> rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }

That's my point: This was supposed to be dropped, but is still there. Only
rol/ror and the four shifts are where it was properly dropped. The rcl/rcr
ones disappear in "x86: templatize shift/rotate insns" now.

> For example, when the register is %rcx, it will conflict with the following template.
> 
> rcl, 0xd2/2, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> 
> I'm confused if we really want to omit the %cl case. I'll confirm it later.

I'm confused by this. What are you talking about? Hmm, perhaps there was
some confusion from me originally saying "had such an odd form retained".
That wasn't specifically about it being %cl or $1 omitted, but more
generally about insn forms with no explicit shift count (of whatever
shape).

Jan


* RE: [PATCH v3 6/9] Support APX NDD
  2024-03-26  9:30           ` Jan Beulich
@ 2024-03-27  2:41             ` Cui, Lili
  0 siblings, 0 replies; 69+ messages in thread
From: Cui, Lili @ 2024-03-27  2:41 UTC (permalink / raw)
  To: Beulich, Jan; +Cc: Lu, Hongjiu, Kong, Lingling, binutils

> On 26.03.2024 09:22, Cui, Lili wrote:
> >> On 22.03.2024 11:02, Jan Beulich wrote:
> >>> On 08.12.2023 15:12, Jan Beulich wrote:
> >>>> On 24.11.2023 08:02, Cui, Lili wrote:
> >>>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>>>  rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>>>> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>>>  rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>>>> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>>>  rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> >>>>> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> >>>>
> >>>> Didn't we agree to avoid adding this (and its sibling) template,
> >>>> for the omitted shift count being ambiguous? Consider
> >>>>
> >>>>     rol %cl, %al
> >>>>
> >>>> Is this a rotate by %cl, or a 1-bit NDD rotate?
> >>>
> >>> Btw, while this comment was taken into account for the "normal"
> >>> shifts and rotates, SHLD / SHRD still have this odd extra form.
> >>
> >> I have to correct myself here: RCL and RCR had such an odd form
> >> retained, too (as, perhaps, a side effect of prematurely adding the non-NDD
> >> forms there).
> >>
> >
> > For RCL/RCR, we dropped the format of omitting $1.
> > rcl, 0xd0/2, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> 
> That's my point: This was supposed to be dropped, but is still there. Only rol/ror
> and the four shifts are where it was properly dropped. The rcl/rcr ones
> disappear in "x86: templatize shift/rotate insns" now.
> 
> > For example, when the register is %rcx, it will conflict with the following
> > template.
> >
> > rcl, 0xd2/2, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> >
> > I'm confused if we really want to omit the %cl case. I'll confirm it later.
> 
> I'm confused by this. What are you talking about? Hmm, perhaps there was
> some confusion from me originally saying "had such an odd form retained".
> That wasn't specifically about it being %cl or $1 omitted, but more generally
> about insn forms with no explicit shift count (of whatever shape).
> 

Oh, got it. I dropped the NDD format for them, but promoted the RCL/RCR forms without an explicit shift count. I see you corrected RCL/RCR and SHLD/SHRD in your patches. Thanks.

Lili.



end of thread, other threads:[~2024-03-27  2:41 UTC | newest]

Thread overview: 69+ messages
2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
2023-11-24  7:02 ` [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Cui, Lili
2023-12-04 16:30   ` Jan Beulich
2023-12-05 13:31     ` Cui, Lili
2023-12-06  7:52       ` Jan Beulich
2023-12-06 12:43         ` Cui, Lili
2023-12-07  9:01           ` Jan Beulich
2023-12-08  3:10             ` Cui, Lili
2023-11-24  7:02 ` [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
2023-12-07 12:38   ` Jan Beulich
2023-12-08 15:21     ` Cui, Lili
2023-12-11  8:34       ` Jan Beulich
2023-12-12 10:44         ` Cui, Lili
2023-12-12 11:16           ` Jan Beulich
2023-12-12 12:32             ` Cui, Lili
2023-12-12 12:39               ` Jan Beulich
2023-12-12 13:15                 ` Cui, Lili
2023-12-12 14:13                   ` Jan Beulich
2023-12-13  7:36                     ` Cui, Lili
2023-12-13  7:48                       ` Jan Beulich
2023-12-12 12:58         ` Cui, Lili
2023-12-12 14:04           ` Jan Beulich
2023-12-13  8:35             ` Cui, Lili
2023-12-13  9:13               ` Jan Beulich
2023-12-07 13:34   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:43       ` Jan Beulich
2023-12-11 11:50   ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 5/9] Add tests for " Cui, Lili
2023-12-07 14:05   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:55       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
2023-12-08 14:12   ` Jan Beulich
2023-12-11 13:36     ` Cui, Lili
2023-12-11 16:50       ` Jan Beulich
2023-12-13 10:42         ` Cui, Lili
2024-03-22 10:02     ` Jan Beulich
2024-03-22 10:31       ` Jan Beulich
2024-03-26  2:04         ` Cui, Lili
2024-03-26  7:06           ` Jan Beulich
2024-03-26  7:18             ` Cui, Lili
2024-03-22 10:59       ` Jan Beulich
2024-03-26  8:22         ` Cui, Lili
2024-03-26  9:30           ` Jan Beulich
2024-03-27  2:41             ` Cui, Lili
2023-12-08 14:27   ` Jan Beulich
2023-12-12  5:53     ` Cui, Lili
2023-12-12  8:28       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 7/9] Support APX Push2/Pop2 Cui, Lili
2023-12-11 11:17   ` Jan Beulich
2023-12-15  8:38     ` Cui, Lili
2023-12-15  8:44       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 8/9] Support APX NDD optimized encoding Cui, Lili
2023-12-11 12:27   ` Jan Beulich
2023-12-12  3:18     ` Hu, Lin1
2023-12-12  8:41       ` Jan Beulich
2023-12-13  5:31         ` Hu, Lin1
2023-12-12  8:45       ` Jan Beulich
2023-12-13  6:06         ` Hu, Lin1
2023-12-13  8:19           ` Jan Beulich
2023-12-13  8:34             ` Hu, Lin1
2023-11-24  7:02 ` [PATCH v3 9/9] Support APX JMPABS for disassembler Cui, Lili
2023-11-24  7:09 ` [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Jan Beulich
2023-11-24 11:22   ` Cui, Lili
2023-11-24 12:14     ` Jan Beulich
2023-12-12  2:57 ` Lu, Hongjiu
2023-12-12  8:16 ` Cui, Lili
