public inbox for binutils@sourceware.org
From: Jan Beulich <jbeulich@suse.com>
To: Binutils <binutils@sourceware.org>
Subject: [PATCH 5/6] x86: VPSADBW's source operands are also commutative
Date: Fri, 26 Mar 2021 11:52:24 +0100
Message-ID: <f6dc8e4d-c155-5daf-650a-aad8765e91a2@suse.com>
In-Reply-To: <287ad145-1fe3-2477-327a-30e8d45a4be7@suse.com>

In commit 79dec6b7baa2 ("x86-64: optimize certain commutative
VEX-encoded insns") I overlooked that the subtraction involved here
doesn't matter: the insn sums up absolute differences, so swapping the
two source operands leaves the result unchanged.
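
For illustration only (not part of the patch): a minimal Python model of
one 64-bit lane of the operation, assuming the usual definition of the
insn as the sum of absolute differences of eight unsigned bytes. The
name psadbw_lane and the sample values are made up for this sketch.

```python
def psadbw_lane(a: bytes, b: bytes) -> int:
    """Model one 64-bit lane: sum of |a[i] - b[i]| over 8 unsigned bytes."""
    assert len(a) == 8 and len(b) == 8
    return sum(abs(x - y) for x, y in zip(a, b))


# Arbitrary sample lanes (hypothetical data, chosen only for the demo).
a = bytes([0, 10, 255, 3, 7, 100, 200, 1])
b = bytes([5, 2, 0, 3, 9, 50, 250, 128])

# |x - y| == |y - x| for every byte pair, so the sum does not depend on
# which operand is which.
assert psadbw_lane(a, b) == psadbw_lane(b, a)
```

Each term is symmetric in its two inputs, which is why the order of the
sources cannot affect the result.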

gas/
2021-03-XX  Jan Beulich  <jbeulich@suse.com>

	* testsuite/gas/i386/x86-64-sse2avx.s: Add vpsadbw case.
	* testsuite/gas/i386/x86-64-avx-swap-2.d,
	testsuite/gas/i386/x86-64-sse2avx.d: Adjust expectations.

opcodes/
2021-03-XX  Jan Beulich  <jbeulich@suse.com>

	* i386-opc.tbl (psadbw): Add <sse2:comm>.
	(vpsadbw): Add C.
	* i386-tbl.h: Re-generate.
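
For illustration only (not part of the patch): a rough Python sketch of
why swapping the sources can shrink the encoding, as seen in the adjusted
expectations (c4 c1 4d f6 d6 becoming c5 8d f6 d6). It assumes the
register-only form in the 0F opcode space with VEX.W clear; the helper
name encode_vex_0f and its parameters are hypothetical, and this is not
how gas itself is structured.

```python
def encode_vex_0f(opcode: int, reg: int, vvvv: int, rm: int,
                  *, vl256: bool, pp: int = 0b01) -> bytes:
    """Encode a reg-reg VEX insn in the 0F space (assumed: W=0, no memory)."""
    r_bar = 0 if reg >= 8 else 1          # inverted extension of ModRM.reg
    b_bar = 0 if rm >= 8 else 1           # inverted extension of ModRM.rm
    x_bar = 1                             # no index register in this form
    low = (((~vvvv) & 0xF) << 3) | (int(vl256) << 2) | pp
    modrm = 0xC0 | ((reg & 7) << 3) | (rm & 7)
    if b_bar == 1:
        # The 2-byte prefix carries only R, so it works when rm is 0..7.
        prefix = bytes([0xC5, (r_bar << 7) | low])
    else:
        # A high register in ModRM.rm needs B, hence the 3-byte prefix.
        prefix = bytes([0xC4, (r_bar << 7) | (x_bar << 6) | (b_bar << 5) | 0b00001, low])
    return prefix + bytes([opcode, modrm])


# vpsadbw %ymm14,%ymm6,%ymm2: ymm14 sits in ModRM.rm, 3-byte prefix needed.
long_form = encode_vex_0f(0xF6, reg=2, vvvv=6, rm=14, vl256=True)
# vpsadbw %ymm6,%ymm14,%ymm2: ymm14 moves to VEX.vvvv, 2-byte prefix suffices.
short_form = encode_vex_0f(0xF6, reg=2, vvvv=14, rm=6, vl256=True)
```

With the sources commutative, the high register can be placed in
VEX.vvvv, which any VEX prefix can express in full, saving one byte.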

--- a/gas/testsuite/gas/i386/x86-64-avx-swap-2.d
+++ b/gas/testsuite/gas/i386/x86-64-avx-swap-2.d
@@ -69,7 +69,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	c5 8d f4 d6          	vpmuludq %ymm6,%ymm14,%ymm2
 [ 	]*[a-f0-9]+:	c4 c2 4d 28 d6       	vpmuldq %ymm14,%ymm6,%ymm2
 [ 	]*[a-f0-9]+:	c5 8d eb d6          	vpor   %ymm6,%ymm14,%ymm2
-[ 	]*[a-f0-9]+:	c4 c1 4d f6 d6       	vpsadbw %ymm14,%ymm6,%ymm2
+[ 	]*[a-f0-9]+:	c5 8d f6 d6          	vpsadbw %ymm6,%ymm14,%ymm2
 [ 	]*[a-f0-9]+:	c4 c1 4d f8 d6       	vpsubb %ymm14,%ymm6,%ymm2
 [ 	]*[a-f0-9]+:	c4 c1 4d f9 d6       	vpsubw %ymm14,%ymm6,%ymm2
 [ 	]*[a-f0-9]+:	c4 c1 4d fa d6       	vpsubd %ymm14,%ymm6,%ymm2
@@ -211,7 +211,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	c5 89 f4 d6          	vpmuludq %xmm6,%xmm14,%xmm2
 [ 	]*[a-f0-9]+:	c4 c2 49 28 d6       	vpmuldq %xmm14,%xmm6,%xmm2
 [ 	]*[a-f0-9]+:	c5 89 eb d6          	vpor   %xmm6,%xmm14,%xmm2
-[ 	]*[a-f0-9]+:	c4 c1 49 f6 d6       	vpsadbw %xmm14,%xmm6,%xmm2
+[ 	]*[a-f0-9]+:	c5 89 f6 d6          	vpsadbw %xmm6,%xmm14,%xmm2
 [ 	]*[a-f0-9]+:	c4 c1 49 f8 d6       	vpsubb %xmm14,%xmm6,%xmm2
 [ 	]*[a-f0-9]+:	c4 c1 49 f9 d6       	vpsubw %xmm14,%xmm6,%xmm2
 [ 	]*[a-f0-9]+:	c4 c1 49 fa d6       	vpsubd %xmm14,%xmm6,%xmm2
--- a/gas/testsuite/gas/i386/x86-64-sse2avx.d
+++ b/gas/testsuite/gas/i386/x86-64-sse2avx.d
@@ -273,6 +273,7 @@ Disassembly of section .text:
 [ 	]*[a-f0-9]+:	c5 89 eb f6          	vpor   %xmm6,%xmm14,%xmm6
 [ 	]*[a-f0-9]+:	c5 c9 eb 31          	vpor   \(%rcx\),%xmm6,%xmm6
 [ 	]*[a-f0-9]+:	c5 c9 f6 f4          	vpsadbw %xmm4,%xmm6,%xmm6
+[ 	]*[a-f0-9]+:	c5 89 f6 f6          	vpsadbw %xmm6,%xmm14,%xmm6
 [ 	]*[a-f0-9]+:	c5 c9 f6 31          	vpsadbw \(%rcx\),%xmm6,%xmm6
 [ 	]*[a-f0-9]+:	c4 e2 49 00 f4       	vpshufb %xmm4,%xmm6,%xmm6
 [ 	]*[a-f0-9]+:	c4 e2 49 00 31       	vpshufb \(%rcx\),%xmm6,%xmm6
--- a/gas/testsuite/gas/i386/x86-64-sse2avx.s
+++ b/gas/testsuite/gas/i386/x86-64-sse2avx.s
@@ -280,6 +280,7 @@ _start:
 	por %xmm14,%xmm6
 	por (%rcx),%xmm6
 	psadbw %xmm4,%xmm6
+	psadbw %xmm14,%xmm6
 	psadbw (%rcx),%xmm6
 	pshufb %xmm4,%xmm6
 	pshufb (%rcx),%xmm6
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -1131,7 +1131,7 @@ prefetcht0, 0xf18, 1, CpuSSE|Cpu3dnowA,
 prefetcht1, 0xf18, 2, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoAVX, { BaseIndex }
 prefetcht2, 0xf18, 3, CpuSSE|Cpu3dnowA, Modrm|Anysize|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoAVX, { BaseIndex }
 psadbw, 0xff6, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|NoAVX, { Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
-psadbw<sse2>, 0x660ff6, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+psadbw<sse2>, 0x660ff6, None, <sse2:cpu>, Modrm|<sse2:attr>|<sse2:vvvv>|<sse2:comm>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 pshufw, 0xf70, None, CpuSSE|Cpu3dnowA, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|NoAVX, { Imm8, Qword|Unspecified|BaseIndex|RegMMX, RegMMX }
 rcpps<sse>, 0x0f53, None, <sse:cpu>, Modrm|<sse:attr>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
 rcpss<sse>, 0xf30f53, None, <sse:cpu>, Modrm|<sse:scal>|<sse:vvvv>|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1703,7 +1703,7 @@ vpmulld, 0x6640, None, CpuAVX, Modrm|Vex
 vpmullw, 0x66d5, None, CpuAVX, Modrm|C|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vpmuludq, 0x66f4, None, CpuAVX, Modrm|C|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vpor, 0x66eb, None, CpuAVX, Modrm|C|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
-vpsadbw, 0x66f6, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
+vpsadbw, 0x66f6, None, CpuAVX, Modrm|Vex|Space0F|VexVVVV=1|VexWIG|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vpshufb, 0x6600, None, CpuAVX, Modrm|Vex|Space0F38|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegXMM, RegXMM, RegXMM }
 vpshufd, 0x6670, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM }
 vpshufhw, 0xf370, None, CpuAVX, Modrm|Vex|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegXMM, RegXMM }
@@ -1854,7 +1854,7 @@ vpmulld, 0x6640, None, CpuAVX2, Modrm|Ve
 vpmullw, 0x66d5, None, CpuAVX2, Modrm|C|Vex=2|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
 vpmuludq, 0x66f4, None, CpuAVX2, Modrm|C|Vex=2|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
 vpor, 0x66eb, None, CpuAVX2, Modrm|C|Vex=2|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
-vpsadbw, 0x66f6, None, CpuAVX2, Modrm|Vex=2|Space0F|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
+vpsadbw, 0x66f6, None, CpuAVX2, Modrm|Vex=2|Space0F|VexVVVV=1|VexWIG|C|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
 vpshufb, 0x6600, None, CpuAVX2, Modrm|Vex=2|Space0F38|VexVVVV=1|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
 vpshufd, 0x6670, None, CpuAVX2, Modrm|Vex=2|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegYMM, RegYMM }
 vpshufhw, 0xf370, None, CpuAVX2, Modrm|Vex=2|Space0F|VexWIG|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, Unspecified|BaseIndex|RegYMM, RegYMM }

