From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.126]) by sourceware.org (Postfix) with ESMTPS id 6FDCD3858D39 for ; Mon, 23 Oct 2023 03:30:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6FDCD3858D39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 6FDCD3858D39 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=134.134.136.126 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698031819; cv=none; b=JA7cfGkZhAh/fZ9y6BrbaXjTu7D+wpGqn65085ygPPjp8aMIvA1r5UoiNAXzngVbhgX/v2sAnNck36de9yAViZ51iR2mFvFcAKlaM6yL4if5I3eK7FpfUprQkM8zvjmRpbrfsjCr8P0kP/ZJKq0XzYhK6ermtj6bDOAyeadutAY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698031819; c=relaxed/simple; bh=rla97R9f/e8g0kDP3TlAV1zkwRTP7SWJ2ZZXgUzG32o=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=VbGmFynNY4bXm25wxNlWB+S08+j1v9uf52Taty17rt9cLCP6PsT0e6NLX1M4/BqA9wQTpWIFH91zZ4OwdLSMRsOymumcyq6kdE9eQ+AmG8g0ntn/XGTlNZGOMlZR11/E7W6OaMuaZHuKG43PIZ5MbH1n39pwv0NonOeFpZrqqfo= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698031816; x=1729567816; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=rla97R9f/e8g0kDP3TlAV1zkwRTP7SWJ2ZZXgUzG32o=; b=DDGXOQBo9KiVT3yyNy80jTexD5BwPkVM8oPn3iZ6qrd+dvfn8IArNIr5 wpxEnYszCZCYm8HOxrnwURYOv4PFe11zMOLROsnyZu3pHoDGQG6EiIG4h fbu1m7mcOhY0HR181EexG9onxAvQaMA01HiA5+iMTaxEjzA1c2bmNO2C2 21fP2/m6mjkqiS6h2GzzDAup4J3hvwUezxK6xUw413Lq8rqLKimPP61y4 zU/T2vc/EpGAID+FDMZGktwJ5zMrHBrQjqPGQ8K7E/QoHVvqKcdql4Pfz Vu05ic+sfounkImcqiW9PO8/mKaCyRMe/+V2uFYohaENKwKMqjPYBZhOa w==; X-IronPort-AV: E=McAfee;i="6600,9927,10871"; a="371827559" X-IronPort-AV: E=Sophos;i="6.03,244,1694761200"; d="scan'208";a="371827559" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Oct 2023 20:30:14 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10871"; a="848662371" X-IronPort-AV: E=Sophos;i="6.03,244,1694761200"; d="scan'208";a="848662371" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by FMSMGA003.fm.intel.com with ESMTP; 22 Oct 2023 20:30:09 -0700 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 7CDDF1005660; Mon, 23 Oct 2023 11:30:08 +0800 (CST) From: "Hu, Lin1" To: binutils@sourceware.org Cc: JBeulich@suse.com, hongjiu.lu@intel.com Subject: [PATCH 5/8] [v2] Support APX NDD optimized encoding. Date: Mon, 23 Oct 2023 11:30:08 +0800 Message-Id: <20231023033008.3256485-1-lin1.hu@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-9.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SCC_10_SHORT_WORD_LINES,SCC_20_SHORT_WORD_LINES,SCC_5_SHORT_WORD_LINES,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: The new version of PATCH has been adjusted mainly based on comments. This patch aims to optimize: add %r16, %r15, %r15 -> add %r16, %r15 gas/ChangeLog: * config/tc-i386.c (optimize_NDD_to_nonNDD): New function. (match_template): If we can optimzie APX NDD insns, so rematch template. * testsuite/gas/i386/x86-64.exp: Add test. * testsuite/gas/i386/x86-64-apx-ndd-optimize.d: New test. * testsuite/gas/i386/x86-64-apx-ndd-optimize.s: Ditto. --- gas/config/tc-i386.c | 45 + .../gas/i386/x86-64-apx-ndd-optimize.d | 124 + .../gas/i386/x86-64-apx-ndd-optimize.s | 118 + gas/testsuite/gas/i386/x86-64.exp | 1 + opcodes/i386-opc.tbl | 22 +- 8 files changed, 11414 insertions(+), 8754 deletions(-) create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c index 5a40fdcce40..5e6bb5435e3 100644 --- a/gas/config/tc-i386.c +++ b/gas/config/tc-i386.c @@ -7186,6 +7186,43 @@ check_EgprOperands (const insn_template *t) return 0; } +/* Optimize APX NDD insns to non-NDD insns. */ + +static bool +optimize_NDD_to_nonNDD (const insn_template *t) +{ + if (t->opcode_modifier.vexvvvv + && t->opcode_space == SPACE_EVEXMAP4 + && i.reg_operands >= 2 + && i.types[i.operands - 1].bitfield.class == Reg) + { + unsigned int readonly_var = ~0; + unsigned int dest = i.operands - 1; + unsigned int src1 = (i.operands > 2) ? i.operands - 2 : 0; + unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0; + + if (i.types[src1].bitfield.class == Reg + && i.op[src1].regs == i.op[dest].regs) + readonly_var = src2; + /* adcx, adox and imul don't have D bit. */ + else if (i.types[src2].bitfield.class == Reg + && i.op[src2].regs == i.op[dest].regs + && t->opcode_modifier.commutative) + readonly_var = src1; + if (readonly_var != (unsigned int) ~0) + { + --i.operands; + --i.reg_operands; + --i.tm.operands; + + if (readonly_var != src2) + swap_2_operands (readonly_var, src2); + return 1; + } + } + return 0; +} + /* Helper function for the progress() macro in match_template(). */ static INLINE enum i386_error progress (enum i386_error new, enum i386_error last, @@ -7706,6 +7743,14 @@ match_template (char mnem_suffix) i.memshift = memshift; } + /* If we can optimize a NDD insn to non-NDD insn, like + add %r16, %r8, %r8 -> add %r16, %r8, then rematch template. */ + if (optimize == 1 && optimize_NDD_to_nonNDD (t)) + { + t = current_templates->start - 1; + continue; + } + /* We've found a match; break out of loop. */ break; } diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d new file mode 100644 index 00000000000..f23b2b127b6 --- /dev/null +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d @@ -0,0 +1,124 @@ +#as: -O1 +#objdump: -drw +#name: x86-64 APX NDD optimized encoding +#source: x86-64-apx-ndd-optimize.s + +.*: +file format .* + + +Disassembly of section .text: + +0+ <_start>: +\s*[a-f0-9]+:\s*d5 19 ff c7 inc %r31 +\s*[a-f0-9]+:\s*d5 11 fe c7 inc %r31b +\s*[a-f0-9]+:\s*d5 4d 01 f8 add %r31,%r8 +\s*[a-f0-9]+:\s*d5 45 00 f8 add %r31b,%r8b +\s*[a-f0-9]+:\s*d5 4d 01 f8 add %r31,%r8 +\s*[a-f0-9]+:\s*d5 1d 03 c7 add %r31,%r8 +\s*[a-f0-9]+:\s*d5 4d 03 38 add \(%r8\),%r31 +\s*[a-f0-9]+:\s*d5 1d 03 07 add \(%r31\),%r8 +\s*[a-f0-9]+:\s*49 81 c7 33 44 34 12 add \$0x12344433,%r15 +\s*[a-f0-9]+:\s*49 81 c0 11 22 33 f4 add \$0xfffffffff4332211,%r8 +\s*[a-f0-9]+:\s*d5 18 ff c9 dec %r17 +\s*[a-f0-9]+:\s*d5 10 fe c9 dec %r17b +\s*[a-f0-9]+:\s*d5 18 f7 d1 not %r17 +\s*[a-f0-9]+:\s*d5 10 f6 d1 not %r17b +\s*[a-f0-9]+:\s*d5 18 f7 d9 neg %r17 +\s*[a-f0-9]+:\s*d5 10 f6 d9 neg %r17b +\s*[a-f0-9]+:\s*d5 1c 29 f9 sub %r15,%r17 +\s*[a-f0-9]+:\s*d5 14 28 f9 sub %r15b,%r17b +\s*[a-f0-9]+:\s*62 54 84 18 29 38 sub %r15,\(%r8\),%r15 +\s*[a-f0-9]+:\s*d5 49 2b 04 07 sub \(%r15,%rax,1\),%r16 +\s*[a-f0-9]+:\s*d5 19 81 ee 34 12 00 00 sub \$0x1234,%r30 +\s*[a-f0-9]+:\s*d5 1c 19 f9 sbb %r15,%r17 +\s*[a-f0-9]+:\s*d5 14 18 f9 sbb %r15b,%r17b +\s*[a-f0-9]+:\s*62 54 84 18 19 38 sbb %r15,\(%r8\),%r15 +\s*[a-f0-9]+:\s*d5 49 1b 04 07 sbb \(%r15,%rax,1\),%r16 +\s*[a-f0-9]+:\s*d5 19 81 de 34 12 00 00 sbb \$0x1234,%r30 +\s*[a-f0-9]+:\s*d5 1c 11 f9 adc %r15,%r17 +\s*[a-f0-9]+:\s*d5 14 10 f9 adc %r15b,%r17b +\s*[a-f0-9]+:\s*4d 13 38 adc \(%r8\),%r15 +\s*[a-f0-9]+:\s*d5 49 13 04 07 adc \(%r15,%rax,1\),%r16 +\s*[a-f0-9]+:\s*d5 19 81 d6 34 12 00 00 adc \$0x1234,%r30 +\s*[a-f0-9]+:\s*d5 1c 09 f9 or %r15,%r17 +\s*[a-f0-9]+:\s*d5 14 08 f9 or %r15b,%r17b +\s*[a-f0-9]+:\s*4d 0b 38 or \(%r8\),%r15 +\s*[a-f0-9]+:\s*d5 49 0b 04 07 or \(%r15,%rax,1\),%r16 +\s*[a-f0-9]+:\s*d5 19 81 ce 34 12 00 00 or \$0x1234,%r30 +\s*[a-f0-9]+:\s*d5 1c 31 f9 xor %r15,%r17 +\s*[a-f0-9]+:\s*d5 14 30 f9 xor %r15b,%r17b +\s*[a-f0-9]+:\s*4d 33 38 xor \(%r8\),%r15 +\s*[a-f0-9]+:\s*d5 49 33 04 07 xor \(%r15,%rax,1\),%r16 +\s*[a-f0-9]+:\s*d5 19 81 f6 34 12 00 00 xor \$0x1234,%r30 +\s*[a-f0-9]+:\s*d5 1c 21 f9 and %r15,%r17 +\s*[a-f0-9]+:\s*d5 14 20 f9 and %r15b,%r17b +\s*[a-f0-9]+:\s*4d 23 38 and \(%r8\),%r15 +\s*[a-f0-9]+:\s*d5 49 23 04 07 and \(%r15,%rax,1\),%r16 +\s*[a-f0-9]+:\s*d5 11 81 e6 34 12 00 00 and \$0x1234,%r30d +\s*[a-f0-9]+:\s*d5 19 d1 cf ror %r31 +\s*[a-f0-9]+:\s*d5 11 d0 cf ror %r31b +\s*[a-f0-9]+:\s*49 c1 cc 02 ror \$0x2,%r12 +\s*[a-f0-9]+:\s*41 c0 cc 02 ror \$0x2,%r12b +\s*[a-f0-9]+:\s*d5 19 d1 c7 rol %r31 +\s*[a-f0-9]+:\s*d5 11 d0 c7 rol %r31b +\s*[a-f0-9]+:\s*49 c1 c4 02 rol \$0x2,%r12 +\s*[a-f0-9]+:\s*41 c0 c4 02 rol \$0x2,%r12b +\s*[a-f0-9]+:\s*d5 19 d1 df rcr %r31 +\s*[a-f0-9]+:\s*d5 11 d0 df rcr %r31b +\s*[a-f0-9]+:\s*49 c1 dc 02 rcr \$0x2,%r12 +\s*[a-f0-9]+:\s*41 c0 dc 02 rcr \$0x2,%r12b +\s*[a-f0-9]+:\s*d5 19 d1 d7 rcl %r31 +\s*[a-f0-9]+:\s*d5 11 d0 d7 rcl %r31b +\s*[a-f0-9]+:\s*49 c1 d4 02 rcl \$0x2,%r12 +\s*[a-f0-9]+:\s*41 c0 d4 02 rcl \$0x2,%r12b +\s*[a-f0-9]+:\s*d5 19 d1 e7 shl %r31 +\s*[a-f0-9]+:\s*d5 11 d0 e7 shl %r31b +\s*[a-f0-9]+:\s*49 c1 e4 02 shl \$0x2,%r12 +\s*[a-f0-9]+:\s*41 c0 e4 02 shl \$0x2,%r12b +\s*[a-f0-9]+:\s*d5 19 d1 ff sar %r31 +\s*[a-f0-9]+:\s*d5 11 d0 ff sar %r31b +\s*[a-f0-9]+:\s*49 c1 fc 02 sar \$0x2,%r12 +\s*[a-f0-9]+:\s*41 c0 fc 02 sar \$0x2,%r12b +\s*[a-f0-9]+:\s*d5 19 d1 e7 shl %r31 +\s*[a-f0-9]+:\s*d5 11 d0 e7 shl %r31b +\s*[a-f0-9]+:\s*49 c1 e4 02 shl \$0x2,%r12 +\s*[a-f0-9]+:\s*41 c0 e4 02 shl \$0x2,%r12b +\s*[a-f0-9]+:\s*d5 19 d1 ef shr %r31 +\s*[a-f0-9]+:\s*d5 11 d0 ef shr %r31b +\s*[a-f0-9]+:\s*49 c1 ec 02 shr \$0x2,%r12 +\s*[a-f0-9]+:\s*41 c0 ec 02 shr \$0x2,%r12b +\s*[a-f0-9]+:\s*62 74 9c 18 24 20 01 shld \$0x1,%r12,\(%rax\),%r12 +\s*[a-f0-9]+:\s*4d 0f a4 c4 02 shld \$0x2,%r8,%r12 +\s*[a-f0-9]+:\s*62 74 b4 18 a5 08 shld %cl,%r9,\(%rax\),%r9 +\s*[a-f0-9]+:\s*d5 9c a5 e0 shld %cl,%r12,%r16 +\s*[a-f0-9]+:\s*62 7c 94 18 a5 2c 83 shld %cl,%r13,\(%r19,%rax,4\),%r13 +\s*[a-f0-9]+:\s*62 74 9c 18 2c 20 01 shrd \$0x1,%r12,\(%rax\),%r12 +\s*[a-f0-9]+:\s*4d 0f ac ec 01 shrd \$0x1,%r13,%r12 +\s*[a-f0-9]+:\s*62 74 b4 18 ad 08 shrd %cl,%r9,\(%rax\),%r9 +\s*[a-f0-9]+:\s*d5 9c ad e0 shrd %cl,%r12,%r16 +\s*[a-f0-9]+:\s*62 7c 94 18 ad 2c 83 shrd %cl,%r13,\(%r19,%rax,4\),%r13 +\s*[a-f0-9]+:\s*66 4d 0f 38 f6 c7 adcx %r15,%r8 +\s*[a-f0-9]+:\s*62 14 f9 08 66 04 3f adcx \(%r15,%r31,1\),%r8 +\s*[a-f0-9]+:\s*66 4d 0f 38 f6 c1 adcx %r9,%r8 +\s*[a-f0-9]+:\s*f3 4d 0f 38 f6 c7 adox %r15,%r8 +\s*[a-f0-9]+:\s*62 14 fa 08 66 04 3f adox \(%r15,%r31,1\),%r8 +\s*[a-f0-9]+:\s*f3 4d 0f 38 f6 c1 adox %r9,%r8 +\s*[a-f0-9]+:\s*67 0f 40 90 90 90 90 90 cmovo -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 41 90 90 90 90 90 cmovno -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 42 90 90 90 90 90 cmovb -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 43 90 90 90 90 90 cmovae -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 44 90 90 90 90 90 cmove -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 45 90 90 90 90 90 cmovne -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 46 90 90 90 90 90 cmovbe -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 47 90 90 90 90 90 cmova -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 48 90 90 90 90 90 cmovs -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 49 90 90 90 90 90 cmovns -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 4a 90 90 90 90 90 cmovp -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 4b 90 90 90 90 90 cmovnp -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 4c 90 90 90 90 90 cmovl -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 4d 90 90 90 90 90 cmovge -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 4e 90 90 90 90 90 cmovle -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f 4f 90 90 90 90 90 cmovg -0x6f6f6f70\(%eax\),%edx +\s*[a-f0-9]+:\s*67 0f af 90 09 09 09 00 imul 0x90909\(%eax\),%edx +\s*[a-f0-9]+:\s*d5 aa af 94 f8 09 09 00 00 imul 0x909\(%rax,%r31,8\),%rdx +\s*[a-f0-9]+:\s*48 0f af d0 imul %rax,%rdx diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s new file mode 100644 index 00000000000..0f5c15a2f9c --- /dev/null +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s @@ -0,0 +1,118 @@ +# Check 64bit APX NDD instructions with optimized encoding + + .allow_index_reg + .text +_start: +inc %r31,%r31 +incb %r31b,%r31b +add %r31,%r8,%r8 +addb %r31b,%r8b,%r8b +{store} add %r31,%r8,%r8 +{load} add %r31,%r8,%r8 +add %r31,(%r8),%r31 +add (%r31),%r8,%r8 +add $0x12344433,%r15,%r15 +add $0xfffffffff4332211,%r8,%r8 +dec %r17,%r17 +decb %r17b,%r17b +not %r17,%r17 +notb %r17b,%r17b +neg %r17,%r17 +negb %r17b,%r17b +sub %r15,%r17,%r17 +subb %r15b,%r17b,%r17b +sub %r15,(%r8),%r15 +sub (%r15,%rax,1),%r16,%r16 +sub $0x1234,%r30,%r30 +sbb %r15,%r17,%r17 +sbbb %r15b,%r17b,%r17b +sbb %r15,(%r8),%r15 +sbb (%r15,%rax,1),%r16,%r16 +sbb $0x1234,%r30,%r30 +adc %r15,%r17,%r17 +adcb %r15b,%r17b,%r17b +adc %r15,(%r8),%r15 +adc (%r15,%rax,1),%r16,%r16 +adc $0x1234,%r30,%r30 +or %r15,%r17,%r17 +orb %r15b,%r17b,%r17b +or %r15,(%r8),%r15 +or (%r15,%rax,1),%r16,%r16 +or $0x1234,%r30,%r30 +xor %r15,%r17,%r17 +xorb %r15b,%r17b,%r17b +xor %r15,(%r8),%r15 +xor (%r15,%rax,1),%r16,%r16 +xor $0x1234,%r30,%r30 +and %r15,%r17,%r17 +andb %r15b,%r17b,%r17b +and %r15,(%r8),%r15 +and (%r15,%rax,1),%r16,%r16 +and $0x1234,%r30,%r30 +ror %r31,%r31 +rorb %r31b,%r31b +ror $0x2,%r12,%r12 +rorb $0x2,%r12b,%r12b +rol %r31,%r31 +rolb %r31b,%r31b +rol $0x2,%r12,%r12 +rolb $0x2,%r12b,%r12b +rcr %r31,%r31 +rcrb %r31b,%r31b +rcr $0x2,%r12,%r12 +rcrb $0x2,%r12b,%r12b +rcl %r31,%r31 +rclb %r31b,%r31b +rcl $0x2,%r12,%r12 +rclb $0x2,%r12b,%r12b +shl %r31,%r31 +shlb %r31b,%r31b +shl $0x2,%r12,%r12 +shlb $0x2,%r12b,%r12b +sar %r31,%r31 +sarb %r31b,%r31b +sar $0x2,%r12,%r12 +sarb $0x2,%r12b,%r12b +shl %r31,%r31 +shlb %r31b,%r31b +shl $0x2,%r12,%r12 +shlb $0x2,%r12b,%r12b +shr %r31,%r31 +shrb %r31b,%r31b +shr $0x2,%r12,%r12 +shrb $0x2,%r12b,%r12b +shld $0x1,%r12,(%rax),%r12 +shld $0x2,%r8,%r12,%r12 +shld %cl,%r9,(%rax),%r9 +shld %cl,%r12,%r16,%r16 +shld %cl,%r13,(%r19,%rax,4),%r13 +shrd $0x1,%r12,(%rax),%r12 +shrd $0x1,%r13,%r12,%r12 +shrd %cl,%r9,(%rax),%r9 +shrd %cl,%r12,%r16,%r16 +shrd %cl,%r13,(%r19,%rax,4),%r13 +adcx %r15,%r8,%r8 +adcx (%r15,%r31,1),%r8,%r8 +adcx %r8,%r9,%r8 +adox %r15,%r8,%r8 +adox (%r15,%r31,1),%r8,%r8 +adox %r8,%r9,%r8 +cmovo 0x90909090(%eax),%edx,%edx +cmovno 0x90909090(%eax),%edx,%edx +cmovb 0x90909090(%eax),%edx,%edx +cmovae 0x90909090(%eax),%edx,%edx +cmove 0x90909090(%eax),%edx,%edx +cmovne 0x90909090(%eax),%edx,%edx +cmovbe 0x90909090(%eax),%edx,%edx +cmova 0x90909090(%eax),%edx,%edx +cmovs 0x90909090(%eax),%edx,%edx +cmovns 0x90909090(%eax),%edx,%edx +cmovp 0x90909090(%eax),%edx,%edx +cmovnp 0x90909090(%eax),%edx,%edx +cmovl 0x90909090(%eax),%edx,%edx +cmovge 0x90909090(%eax),%edx,%edx +cmovle 0x90909090(%eax),%edx,%edx +cmovg 0x90909090(%eax),%edx,%edx +imul 0x90909(%eax),%edx,%edx +imul 0x909(%rax,%r31,8),%rdx,%rdx +imul %rdx,%rax,%rdx diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp index 07cb716d2a5..38fbed8a388 100644 --- a/gas/testsuite/gas/i386/x86-64.exp +++ b/gas/testsuite/gas/i386/x86-64.exp @@ -549,6 +549,7 @@ run_dump_test "x86-64-optimize-6" run_list_test "x86-64-optimize-7a" "-I${srcdir}/$subdir -march=+noavx -al" run_dump_test "x86-64-optimize-7b" run_list_test "x86-64-optimize-8" "-I${srcdir}/$subdir -march=+noavx2 -al" +run_dump_test "x86-64-apx-ndd-optimize" run_dump_test "x86-64-align-branch-1a" run_dump_test "x86-64-align-branch-1b" run_dump_test "x86-64-align-branch-1c" diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl index f68940b9b4a..3c255e79a91 100644 --- a/opcodes/i386-opc.tbl +++ b/opcodes/i386-opc.tbl @@ -145,6 +145,8 @@ // The EVEX purpose of StaticRounding appears only together with SAE. Re-use // the bit to mark commutative VEX encodings where swapping the source // operands may allow to switch from 3-byte to 2-byte VEX encoding. +// And re-use the bit to mark some NDD insns that swapping the source operands +// may allow to switch from 3 operands to 2 operands. #define C StaticRounding #define FP 387|287|8087 @@ -166,6 +168,10 @@ ### MARKER ### +// Please don't add a NDD insn which may be optimized to a REX2 insn before the +// mov. It may result that a good UB checker object the behavior +// "template->start - 1" at the end of match_template. + // Move instructions. mov, 0xa0, No64, D|W|CheckOperandSize|No_sSuf|No_qSuf, { Disp16|Disp32|Unspecified|Byte|Word|Dword, Acc|Byte|Word|Dword } mov, 0xa0, x64, D|W|CheckOperandSize|No_sSuf, { Disp64|Unspecified|Byte|Word|Dword|Qword, Acc|Byte|Word|Dword|Qword } @@ -295,7 +301,7 @@ add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg3 add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex } add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword } add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex } -add, 0x0, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } +add, 0x0, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 } add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64} @@ -339,7 +345,7 @@ and, 0x20, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8| and, 0x83/4, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex } and, 0x24, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword } and, 0x80/4, 0, W|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex } -and, 0x20, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } +and, 0x20, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } and, 0x83/4, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 } and, 0x80/4, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } @@ -347,7 +353,7 @@ or, 0x8, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Re or, 0x83/1, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex } or, 0xc, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword } or, 0x80/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex } -or, 0x8, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } +or, 0x8, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } or, 0x83/1, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 } or, 0x80/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } @@ -355,7 +361,7 @@ xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8| xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex } xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword } xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex } -xor, 0x30, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } +xor, 0x30, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 } xor, 0x80/6, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } @@ -369,7 +375,7 @@ adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|R adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex } adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex } adc, 0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex } -adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } +adc, 0x10, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } adc, 0x83/2, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 } adc, 0x80/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 } @@ -412,7 +418,7 @@ cqto, 0x99, x64, Size64|NoSuf, {} mul, 0xf6/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex } imul, 0xf6/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex } imul, 0xfaf, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64 } -imul, 0xaf, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 } +imul, 0xaf, APX_F, C|Modrm|CheckOperandSize|No_bSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 } imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 } imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 } // imul with 2 operands mimics imul with 3 by putting the register in @@ -2126,10 +2132,10 @@ xstore, 0xfa7c0, PadLock, NoSuf|RepPrefixOk, {} // Multy-precision Add Carry, rdseed instructions. adcx, 0x660f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 } adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 } -adcx, 0x6666, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 } +adcx, 0x6666, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 } adox, 0xf30f38f6, ADX, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 } adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 } -adox, 0xf366, ADX|APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 } +adox, 0xf366, ADX|APX_F, C|Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|VexVVVVDest|EVex128|EVexMap4, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 } rdseed, 0xfc7/7, RdSeed, Modrm|NoSuf, { Reg16|Reg32|Reg64 } // SMAP instructions. -- 2.31.1