Date: Wed, 27 Dec 2023 17:56:22 -0800
From: "H.J. Lu"
To: "Cui, Lili"
Cc: binutils@sourceware.org, jbeulich@suse.com, "Hu, Lin1"
Subject: Re: [PATCH V5 8/9] Support APX NDD optimized encoding.
References: <20231228012714.2989658-1-lili.cui@intel.com> <20231228012714.2989658-9-lili.cui@intel.com>
In-Reply-To: <20231228012714.2989658-9-lili.cui@intel.com>

On Thu, Dec 28, 2023 at 01:27:13AM +0000, Cui, Lili wrote:
> From: "Hu, Lin1"
>
> This patch aims to optimize:
>
> add %r16, %r15, %r15 -> add %r16, %r15
>
> gas/ChangeLog:
>
>         * config/tc-i386.c (check_Rex_required): New function.
>         (can_convert_NDD_to_legacy): Ditto.
>         (match_template): If we can optimize APX NDD insns, rematch
>         the template.
>         * testsuite/gas/i386/x86-64.exp: Add test.
>         * testsuite/gas/i386/x86-64-apx-ndd-optimize.d: New test.
>         * testsuite/gas/i386/x86-64-apx-ndd-optimize.s: Ditto.
> ---
>  gas/config/tc-i386.c                          | 104 ++++++++++++++
>  .../gas/i386/x86-64-apx-ndd-optimize.d        | 132 ++++++++++++++++++
>  .../gas/i386/x86-64-apx-ndd-optimize.s        | 125 +++++++++++++++++
>  gas/testsuite/gas/i386/x86-64.exp             |   1 +
>  4 files changed, 362 insertions(+)
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
>
> diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
> index 640c6511f20..3b0ba41cc72 100644
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -7212,6 +7212,56 @@ check_APX_operands (const insn_template *t)
>    return 0;
>  }
>
> +/* Check if the instruction uses the REX registers or REX prefix.  */
> +static bool
> +check_Rex_required (void)
> +{
> +  for (unsigned int op = 0; op < i.operands; op++)
> +    {
> +      if (i.types[op].bitfield.class != Reg)
> +        continue;
> +
> +      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
> +        return true;
> +    }
> +
> +  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
> +      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
> +    return true;
> +
> +  /* Check whether the {rex} pseudo prefix is in effect.  */
> +  return i.rex_encoding;
> +}
> +
> +/* Optimize APX NDD insns to legacy insns.  */
> +static unsigned int
> +can_convert_NDD_to_legacy (const insn_template *t)
> +{
> +  unsigned int match_dest_op = ~0;
> +
> +  if (!i.tm.opcode_modifier.nf
> +      && i.reg_operands >= 2)
> +    {
> +      unsigned int dest = i.operands - 1;
> +      unsigned int src1 = i.operands - 2;
> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> +
> +      if (i.types[src1].bitfield.class == Reg
> +          && i.op[src1].regs == i.op[dest].regs)
> +        match_dest_op = src1;
> +      /* If the first operand is the same as the third operand,
> +         these instructions need to support the ability to commute
> +         the first two operands and still not change the semantics in order
> +         to be optimized.  */
> +      else if (optimize > 1
> +               && t->opcode_modifier.commutative
> +               && i.types[src2].bitfield.class == Reg
> +               && i.op[src2].regs == i.op[dest].regs)
> +        match_dest_op = src2;
> +    }
> +  return match_dest_op;
> +}
> +
>  /* Helper function for the progress() macro in match_template().  */
>  static INLINE enum i386_error progress (enum i386_error new,
>                                          enum i386_error last,
> @@ -7754,6 +7804,60 @@ match_template (char mnem_suffix)
>            i.memshift = memshift;
>          }
>
> +      /* If we can optimize a NDD insn to legacy insn, like
> +         add %r16, %r8, %r8 -> add %r16, %r8,
> +         add %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> +         Note that the semantics have not been changed.  */
> +      if (optimize
> +          && !i.no_optimize
> +          && i.vec_encoding != vex_encoding_evex
> +          && t + 1 < current_templates.end
> +          && !t[1].opcode_modifier.evex
> +          && t[1].opcode_space <= SPACE_0F38
> +          && t->opcode_modifier.vexvvvv == VexVVVV_DST
> +          && (i.types[i.operands - 1].bitfield.dword
> +              || i.types[i.operands - 1].bitfield.qword))
> +        {
> +          unsigned int match_dest_op = can_convert_NDD_to_legacy (t);
> +
> +          if (match_dest_op != (unsigned int) ~0)
> +            {
> +              size_match = true;
> +              /* We ensure that the next template has the same input
> +                 operands as the original matching template by the first
> +                 operand (ATT).  This guards against someone adding new NDD
> +                 insns and putting them in the wrong position.  */
> +              overlap0 = operand_type_and (i.types[0],
> +                                           t[1].operand_types[0]);
> +              if (t->opcode_modifier.d)
> +                overlap1 = operand_type_and (i.types[0],
> +                                             t[1].operand_types[1]);
> +              if (!operand_type_match (overlap0, i.types[0])
> +                  && (!t->opcode_modifier.d
> +                      || !operand_type_match (overlap1, i.types[0])))
> +                size_match = false;
> +
> +              if (size_match
> +                  && (t[1].opcode_space <= SPACE_0F
> +                      /* Some non-legacy-map0/1 insns can be shorter when
> +                         legacy-encoded and when no REX prefix is required.  */
> +                      || (!check_EgprOperands (t + 1)
> +                          && !check_Rex_required ()
> +                          && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))
> +                {
> +                  if (i.operands > 2 && match_dest_op == i.operands - 3)
> +                    swap_2_operands (match_dest_op, i.operands - 2);
> +
> +                  --i.operands;
> +                  --i.reg_operands;
> +
> +                  specific_error = progress (internal_error);
> +                  continue;
> +                }
> +
> +            }
> +        }
> +
>        /* We've found a match; break out of loop.  */
>        break;
>      }
> diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
> new file mode 100644
> index 00000000000..48f0f1ceee3
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
> @@ -0,0 +1,132 @@
> +#as: -Os
> +#objdump: -drw
> +#name: x86-64 APX NDD optimized encoding
> +#source: x86-64-apx-ndd-optimize.s
> +
> +.*: +file format .*
> +
> +
> +Disassembly of section .text:
> +
> +0+ <_start>:
> +\s*[a-f0-9]+:\s*d5 4d 01 f8 add %r31,%r8
> +\s*[a-f0-9]+:\s*62 44 3c 18 00 f8 add %r31b,%r8b,%r8b
> +\s*[a-f0-9]+:\s*d5 4d 01 f8 add %r31,%r8
> +\s*[a-f0-9]+:\s*d5 1d 03 c7 add %r31,%r8
> +\s*[a-f0-9]+:\s*d5 4d 03 38 add \(%r8\),%r31
> +\s*[a-f0-9]+:\s*d5 1d 03 07 add \(%r31\),%r8
> +\s*[a-f0-9]+:\s*49 81 c7 33 44 34 12 add \$0x12344433,%r15
> +\s*[a-f0-9]+:\s*49 81 c0 11 22 33 f4 add \$0xfffffffff4332211,%r8
> +\s*[a-f0-9]+:\s*d5 19 ff c7 inc %r31
> +\s*[a-f0-9]+:\s*62 dc 04 10 fe c7 inc %r31b,%r31b
> +\s*[a-f0-9]+:\s*d5 1c 29 f9 sub %r15,%r17
> +\s*[a-f0-9]+:\s*62 7c 74 10 28 f9 sub %r15b,%r17b,%r17b
> +\s*[a-f0-9]+:\s*62 54 84 18 29 38 sub %r15,\(%r8\),%r15
> +\s*[a-f0-9]+:\s*d5 49 2b 04 07 sub \(%r15,%rax,1\),%r16
> +\s*[a-f0-9]+:\s*d5 19 81 ee 34 12 00 00 sub \$0x1234,%r30
> +\s*[a-f0-9]+:\s*d5 18 ff c9 dec %r17
> +\s*[a-f0-9]+:\s*62 fc 74 10 fe c9 dec %r17b,%r17b
> +\s*[a-f0-9]+:\s*d5 1c 19 f9 sbb %r15,%r17
> +\s*[a-f0-9]+:\s*62 7c 74 10 18 f9 sbb %r15b,%r17b,%r17b
> +\s*[a-f0-9]+:\s*62 54 84 18 19 38 sbb %r15,\(%r8\),%r15
> +\s*[a-f0-9]+:\s*d5 49 1b 04 07 sbb \(%r15,%rax,1\),%r16
> +\s*[a-f0-9]+:\s*d5 19 81 de 34 12 00 00 sbb \$0x1234,%r30
> +\s*[a-f0-9]+:\s*d5 1c 21 f9 and %r15,%r17
> +\s*[a-f0-9]+:\s*62 7c 74 10 20 f9 and %r15b,%r17b,%r17b
> +\s*[a-f0-9]+:\s*4d 23 38 and \(%r8\),%r15
> +\s*[a-f0-9]+:\s*d5 49 23 04 07 and \(%r15,%rax,1\),%r16
> +\s*[a-f0-9]+:\s*d5 11 81 e6 34 12 00 00 and \$0x1234,%r30d
> +\s*[a-f0-9]+:\s*d5 1c 09 f9 or %r15,%r17
> +\s*[a-f0-9]+:\s*62 7c 74 10 08 f9 or %r15b,%r17b,%r17b
> +\s*[a-f0-9]+:\s*4d 0b 38 or \(%r8\),%r15
> +\s*[a-f0-9]+:\s*d5 49 0b 04 07 or \(%r15,%rax,1\),%r16
> +\s*[a-f0-9]+:\s*d5 19 81 ce 34 12 00 00 or \$0x1234,%r30
> +\s*[a-f0-9]+:\s*d5 1c 31 f9 xor %r15,%r17
> +\s*[a-f0-9]+:\s*62 7c 74 10 30 f9 xor %r15b,%r17b,%r17b
> +\s*[a-f0-9]+:\s*4d 33 38 xor \(%r8\),%r15
> +\s*[a-f0-9]+:\s*d5 49 33 04 07 xor \(%r15,%rax,1\),%r16
> +\s*[a-f0-9]+:\s*d5 19 81 f6 34 12 00 00 xor \$0x1234,%r30
> +\s*[a-f0-9]+:\s*d5 1c 11 f9 adc %r15,%r17
> +\s*[a-f0-9]+:\s*62 7c 74 10 10 f9 adc %r15b,%r17b,%r17b
> +\s*[a-f0-9]+:\s*4d 13 38 adc \(%r8\),%r15
> +\s*[a-f0-9]+:\s*d5 49 13 04 07 adc \(%r15,%rax,1\),%r16
> +\s*[a-f0-9]+:\s*d5 19 81 d6 34 12 00 00 adc \$0x1234,%r30
> +\s*[a-f0-9]+:\s*d5 18 f7 d9 neg %r17
> +\s*[a-f0-9]+:\s*62 fc 74 10 f6 d9 neg %r17b,%r17b
> +\s*[a-f0-9]+:\s*d5 18 f7 d1 not %r17
> +\s*[a-f0-9]+:\s*62 fc 74 10 f6 d1 not %r17b,%r17b
> +\s*[a-f0-9]+:\s*67 0f af 90 09 09 09 00 imul 0x90909\(%eax\),%edx
> +\s*[a-f0-9]+:\s*d5 aa af 94 f8 09 09 00 00 imul 0x909\(%rax,%r31,8\),%rdx
> +\s*[a-f0-9]+:\s*48 0f af d0 imul %rax,%rdx
> +\s*[a-f0-9]+:\s*d5 19 d1 c7 rol \$1,%r31
> +\s*[a-f0-9]+:\s*62 dc 04 10 d0 c7 rol \$1,%r31b,%r31b
> +\s*[a-f0-9]+:\s*49 c1 c4 02 rol \$0x2,%r12
> +\s*[a-f0-9]+:\s*62 d4 1c 18 c0 c4 02 rol \$0x2,%r12b,%r12b
> +\s*[a-f0-9]+:\s*d5 19 d1 cf ror \$1,%r31
> +\s*[a-f0-9]+:\s*62 dc 04 10 d0 cf ror \$1,%r31b,%r31b
> +\s*[a-f0-9]+:\s*49 c1 cc 02 ror \$0x2,%r12
> +\s*[a-f0-9]+:\s*62 d4 1c 18 c0 cc 02 ror \$0x2,%r12b,%r12b
> +\s*[a-f0-9]+:\s*d5 19 d1 d7 rcl \$1,%r31
> +\s*[a-f0-9]+:\s*62 dc 04 10 d0 d7 rcl \$1,%r31b,%r31b
> +\s*[a-f0-9]+:\s*49 c1 d4 02 rcl \$0x2,%r12
> +\s*[a-f0-9]+:\s*62 d4 1c 18 c0 d4 02 rcl \$0x2,%r12b,%r12b
> +\s*[a-f0-9]+:\s*d5 19 d1 df rcr \$1,%r31
> +\s*[a-f0-9]+:\s*62 dc 04 10 d0 df rcr \$1,%r31b,%r31b
> +\s*[a-f0-9]+:\s*49 c1 dc 02 rcr \$0x2,%r12
> +\s*[a-f0-9]+:\s*62 d4 1c 18 c0 dc 02 rcr \$0x2,%r12b,%r12b
> +\s*[a-f0-9]+:\s*d5 19 d1 e7 shl \$1,%r31
> +\s*[a-f0-9]+:\s*62 dc 04 10 d0 e7 shl \$1,%r31b,%r31b
> +\s*[a-f0-9]+:\s*49 c1 e4 02 shl \$0x2,%r12
> +\s*[a-f0-9]+:\s*62 d4 1c 18 c0 e4 02 shl \$0x2,%r12b,%r12b
> +\s*[a-f0-9]+:\s*d5 19 d1 e7 shl \$1,%r31
> +\s*[a-f0-9]+:\s*62 dc 04 10 d0 e7 shl \$1,%r31b,%r31b
> +\s*[a-f0-9]+:\s*49 c1 e4 02 shl \$0x2,%r12
> +\s*[a-f0-9]+:\s*62 d4 1c 18 c0 e4 02 shl \$0x2,%r12b,%r12b
> +\s*[a-f0-9]+:\s*d5 19 d1 ef shr \$1,%r31
> +\s*[a-f0-9]+:\s*62 dc 04 10 d0 ef shr \$1,%r31b,%r31b
> +\s*[a-f0-9]+:\s*49 c1 ec 02 shr \$0x2,%r12
> +\s*[a-f0-9]+:\s*62 d4 1c 18 c0 ec 02 shr \$0x2,%r12b,%r12b
> +\s*[a-f0-9]+:\s*d5 19 d1 ff sar \$1,%r31
> +\s*[a-f0-9]+:\s*62 dc 04 10 d0 ff sar \$1,%r31b,%r31b
> +\s*[a-f0-9]+:\s*49 c1 fc 02 sar \$0x2,%r12
> +\s*[a-f0-9]+:\s*62 d4 1c 18 c0 fc 02 sar \$0x2,%r12b,%r12b
> +\s*[a-f0-9]+:\s*62 74 9c 18 24 20 01 shld \$0x1,%r12,\(%rax\),%r12
> +\s*[a-f0-9]+:\s*4d 0f a4 c4 02 shld \$0x2,%r8,%r12
> +\s*[a-f0-9]+:\s*62 54 bc 18 24 c4 02 shld \$0x2,%r8,%r12,%r8
> +\s*[a-f0-9]+:\s*62 74 b4 18 a5 08 shld %cl,%r9,\(%rax\),%r9
> +\s*[a-f0-9]+:\s*d5 9c a5 e0 shld %cl,%r12,%r16
> +\s*[a-f0-9]+:\s*62 7c 9c 18 a5 e0 shld %cl,%r12,%r16,%r12
> +\s*[a-f0-9]+:\s*62 74 9c 18 2c 20 01 shrd \$0x1,%r12,\(%rax\),%r12
> +\s*[a-f0-9]+:\s*4d 0f ac ec 01 shrd \$0x1,%r13,%r12
> +\s*[a-f0-9]+:\s*62 54 94 18 2c ec 01 shrd \$0x1,%r13,%r12,%r13
> +\s*[a-f0-9]+:\s*62 74 b4 18 ad 08 shrd %cl,%r9,\(%rax\),%r9
> +\s*[a-f0-9]+:\s*d5 9c ad e0 shrd %cl,%r12,%r16
> +\s*[a-f0-9]+:\s*62 7c 9c 18 ad e0 shrd %cl,%r12,%r16,%r12
> +\s*[a-f0-9]+:\s*67 0f 40 90 90 90 90 90 cmovo -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 41 90 90 90 90 90 cmovno -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 42 90 90 90 90 90 cmovb -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 43 90 90 90 90 90 cmovae -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 44 90 90 90 90 90 cmove -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 45 90 90 90 90 90 cmovne -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 46 90 90 90 90 90 cmovbe -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 47 90 90 90 90 90 cmova -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 48 90 90 90 90 90 cmovs -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 49 90 90 90 90 90 cmovns -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 4a 90 90 90 90 90 cmovp -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 4b 90 90 90 90 90 cmovnp -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 4c 90 90 90 90 90 cmovl -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 4d 90 90 90 90 90 cmovge -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 4e 90 90 90 90 90 cmovle -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*67 0f 4f 90 90 90 90 90 cmovg -0x6f6f6f70\(%eax\),%edx
> +\s*[a-f0-9]+:\s*66 0f 38 f6 c3 adcx %ebx,%eax
> +\s*[a-f0-9]+:\s*66 0f 38 f6 c3 adcx %ebx,%eax
> +\s*[a-f0-9]+:\s*62 f4 fd 18 66 c3 adcx %rbx,%rax,%rax
> +\s*[a-f0-9]+:\s*62 74 3d 18 66 c0 adcx %eax,%r8d,%r8d
> +\s*[a-f0-9]+:\s*62 d4 7d 18 66 c7 adcx %r15d,%eax,%eax
> +\s*[a-f0-9]+:\s*67 66 0f 38 f6 04 0a adcx \(%edx,%ecx,1\),%eax
> +\s*[a-f0-9]+:\s*f3 0f 38 f6 c3 adox %ebx,%eax
> +\s*[a-f0-9]+:\s*f3 0f 38 f6 c3 adox %ebx,%eax
> +\s*[a-f0-9]+:\s*62 f4 fe 18 66 c3 adox %rbx,%rax,%rax
> +\s*[a-f0-9]+:\s*62 74 3e 18 66 c0 adox %eax,%r8d,%r8d
> +\s*[a-f0-9]+:\s*62 d4 7e 18 66 c7 adox %r15d,%eax,%eax
> +\s*[a-f0-9]+:\s*67 f3 0f 38 f6 04 0a adox \(%edx,%ecx,1\),%eax
> diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
> new file mode 100644
> index 00000000000..6ffdf5a6390
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
> @@ -0,0 +1,125 @@
> +# Check 64bit APX NDD instructions with optimized encoding
> +
> +	.text
> +_start:
> +add %r31,%r8,%r8
> +addb %r31b,%r8b,%r8b
> +{store} add %r31,%r8,%r8
> +{load} add %r31,%r8,%r8
> +add %r31,(%r8),%r31
> +add (%r31),%r8,%r8
> +add $0x12344433,%r15,%r15
> +add $0xfffffffff4332211,%r8,%r8
> +inc %r31,%r31
> +incb %r31b,%r31b
> +sub %r15,%r17,%r17
> +subb %r15b,%r17b,%r17b
> +sub %r15,(%r8),%r15
> +sub (%r15,%rax,1),%r16,%r16
> +sub $0x1234,%r30,%r30
> +dec %r17,%r17
> +decb %r17b,%r17b
> +sbb %r15,%r17,%r17
> +sbbb %r15b,%r17b,%r17b
> +sbb %r15,(%r8),%r15
> +sbb (%r15,%rax,1),%r16,%r16
> +sbb $0x1234,%r30,%r30
> +and %r15,%r17,%r17
> +andb %r15b,%r17b,%r17b
> +and %r15,(%r8),%r15
> +and (%r15,%rax,1),%r16,%r16
> +and $0x1234,%r30,%r30
> +or %r15,%r17,%r17
> +orb %r15b,%r17b,%r17b
> +or %r15,(%r8),%r15
> +or (%r15,%rax,1),%r16,%r16
> +or $0x1234,%r30,%r30
> +xor %r15,%r17,%r17
> +xorb %r15b,%r17b,%r17b
> +xor %r15,(%r8),%r15
> +xor (%r15,%rax,1),%r16,%r16
> +xor $0x1234,%r30,%r30
> +adc %r15,%r17,%r17
> +adcb %r15b,%r17b,%r17b
> +adc %r15,(%r8),%r15
> +adc (%r15,%rax,1),%r16,%r16
> +adc $0x1234,%r30,%r30
> +neg %r17,%r17
> +negb %r17b,%r17b
> +not %r17,%r17
> +notb %r17b,%r17b
> +imul 0x90909(%eax),%edx,%edx
> +imul 0x909(%rax,%r31,8),%rdx,%rdx
> +imul %rdx,%rax,%rdx
> +rol $0x1,%r31,%r31
> +rolb $0x1,%r31b,%r31b
> +rol $0x2,%r12,%r12
> +rolb $0x2,%r12b,%r12b
> +ror $0x1,%r31,%r31
> +rorb $0x1,%r31b,%r31b
> +ror $0x2,%r12,%r12
> +rorb $0x2,%r12b,%r12b
> +rcl $0x1,%r31,%r31
> +rclb $0x1,%r31b,%r31b
> +rcl $0x2,%r12,%r12
> +rclb $0x2,%r12b,%r12b
> +rcr $0x1,%r31,%r31
> +rcrb $0x1,%r31b,%r31b
> +rcr $0x2,%r12,%r12
> +rcrb $0x2,%r12b,%r12b
> +sal $0x1,%r31,%r31
> +salb $0x1,%r31b,%r31b
> +sal $0x2,%r12,%r12
> +salb $0x2,%r12b,%r12b
> +shl $0x1,%r31,%r31
> +shlb $0x1,%r31b,%r31b
> +shl $0x2,%r12,%r12
> +shlb $0x2,%r12b,%r12b
> +shr $0x1,%r31,%r31
> +shrb $0x1,%r31b,%r31b
> +shr $0x2,%r12,%r12
> +shrb $0x2,%r12b,%r12b
> +sar $0x1,%r31,%r31
> +sarb $0x1,%r31b,%r31b
> +sar $0x2,%r12,%r12
> +sarb $0x2,%r12b,%r12b
> +shld $0x1,%r12,(%rax),%r12
> +shld $0x2,%r8,%r12,%r12
> +shld $0x2,%r8,%r12,%r8
> +shld %cl,%r9,(%rax),%r9
> +shld %cl,%r12,%r16,%r16
> +shld %cl,%r12,%r16,%r12
> +shrd $0x1,%r12,(%rax),%r12
> +shrd $0x1,%r13,%r12,%r12
> +shrd $0x1,%r13,%r12,%r13
> +shrd %cl,%r9,(%rax),%r9
> +shrd %cl,%r12,%r16,%r16
> +shrd %cl,%r12,%r16,%r12
> +cmovo 0x90909090(%eax),%edx,%edx
> +cmovno 0x90909090(%eax),%edx,%edx
> +cmovb 0x90909090(%eax),%edx,%edx
> +cmovae 0x90909090(%eax),%edx,%edx
> +cmove 0x90909090(%eax),%edx,%edx
> +cmovne 0x90909090(%eax),%edx,%edx
> +cmovbe 0x90909090(%eax),%edx,%edx
> +cmova 0x90909090(%eax),%edx,%edx
> +cmovs 0x90909090(%eax),%edx,%edx
> +cmovns 0x90909090(%eax),%edx,%edx
> +cmovp 0x90909090(%eax),%edx,%edx
> +cmovnp 0x90909090(%eax),%edx,%edx
> +cmovl 0x90909090(%eax),%edx,%edx
> +cmovge 0x90909090(%eax),%edx,%edx
> +cmovle 0x90909090(%eax),%edx,%edx
> +cmovg 0x90909090(%eax),%edx,%edx
> +adcx %ebx,%eax,%eax
> +adcx %eax,%ebx,%eax
> +adcx %rbx,%rax,%rax
> +adcx %eax,%r8d,%r8d
> +adcx %r15d,%eax,%eax
> +adcx (%edx,%ecx,1),%eax,%eax
> +adox %ebx,%eax,%eax
> +adox %eax,%ebx,%eax
> +adox %rbx,%rax,%rax
> +adox %eax,%r8d,%r8d
> +adox %r15d,%eax,%eax
> +adox (%edx,%ecx,1),%eax,%eax
> diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
> index 1b13c52454e..2ba4c49417a 100644
> --- a/gas/testsuite/gas/i386/x86-64.exp
> +++ b/gas/testsuite/gas/i386/x86-64.exp
> @@ -561,6 +561,7 @@ run_dump_test "x86-64-optimize-6"
>  run_list_test "x86-64-optimize-7a" "-I${srcdir}/$subdir -march=+noavx -al"
>  run_dump_test "x86-64-optimize-7b"
>  run_list_test "x86-64-optimize-8" "-I${srcdir}/$subdir -march=+noavx2 -al"
> +run_dump_test "x86-64-apx-ndd-optimize"
>  run_dump_test "x86-64-align-branch-1a"
>  run_dump_test "x86-64-align-branch-1b"
>  run_dump_test "x86-64-align-branch-1c"
> --
> 2.25.1
>

OK.

Thanks.

H.J.
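
The operand-matching rule in the quoted can_convert_NDD_to_legacy can be modelled outside gas roughly as follows. This is a minimal sketch under stated assumptions, not the assembler's code: `struct operand`, `enum op_kind`, `NO_MATCH` and `find_dest_match` are hypothetical names invented for illustration, registers are reduced to plain numbers, and the NF check and the template/size checks done in match_template are left out. Operands are in AT&T order with the destination last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified operand model (not a gas type).  */
enum op_kind { OP_REG, OP_MEM, OP_IMM };

struct operand
{
  enum op_kind kind;
  int regno;            /* Meaningful only when kind == OP_REG.  */
};

/* Mirrors the patch's use of ~0 as "no source matches the destination".  */
#define NO_MATCH (~0u)

/* Return the index of the source operand that names the same register
   as the destination, or NO_MATCH.  The source next to the destination
   is tried first; the earlier source is only accepted for commutative
   insns and only when optimize_level > 1, as in the quoted patch.  */
static unsigned int
find_dest_match (const struct operand *ops, unsigned int n_ops,
                 bool commutative, int optimize_level)
{
  assert (ops != NULL);

  /* The patch requires at least two register operands.  */
  unsigned int n_regs = 0;
  for (unsigned int k = 0; k < n_ops; k++)
    if (ops[k].kind == OP_REG)
      n_regs++;
  if (n_regs < 2)
    return NO_MATCH;

  unsigned int dest = n_ops - 1;
  unsigned int src1 = n_ops - 2;
  unsigned int src2 = n_ops > 3 ? n_ops - 3 : 0;

  if (ops[dest].kind != OP_REG)
    return NO_MATCH;

  if (ops[src1].kind == OP_REG && ops[src1].regno == ops[dest].regno)
    return src1;

  if (optimize_level > 1
      && commutative
      && ops[src2].kind == OP_REG
      && ops[src2].regno == ops[dest].regno)
    return src2;

  return NO_MATCH;
}
```

In this model, `add %r16,%r8,%r8` yields index 1, so the last operand can simply be dropped. `add %r8,%r16,%r8` yields index 0, but only for a commutative insn at optimization level 2 or higher, which is the case where the patch swaps the two sources before dropping the destination. `sub %r15,(%r8),%r15` yields NO_MATCH, consistent with the test expectations above, where that insn keeps its NDD encoding.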