Date: Fri, 8 Dec 2023 15:12:52 +0100
From: Jan Beulich
To: "Cui, Lili"
Cc: hongjiu.lu@intel.com, konglin1, binutils@sourceware.org
Subject: Re: [PATCH v3 6/9] Support APX NDD
References: <20231124070213.3886483-1-lili.cui@intel.com> <20231124070213.3886483-6-lili.cui@intel.com>
In-Reply-To: <20231124070213.3886483-6-lili.cui@intel.com>

On 24.11.2023 08:02, Cui, Lili wrote:
> @@ -8870,25 +8890,33 @@ build_modrm_byte (void)
> 			  || i.vec_encoding == vex_encoding_evex));
>      }
> 
> -  for (v = source + 1; v < dest; ++v)
> -    if (v != reg_slot)
> -      break;
> -  if (v >= dest)
> -    v = ~0;
> -  if (i.tm.extension_opcode != None)
> +  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
>      {
> -      if (dest != source)
> -	v = dest;
> -      dest = ~0;
> +      v = dest;
> +      dest-- ;

Nit: Stray blank.

>      }
> -  gas_assert (source < dest);

Starting from this line, do you really need to move that into the "else"
branch? It looks to me as if it could stay here. (Maybe I'm wrong with
the assertion itself, but ...

> -  if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
> -      && source != op)

... this entire if() pretty surely can stay as is, as there are no
templates with both DstVVVV and SwapSources afaict.
(Thing is - as before - that it isn't easy to see that what is happening
here is really just re-indentation. Iirc in an earlier version there
actually were hidden changes.)

If you want this moved as an optimization, please do so in a separate
patch.

> --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d
> @@ -27,4 +27,6 @@ Disassembly of section .text:
>  [ ]*[a-f0-9]+:[ ]+c8 ff ff ff[ ]+enter \$0xffff,\$0xff
>  [ ]*[a-f0-9]+:[ ]+67 62 f2 7c 18 f5[ ]+addr32 \(bad\)
>  [ ]*[a-f0-9]+:[ ]+0b ff[ ]+or %edi,%edi
> +[ ]*[a-f0-9]+:[ ]+62 f4 fc 08 ff[ ]+\(bad\)
> +[ ]*[a-f0-9]+:[ ]+d8[ ]+.byte 0xd8
>  #pass
> --- a/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s
> @@ -26,3 +26,5 @@ _start:
>  	#EVEX from VEX bzhi %ebx,%eax,%ecx EVEX.P[20](EVEX.b) == 1 (illegal value).
>  	.insn EVEX.L0.NP.0f38.W0 0xf5, %eax ,(%ebx){1to8}, %ecx
>  	.byte 0xff
> +	#{evex} inc %rax %rbx EVEX.vvvv' != 1111 && EVEX.ND = 0.
> +	.insn EVEX.L0.NP.M4.W1 0xff, %rax, %rbx

I don't think this does what you want. In the .d file the 4 bits are all
set. I think you mean something like

	.insn EVEX.L0.NP.M4.W1 0xff/0, %rcx, %rbx

(i.e. ModR/M.reg specified as opcode extension _and_ the first operand
not the accumulator). The reason disassembly fails for what you've used
looks to be ModR/M.reg == 0b011 (resulting from the use of %rbx).

(Also, nit: What's EVEX.vvvv' ? I.e. what's the ' there about?)

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> @@ -0,0 +1,155 @@
> +# Check 64bit APX NDD instructions with evex prefix encoding
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +	adc $0x1234,%ax,%r30w
> +	adc %r15b,%r17b,%r18b
> +	adc %r15d,(%r8),%r18d
> +	adc (%r15,%rax,1),%r16b,%r8b
> +	adc (%r15,%rax,1),%r16w,%r8w
> +	adcl $0x11,(%r19,%rax,4),%r20d
> +	adcx %r15d,%r8d,%r18d
> +	adcx (%r15,%r31,1),%r8
> +	adcx (%r15,%r31,1),%r8d,%r18d
> +	add $0x1234,%ax,%r30w
> +	add $0x12344433,%r15,%r16
> +	add $0x34,%r13b,%r17b
> +	add $0xfffffffff4332211,%rax,%r8
> +	add %r31,%r8,%r16
> +	add %r31,(%r8),%r16
> +	add %r31,(%r8,%r16,8),%r16
> +	add %r31b,%r8b,%r16b
> +	add %r31d,%r8d,%r16d
> +	add %r31w,%r8w,%r16w
> +	add (%r31),%r8,%r16
> +	add 0x9090(%r31,%r16,1),%r8,%r16
> +	addb %r31b,%r8b,%r16b
> +	addl %r31d,%r8d,%r16d
> +	addl $0x11,(%r19,%rax,4),%r20d
> +	addq %r31,%r8,%r16
> +	addq $0x12344433,(%r15,%rcx,4),%r16
> +	addw %r31w,%r8w,%r16w
> +	adox %r15d,%r8d,%r18d

Nit: Inconsistent blank padding.
> +	{load} add %r31,%r8,%r16
> +	{store} add %r31,%r8,%r16
> +	adox (%r15,%r31,1),%r8
> +	adox (%r15,%r31,1),%r8d,%r18d
> +	and $0x1234,%ax,%r30w
> +	and %r15b,%r17b,%r18b
> +	and %r15d,(%r8),%r18d
> +	and (%r15,%rax,1),%r16b,%r8b
> +	and (%r15,%rax,1),%r16w,%r8w
> +	andl $0x11,(%r19,%rax,4),%r20d
> +	cmova 0x90909090(%eax),%edx,%r8d
> +	cmovae 0x90909090(%eax),%edx,%r8d
> +	cmovb 0x90909090(%eax),%edx,%r8d
> +	cmovbe 0x90909090(%eax),%edx,%r8d
> +	cmove 0x90909090(%eax),%edx,%r8d
> +	cmovg 0x90909090(%eax),%edx,%r8d
> +	cmovge 0x90909090(%eax),%edx,%r8d
> +	cmovl 0x90909090(%eax),%edx,%r8d
> +	cmovle 0x90909090(%eax),%edx,%r8d
> +	cmovne 0x90909090(%eax),%edx,%r8d
> +	cmovno 0x90909090(%eax),%edx,%r8d
> +	cmovnp 0x90909090(%eax),%edx,%r8d
> +	cmovns 0x90909090(%eax),%edx,%r8d
> +	cmovo 0x90909090(%eax),%edx,%r8d
> +	cmovp 0x90909090(%eax),%edx,%r8d
> +	cmovs 0x90909090(%eax),%edx,%r8d
> +	dec %rax,%r17
> +	decb (%r31,%r12,1),%r8b
> +	imul 0x909(%rax,%r31,8),%rdx,%r25
> +	imul 0x90909(%eax),%edx,%r8d
> +	inc %r31,%r16
> +	inc %r31,%r8
> +	inc %rax,%rbx
> +	neg %rax,%r17
> +	negb (%r31,%r12,1),%r8b
> +	not %rax,%r17
> +	notb (%r31,%r12,1),%r8b
> +	or $0x1234,%ax,%r30w
> +	or %r15b,%r17b,%r18b
> +	or %r15d,(%r8),%r18d
> +	or (%r15,%rax,1),%r16b,%r8b
> +	or (%r15,%rax,1),%r16w,%r8w
> +	orl $0x11,(%r19,%rax,4),%r20d
> +	rcl $0x2,%r12b,%r31b
> +	rcl %cl,%r16b,%r8b
> +	rclb $0x1, (%rax),%r31b
> +	rcll $0x2,(%rax),%r31d
> +	rclw $0x1, (%rax),%r31w

Nit: Would be nice if there consistently were or were not blanks after
the commas.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -139,9 +139,13 @@
>  #define Vsz256 Vsz=VSZ256
>  #define Vsz512 Vsz=VSZ512
>  
> +#define DstVVVV VexVVVV=VexVVVV_DST
> +
>  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
>  // the bit to mark commutative VEX encodings where swapping the source
>  // operands may allow to switch from 3-byte to 2-byte VEX encoding.
> +// And re-use the bit to mark some NDD insns that swapping the source operands
> +// may allow to switch from EVEX encoding to REX2 encoding.
>  #define C StaticRounding
>  
>  #define FP 387|287|8087
> @@ -288,26 +292,40 @@ std, 0xfd, 0, NoSuf, {}
>  sti, 0xfb, 0, NoSuf, {}
>  
>  // Arithmetic.
> +add, 0x0, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }

There is _still_ Byte|Word|Dword|Qword in here (and below), when I think
I pointed out more than once before that in new templates such redundancy
wants omitting. Since this isn't the first instance of earlier review
comments not taken care of, may I please ask that you make reasonably
sure that new versions aren't sent out like this?
> add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}
> add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> 
> inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
> +inc, 0xfe/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, {Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}
> inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> 
> +sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Optimize|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64, }

Here and elsewhere, what's Optimize for? It not being there on other
templates, it can't be for the EVEX->REX2 optimization? If there are
further optimization plans, that's (again) something to mention in the
description. Yet better would be if such attributes were added only when
respective optimizations are actually introduced. Unlike e.g. NF, which
would mean another bulk update if not added right away, new optimizations
typically affect only a few templates at a time.

> sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +sub, 0x83/5, APX_F, Modrm|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> sub, 0x2c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +sub, 0x80/5, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }

There are still only 3 new templates here (and also above for add, plus
for other similar insns), when ...
> dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
> +dec, 0xfe/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> 
> +sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +sbb, 0x18, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +sbb, 0x83/3, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> +sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> sbb, 0x1c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +sbb, 0x80/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +sbb, 0x80/3, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }

... there are 6 new templates here. This is again an aspect I had pointed
out before. You cannot defer the addition of the other 3 until the NF
patch, as you want to make sure that with just this patch in place both

	{evex} sbb %eax, %eax

and

	{evex} sub %eax, %eax

actually assemble, and to EVEX encodings. I can't see how that would work
in the latter case without those further templates. The alternative is to
also defer adding the 2-operand SBB templates (and any others you add here
which don't use DstVVVV).
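(Purely to illustrate the expectation - the fragment below is invented and
not part of the submitted testsuite, the label name is made up - something
like

	.text
check:
	{evex} sbb %eax, %eax	# 2-operand EVEX.MAP4 templates exist in this patch
	{evex} sub %eax, %eax	# only 3-operand DstVVVV templates are added for sub

would need to assemble with just this patch applied, with both instructions
ending up EVEX-encoded.)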
> cmp, 0x38, 0, D|W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> cmp, 0x83/7, 0, Modrm|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> @@ -318,31 +336,50 @@ test, 0x84, 0, D|W|C|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64, R
> test, 0xa8, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> test, 0xf6/0, 0, W|Modrm|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> 
> +and, 0x20, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> and, 0x20, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +and, 0x83/4, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> and, 0x83/4, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock|Optimize, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> and, 0x24, 0, W|No_sSuf|Optimize, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +and, 0x80/4, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> and, 0x80/4, 0, W|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> 
> +or, 0x8, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> or, 0x8, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +or, 0x83/1, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> or, 0x83/1, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> or, 0xc, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +or, 0x80/1, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> or, 0x80/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> 
> +xor, 0x30, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|NF|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> xor, 0x30, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +xor, 0x83/6, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> xor, 0x34, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +xor, 0x80/6, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> 
> // clr with 1 operand is really xor with 2 operands.
> clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }

Btw., for consistency this may also want accompanying with an EVEX
counterpart.

> +adc, 0x10, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> adc, 0x10, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x10, APX_F, D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x83/2, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> adc, 0x83/2, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x83/2, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
> adc, 0x14, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
> +adc, 0x80/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> adc, 0x80/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +adc, 0x80/2, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> 
> +neg, 0xf6/3, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> neg, 0xf6/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +
> +not, 0xf6/2, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> not, 0xf6/2, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +not, 0xf6/2, APX_F, W|Modrm|No_sSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> 
> aaa, 0x37, No64, NoSuf, {}
> aas, 0x3f, No64, NoSuf, {}
> @@ -375,6 +412,7 @@ cqto, 0x99, x64, Size64|NoSuf, {}
> // These multiplies can only be selected with single operand forms.
> mul, 0xf6/4, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> imul, 0xf6/5, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +imul, 0xaf, APX_F, C|Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64, Reg16|Reg32|Reg64 }

Missing NF?
> imul, 0xfaf, i386, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Unspecified|Word|Dword|Qword|BaseIndex, Reg16|Reg32|Reg64 }
> imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
> @@ -389,52 +427,98 @@ div, 0xf6/6, 0, W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|
> idiv, 0xf6/7, 0, W|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> idiv, 0xf6/7, 0, W|CheckOperandSize|Modrm|No_sSuf, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Acc|Byte|Word|Dword|Qword }
> 
> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> rol, 0xd0/0, 0, W|Modrm|No_sSuf, { Imm1, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +rol, 0xc0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> rol, 0xc0/0, i186, W|Modrm|No_sSuf, { Imm8|Imm8S, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +rol, 0xd2/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
> rol, 0xd2/0, 0, W|Modrm|No_sSuf, { ShiftCount, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }
> +rol, 0xd0/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }

Didn't we agree to avoid adding this (and its sibling) template, for the
omitted shift count being ambiguous? Consider

	rol %cl, %al

Is this a rotate by %cl, or a 1-bit NDD rotate?

Jan