From: Jan Beulich
Date: Thu, 29 Feb 2024 12:21:16 +0100
Subject: Re: [PATCH] Support APX NF
To: "Cui, Lili"
Cc: hongjiu.lu@intel.com, binutils@sourceware.org
References: <20240227090106.200134-1-lili.cui@intel.com>
In-Reply-To: <20240227090106.200134-1-lili.cui@intel.com>

On 27.02.2024 10:01, Cui, Lili wrote:
> @@ -415,6 +416,9 @@ struct _i386_insn
>      /* Compressed disp8*N attribute. */
>      unsigned int memshift;
>
> +    /* No CSPAZO flags update. */
> +    bool has_nf;
> +
>      /* Prefer load or store in encoding. */
>      enum
>        {

There's a group of booleans further up and another one further down. Is
there any reason not to leverage an available padding slot there?

> @@ -6627,6 +6635,9 @@ md_assemble (char *line)
>          case unsupported_EGPR_for_addressing:
>            err_msg = _("extended GPR cannot be used as base/index");
>            break;
> +        case unsupported_nf:
> +          err_msg = _("unsupported NF");
> +          break;

No tests showing this new error message in action? I'm once again a little
worried about the resulting overall wording of the diagnostic.

> @@ -7187,6 +7198,10 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
>                /* {rex2} */
>                i.rex2_encoding = true;
>                break;
> +            case Prefix_NF:
> +              /* {NF} */
> +              i.has_nf = true;
> +              break;
>              case Prefix_NoOptimize:
>                /* {nooptimize} */
>                i.no_optimize = true;

Nit: Preferably {nf} in the comment, matching comments in context.

> @@ -8860,6 +8880,9 @@ match_template (char mnem_suffix)
>              goto check_operands_345;
>          }
>        else if (t->opcode_space != SPACE_BASE
> +               /* Map0 and map1 are promoted to MAP4 when NF is enabled.
> +                */
> +               && !t->opcode_modifier.nf
>                 && (t->opcode_space != SPACE_0F
>                     /* MOV to/from CR/DR/TR, as an exception, follow
>                        the base opcode space encoding model. */

I don't understand this: How does a template permitting NF matter here? I
could see the immediately preceding "else if" become something along the
lines of

      else if (is_cpu (t, CpuAPX_F) && (i.operands == 3 || i.has_nf))

But I admit I didn't fully think this through. It's just that the change as
is looks wrong to me.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-nf.s
>[...]
> + {nf} ror %cl, 291(%r8, %rax, 4), %r9
> + {nf} sar $1, %bl
> + {nf} sar $1, %bl, %dl
> + {nf} sar $1, %dx
> + {nf} sar $1, %dx, %ax
> + {nf} sar $1, %ecx
> + {nf} sar $1, %ecx, %edx
> + {nf} sar $1, %r9
> + {nf} sar $1, %r9, %r31
> + {nf} sarb $1, 291(%r8, %rax, 4)
> + {nf} sar $1, 291(%r8, %rax, 4), %bl
> + {nf} sarw $1, 291(%r8, %rax, 4)
> + {nf} sar $1, 291(%r8, %rax, 4), %dx
> + {nf} sarl $1, 291(%r8, %rax, 4)
> + {nf} sar $1, 291(%r8, %rax, 4), %ecx
> + {nf} sarq $1, 291(%r8, %rax, 4)
> + {nf} sar $1, 291(%r8, %rax, 4), %r9
> + {nf} sar $123, %bl
> + {nf} sar $123, %bl, %dl
> + {nf} sar $123, %dx
> + {nf} sar $123, %dx, %ax
> + {nf} sar $123, %ecx
> + {nf} sar $123, %ecx, %edx
> + {nf} sar $123, %r9
> + {nf} sar $123, %r9, %r31
> + {nf} sarb $123, 291(%r8, %rax, 4)
> + {nf} sar $123, 291(%r8, %rax, 4), %bl
> + {nf} sarw $123, 291(%r8, %rax, 4)
> + {nf} sar $123, 291(%r8, %rax, 4), %dx
> + {nf} sarl $123, 291(%r8, %rax, 4)
> + {nf} sar $123, 291(%r8, %rax, 4), %ecx
> + {nf} sarq $123, 291(%r8, %rax, 4)
> + {nf} sar $123, 291(%r8, %rax, 4), %r9
> + {nf} sar %cl, %bl
> + {nf} sar %cl, %bl, %dl
> + {nf} sar %cl, %dx
> + {nf} sar %cl, %dx, %ax
> + {nf} sar %cl, %ecx
> + {nf} sar %cl, %ecx, %edx
> + {nf} sar %cl, %r9
> + {nf} sar %cl, %r9, %r31
> + {nf} sarb %cl, 291(%r8, %rax, 4)
> + {nf} sar %cl, 291(%r8, %rax, 4), %bl
> + {nf} sarw %cl, 291(%r8, %rax, 4)
> + {nf} sar %cl, 291(%r8, %rax, 4), %dx
> + {nf} sarl %cl, 291(%r8, %rax, 4)
> + {nf} sar %cl, 291(%r8, %rax, 4), %ecx
> + {nf} sarq %cl, 291(%r8, %rax, 4)
> + {nf} sar %cl, 291(%r8, %rax, 4), %r9
> + {nf} shl $1, %bl
> + {nf} shl $1, %bl, %dl
> + {nf} shl $1, %dx
> + {nf} shl $1, %dx, %ax
> + {nf} shl $1, %ecx
> + {nf} shl $1, %ecx, %edx
> + {nf} shl $1, %r9
> + {nf} shl $1, %r9, %r31
> + {nf} shlb $1, 291(%r8, %rax, 4)
> + {nf} shl $1, 291(%r8, %rax, 4), %bl
> + {nf} shlw $1, 291(%r8, %rax, 4)
> + {nf} shl $1, 291(%r8, %rax, 4), %dx
> + {nf} shll $1, 291(%r8, %rax, 4)
> + {nf} shl $1, 291(%r8, %rax, 4), %ecx
> + {nf} shlq $1, 291(%r8, %rax, 4)
> + {nf} shl $1, 291(%r8, %rax, 4), %r9
> + {nf} shl $123, %bl
> + {nf} shl $123, %bl, %dl
> + {nf} shl $123, %dx
> + {nf} shl $123, %dx, %ax
> + {nf} shl $123, %ecx
> + {nf} shl $123, %ecx, %edx
> + {nf} shl $123, %r9
> + {nf} shl $123, %r9, %r31
> + {nf} shlb $123, 291(%r8, %rax, 4)
> + {nf} shl $123, 291(%r8, %rax, 4), %bl
> + {nf} shlw $123, 291(%r8, %rax, 4)
> + {nf} shl $123, 291(%r8, %rax, 4), %dx
> + {nf} shll $123, 291(%r8, %rax, 4)
> + {nf} shl $123, 291(%r8, %rax, 4), %ecx
> + {nf} shlq $123, 291(%r8, %rax, 4)
> + {nf} shl $123, 291(%r8, %rax, 4), %r9
> + {nf} shl %cl, %bl
> + {nf} shl %cl, %bl, %dl
> + {nf} shl %cl, %dx
> + {nf} shl %cl, %dx, %ax
> + {nf} shl %cl, %ecx
> + {nf} shl %cl, %ecx, %edx
> + {nf} shl %cl, %r9
> + {nf} shl %cl, %r9, %r31
> + {nf} shlb %cl, 291(%r8, %rax, 4)
> + {nf} shl %cl, 291(%r8, %rax, 4), %bl
> + {nf} shlw %cl, 291(%r8, %rax, 4)
> + {nf} shl %cl, 291(%r8, %rax, 4), %dx
> + {nf} shll %cl, 291(%r8, %rax, 4)
> + {nf} shl %cl, 291(%r8, %rax, 4), %ecx
> + {nf} shlq %cl, 291(%r8, %rax, 4)
> + {nf} shl %cl, 291(%r8, %rax, 4), %r9
> + {nf} sal $1, %bl
> + {nf} sal $1, %bl, %dl
> + {nf} sal $1, %dx
> + {nf} sal $1, %dx, %ax
> + {nf} sal $1, %ecx
> + {nf} sal $1, %ecx, %edx
> + {nf} sal $1, %r9
> + {nf} sal $1, %r9, %r31
> + {nf} salb $1, 291(%r8, %rax, 4)
> + {nf} sal $1, 291(%r8, %rax, 4), %bl
> + {nf} salw $1, 291(%r8, %rax, 4)
> + {nf} sal $1, 291(%r8, %rax, 4), %dx
> + {nf} sall $1, 291(%r8, %rax, 4)
> + {nf} sal $1, 291(%r8, %rax, 4), %ecx
> + {nf} salq $1, 291(%r8, %rax, 4)
> + {nf} sal $1, 291(%r8, %rax, 4), %r9
> + {nf} sal $123, %bl
> + {nf} sal $123, %bl, %dl
> + {nf} sal $123, %dx
> + {nf} sal $123, %dx, %ax
> + {nf} sal $123, %ecx
> + {nf} sal $123, %ecx, %edx
> + {nf} sal $123, %r9
> + {nf} sal $123, %r9, %r31
> + {nf} salb $123, 291(%r8, %rax, 4)
> + {nf} sal $123, 291(%r8, %rax, 4), %bl
> + {nf} salw $123, 291(%r8, %rax, 4)
> + {nf} sal $123, 291(%r8, %rax, 4), %dx
> + {nf} sall $123, 291(%r8, %rax, 4)
> + {nf} sal $123, 291(%r8, %rax, 4), %ecx
> + {nf} salq $123, 291(%r8, %rax, 4)
> + {nf} sal $123, 291(%r8, %rax, 4), %r9
> + {nf} sal %cl, %bl
> + {nf} sal %cl, %bl, %dl
> + {nf} sal %cl, %dx
> + {nf} sal %cl, %dx, %ax
> + {nf} sal %cl, %ecx
> + {nf} sal %cl, %ecx, %edx
> + {nf} sal %cl, %r9
> + {nf} sal %cl, %r9, %r31
> + {nf} salb %cl, 291(%r8, %rax, 4)
> + {nf} sal %cl, 291(%r8, %rax, 4), %bl
> + {nf} salw %cl, 291(%r8, %rax, 4)
> + {nf} sal %cl, 291(%r8, %rax, 4), %dx
> + {nf} sall %cl, 291(%r8, %rax, 4)
> + {nf} sal %cl, 291(%r8, %rax, 4), %ecx
> + {nf} salq %cl, 291(%r8, %rax, 4)
> + {nf} sal %cl, 291(%r8, %rax, 4), %r9

Hmm, I think sorting in the source file is more relevant than in the output
(expectations), so I think this SAL block wants moving up. Would of course be
yet more natural if we actually encoded SAL with ModR/M.reg=6 rather than the
same encoding as SHL ...

> --- a/opcodes/i386-dis-evex-reg.h
> +++ b/opcodes/i386-dis-evex-reg.h
> @@ -51,33 +51,33 @@
>    },
>    /* REG_EVEX_MAP4_80 */
>    {
> -    { "addA", { VexGb, Eb, Ib }, NO_PREFIX },
> -    { "orA", { VexGb, Eb, Ib }, NO_PREFIX },
> +    { "%XNaddA", { VexGb, Eb, Ib }, NO_PREFIX },
> +    { "%XNorA", { VexGb, Eb, Ib }, NO_PREFIX },

Since there are quite a number of entries which are affected (and more to
come), did you consider using a single-character macro here?
I realize the three we presently have free don't fit overly well
letter-wise, but it ought to be possible to e.g. free up F (rarely used,
could become a two-letter one) for use here.

Seeing that you need to fiddle with the "case 'N'" code anyway, did you
further consider giving 'N' a second purpose? Present and projected uses are
easy to tell apart by being non-EVEX / EVEX respectively.

If we really wanted to stick to a two-letter one, I think it would further
want considering to use %NF, such that its purpose is immediately clear from
the letters used.

> @@ -9147,6 +9150,10 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>        ins->vex.v = *ins->codep & 0x8;
>        ins->vex.mask_register_specifier = *ins->codep & 0x7;
>        ins->vex.zeroing = *ins->codep & 0x80;
> +      /* Set the NF bit for the EVEX instruction extended from the legacy or
> +         vex instruction, this bit will be cleared when it can be confirmed
> +         that its defaut type is evex. */
> +      ins->vex.nf = *ins->codep & 0x4;
>
>        if (ins->address_mode != mode_64bit)
>          {
> @@ -9600,6 +9607,15 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        && ins.vex.prefix == DATA_PREFIX_OPCODE)
>      sizeflag ^= DFLAG;
>
> +  if(ins.evex_type == evex_default)
> +    ins.vex.nf = false;

Up to here I think I agree.

> +  else
> +    /* For EVEX-promoted formats, we need to clear EVEX.NF (For ccmp and
> +       ctest, they will be cleared separately.) in mask_register_specifier
> +       and keep the low 2 bits of mask_register_specifier to report errors
> +       for invalid cases.*/
> +    ins.vex.mask_register_specifier &= 0x3;

But this I'm in trouble with: How would you recognize (and accordingly print)
insns with NF wrongly set? (By implication there's also a respective testcase
[addition] missing.)
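
To make the concern concrete, here is a minimal, self-contained C sketch. It
is not code from opcodes/i386-dis.c: struct evex_fields, enum evex_kind,
decode_evex_payload() and nf_wrongly_set() are hypothetical names made up for
illustration. Only the bit masks (0x80, 0x8, 0x7, 0x4, 0x3) are taken from the
quoted hunks. The sketch assumes that some per-template "NF permitted"
attribute is available at the point where the check is made.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the EVEX state kept per insn.  */
struct evex_fields
{
  bool zeroing;           /* payload bit 7 */
  bool v;                 /* payload bit 3 */
  bool nf;                /* payload bit 2, only meaningful when promoted */
  unsigned int mask_reg;  /* payload bits 2:0, or 1:0 once NF is split off */
};

/* Hypothetical distinction mirroring "evex_default" vs. promoted insns.  */
enum evex_kind { EVEX_DEFAULT, EVEX_PROMOTED };

/* Split the payload byte the way the quoted hunks do.  */
static struct evex_fields
decode_evex_payload (uint8_t byte, enum evex_kind kind)
{
  struct evex_fields f;

  f.zeroing = (byte & 0x80) != 0;
  f.v = (byte & 0x08) != 0;
  f.mask_reg = byte & 0x07;
  f.nf = (byte & 0x04) != 0;

  if (kind == EVEX_DEFAULT)
    f.nf = false;          /* bit 2 stays part of the mask register */
  else
    f.mask_reg &= 0x03;    /* bit 2 is NF; keep only the low two bits */

  return f;
}

/* Hypothetical check: NF is set although the matched template does not
   permit it.  Without something like this, the bit is dropped silently.  */
static bool
nf_wrongly_set (const struct evex_fields *f, bool template_allows_nf)
{
  return f->nf && !template_allows_nf;
}
```

With a payload byte of 0x0c, a promoted insn ends up with nf set and a mask
register of 0, so whether that is valid can only be told by consulting the
template afterwards; the same byte on a default-EVEX insn yields mask register
4 and nf clear.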
> --- a/opcodes/i386-opc.h
> +++ b/opcodes/i386-opc.h
> @@ -1017,7 +1017,8 @@ typedef struct insn_template
>  #define Prefix_EVEX 7 /* {evex} */
>  #define Prefix_REX 8 /* {rex} */
>  #define Prefix_REX2 9 /* {rex2} */
> -#define Prefix_NoOptimize 10 /* {nooptimize} */
> +#define Prefix_NF 10 /* {nf} */
> +#define Prefix_NoOptimize 11 /* {nooptimize} */

I find it increasingly puzzling that nooptimize is pushed ever further down,
for no real reason.

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -310,32 +310,42 @@ sti, 0xfb, 0, NoSuf, {}
>  // Arithmetic.
>  add, 0x0, APX_F, D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64 }
>  add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> +add, 0x0, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
>  add, 0x83/0, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
>  add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> +add, 0x83/0, APX_F, Modrm|No_bSuf|No_sSuf|EVexMap4|NF, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex }
>  add, 0x4, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S, Acc|Byte|Word|Dword|Qword }
>  add, 0x80/0, APX_F, W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}
>  add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> +add, 0x80/0, APX_F, W|Modrm|No_sSuf|EVexMap4|NF, { Imm8|Imm16|Imm32|Imm32S, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }

Adding these templates has a 2nd effect, for which no testcase is being
added: They now allow (taking
the example here) "{evex} add $1, %eax". Such a new test (which could be
less extensive than the -nf one you already add) should then also cover ADCX
and ADOX, for which the 2-operand EVEX templates were added prematurely.

>  inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
>  inc, 0xfe/0, APX_F, W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVexMap4|NF, {Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64}
>  inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> +inc, 0xfe/0, APX_F, W|Modrm|No_sSuf|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
>
>  sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64, }
>  sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }
> +sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|Optimize|EVexMap4|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Unspecified|BaseIndex }

What's the purpose of Optimize here? Just to repeat my earlier request:
Please don't blindly copy all attributes when you clone templates. See how
the existing APX template already doesn't have this attribute. Apparently you
re-cloned the legacy one, not the APX one.

Specifically here, this template will only be chosen if either {nf} or {evex}
is present. Both of which preclude the NDD->REX2 transformation, in turn
making further optimization impossible.

As to {nf} precluding optimization: can_convert_NDD_to_legacy() checks
i.tm.opcode_modifier.nf rather than i.has_nf. That's entirely dead code, as
i.tm is populated only by install_template(). This check wants dropping in a
prereq patch, I suppose, and then the patch here should add the correct
check. I recall saying back then that a respective check needs adding here,
not already in the patch introducing the transformation.

Jan
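
For illustration of the dead-check remark about can_convert_NDD_to_legacy(),
a toy model in C follows. Everything in it is hypothetical (struct
mini_template, struct mini_insn and the four helpers are invented names, not
gas code), and it rests on the assumption that the conversion check runs
before a template has been installed, which is how the remark above reads.
Under that assumption a test of the installed template's attribute can never
fire, whereas a test of the flag set while parsing {nf} can.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical miniature of a template: just the "NF permitted" attribute.  */
struct mini_template
{
  bool nf;
};

/* Hypothetical miniature of the assembler's per-insn state.  */
struct mini_insn
{
  bool has_nf;              /* set while parsing an {nf} pseudo prefix */
  struct mini_template tm;  /* copied in only when a template is installed */
};

/* Begin a new insn: clear all state, then record whether {nf} was seen.  */
static void
start_insn (struct mini_insn *i, bool saw_nf_prefix)
{
  memset (i, 0, sizeof (*i));
  i->has_nf = saw_nf_prefix;
}

/* Stand-in for installing the finally chosen template.  */
static void
install (struct mini_insn *i, const struct mini_template *t)
{
  i->tm = *t;
}

/* Check keyed on the installed template: still all zero during matching.  */
static bool
convertible_by_tm (const struct mini_insn *i)
{
  return !i->tm.nf;
}

/* Check keyed on what the user actually wrote.  */
static bool
convertible_by_has_nf (const struct mini_insn *i)
{
  return !i->has_nf;
}
```

In this model, for an insn written with {nf}, convertible_by_tm() still
reports "convertible" as long as install() has not run, while
convertible_by_has_nf() refuses the conversion right away.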