Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: Jan Beulich <jbeulich@suse.com>
To: "Cui, Lili" <lili.cui@intel.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
Date: Mon, 11 Dec 2023 09:34:32 +0100	[thread overview]
Message-ID: <0bb5fbcd-f58e-48ad-a5ee-3413b026f903@suse.com> (raw)
In-Reply-To: <SJ0PR11MB56007B278C5F7522A3F288839E8AA@SJ0PR11MB5600.namprd11.prod.outlook.com>

On 08.12.2023 16:21, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, December 7, 2023 8:39 PM
>>
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>>>
>>>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>> -  {
>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>> -	   || maybe_cpu (t, CpuFMA))
>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>>> +    {
>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>> APX_F(CpuCMPCCXADD)
>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>> APX_F(CpuAVX512DQ)
>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>  	{
>>>  	  if (need_evex_encoding ())
>>
>> There are several issues here:
>> - Why did you need to change (to the worse) the original code?
>> - Why did you not model the addition after that original code?
>> - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512
>> insn can be VEX-encoded?
> 
>  I don't understand what you mean, we have this combination.
> 
> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }

Oh, I'm sorry: I forgot about the mask register insns.

>> - If these new macros are really needed for whatever reason, they shouldn't
>>   be added to opcodes/i386-opc.h when they're useful only in the assembler.
>> - Style requires a blank before the opening parenthesis in function
>>   invocations (which also covers function-like macro invocations).
>>
>> I think I asked before: How is it that you get away without altering
>> cpu_flags_match(), containing related and quite similar logic?
>>
> 
> For the original logic ( ... || ... ) && ( ... || ...), the content in the first bracket and the content in the following brackets can be combined arbitrarily. I think it is Inaccurate.

In which way? If there are issues with the existing code, these issues want
taking care of in separate (prereq) patches. Of course there are assumptions
made here about the CPU combinations that can (and cannot) occur in any of
our templates. Similar assumptions are imo fine to make in the APX additions.

Note how I used two nested if()s despite that not having been necessary at
that time. I did so in anticipation that for APX you'd want to add another
(separate) inner if(), rather than altering the one that's there.

> So I give examples one by one for each identified combination.

Which examples are you talking about? I see none given in your reply.

> Just found cpu_flags_match() has similar logic, I think the following is the only code related to CPUID alerts, but none of our combinations are related to cpuavx.
> 
>           if (all.bitfield.cpuavx)
>             {
>               /* We need to check SSE2AVX with AVX.  */
>               if (!t->opcode_modifier.sse2avx
>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>                 match |= CPU_FLAGS_ARCH_MATCH;
>             }

Not sure why you pick out this one. This special case is needed for sse2avx;
I don't see how it's related here. What I've been pointing you at is the
code in that function which follows a similar "Dual VEX/EVEX templates ..."
comment.

>>> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry
>> *r)
>>>        if (!cpu_arch_flags.bitfield.cpuapx_f
>>>  	  || flag_code != CODE_64BIT)
>>>  	return false;
>>> +
>>> +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as
>> EVEX.
>>> +	 For the later install_template function.  */
>>> +      if (current_templates->start->opcode_modifier.vex
>>> +	  && current_templates->start->opcode_modifier.evex)
>>> +	i.vec_encoding = vex_encoding_evex;
>>
>> I'm afraid I don't understand the 2nd sentence of the comment. This may be
>> related to my question regarding cpu_flags_match() further up.
>>
>> The first sentence isn't quite correct either - you don't mark any template here
>> (and you can't, because we don't even know yet which template we're going
>> to use).
>>
>> Finally - do you really need the .evex check here? (I won't exclude that this
>> yields a better diagnostic in certain cases, but this wants clarifying if so.)
>>
> 
> If you look at install_template(), you'll see that before this function we need to know if the current encoding is evex.

"This function" being check_register()? If so, then no, we can't know up front
whether EVEX encoding is going to be needed, as operand parsing happens ahead
of template selection. If instead you mean "that function" and hence
install_template(), then yes, we need to know whether to use EVEX there. Yet
how does that result in a need for the .evex check here? (Or maybe your reply
was really to the first of the three parts of my earlier one?)

But anyway - as said earlier on, using current_templates here looks wrong in
the first place. check_register() deals with only a register, without regard
to the context it is used in (with the sole exception of allow_pseudo_reg).
May I remind you that earlier on I already indicated that I suspect you'll
need a new enumerator to put in i.vec_encoding for this new purpose?

> We need to check opcode_modifier.evex here, it is a fix for issues caused by the merge of VEX and EVEX.
>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>     {
>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>         {
>           if (need_evex_encoding ())
>             {
>[...]
>>> @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
>>>
>>>  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, {
>>> Oword|Unspecified|BaseIndex, Reg32 }  invept, 0x660f3880, EPT&x64,
>>> Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
>>> +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
>>> +Oword|Unspecified|BaseIndex, Reg64 }
>>>  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, {
>>> Oword|Unspecified|BaseIndex, Reg32 }  invvpid, 0x660f3881, EPT&x64,
>>> Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
>>> +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
>>> +Oword|Unspecified|BaseIndex, Reg64 }
>>
>> Seeing these: Are there any Map4 encodings which aren't EVex128? If not
>> (and if you're also not hiddenly aware of some appearing in the near future),
>> please consider making EVexMap4 include this right away. Even if in the longer
>> run other encodings appear, it'll then be easy to simply replace all the
>> EVexMap4 uses in a purely mechanical way. Until then shorter template lines
>> are preferable.
>>
> 
> Would you mind defining it this way? Since #define EVex128 is behind it. Considering that you don't like unnecessary changes.
> 
> +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128

The order of #define-s doesn't matter. There's no reason not to use EVex128 here
even if it's #define-d only a few lines later.

>>> @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
>>>
>>>  // BMI2 instructions.
>>>
>>> -bzhi, 0xf5, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
>> _bSuf|No
>>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
>>> Reg32|Reg64 } -mulx, 0xf2f6, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
>> uf|No_sSu
>>> f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>> -pdep, 0xf2f5, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
>> uf|No_sSu
>>> f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>> -pext, 0xf3f5, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
>> uf|No_sSu
>>> f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>> -rorx, 0xf2f0, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSu
>> f, {
>>> Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex,
>> Reg32|Reg64
>>> } -sarx, 0xf3f7, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
>> _bSuf|No
>>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
>>> Reg32|Reg64 } -shlx, 0x66f7, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
>> _bSuf|No
>>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
>>> Reg32|Reg64 } -shrx, 0xf2f7, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
>> _bSuf|No
>>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
>>> Reg32|Reg64 }
>>> +bzhi, 0xf5, BMI2&(BMI2|APX_F),
>>>
>> +Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapS
>> ources|N
>>> +o_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64,
>>> +Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>
>> Hmm, I had specifically suggested a pre-processor macro to use in place of the
>> open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here
>> and below)?
> 
> There are many different types of combinations, and each combination appears relatively few times, so I think adding a #define for each combination feels a bit wasteful.

I never suggested using multiple #define-s. I suggested a single APX_F()
macro which would be used uniformly here and elsewhere (here: APX_F(BMI2)).
And that macro would come with a comment explaining why the expression is
the (seemingly strange) way it is. Right now there's no such explanation
anywhere, and it would also be hard to find a good (central) place where to
put it.

Jan

next prev parent reply	other threads:[~2023-12-11  8:34 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
2023-11-24  7:02 ` [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Cui, Lili
2023-12-04 16:30   ` Jan Beulich
2023-12-05 13:31     ` Cui, Lili
2023-12-06  7:52       ` Jan Beulich
2023-12-06 12:43         ` Cui, Lili
2023-12-07  9:01           ` Jan Beulich
2023-12-08  3:10             ` Cui, Lili
2023-11-24  7:02 ` [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
2023-12-07 12:38   ` Jan Beulich
2023-12-08 15:21     ` Cui, Lili
2023-12-11  8:34       ` Jan Beulich [this message]
2023-12-12 10:44         ` Cui, Lili
2023-12-12 11:16           ` Jan Beulich
2023-12-12 12:32             ` Cui, Lili
2023-12-12 12:39               ` Jan Beulich
2023-12-12 13:15                 ` Cui, Lili
2023-12-12 14:13                   ` Jan Beulich
2023-12-13  7:36                     ` Cui, Lili
2023-12-13  7:48                       ` Jan Beulich
2023-12-12 12:58         ` Cui, Lili
2023-12-12 14:04           ` Jan Beulich
2023-12-13  8:35             ` Cui, Lili
2023-12-13  9:13               ` Jan Beulich
2023-12-07 13:34   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:43       ` Jan Beulich
2023-12-11 11:50   ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 5/9] Add tests for " Cui, Lili
2023-12-07 14:05   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:55       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
2023-12-08 14:12   ` Jan Beulich
2023-12-11 13:36     ` Cui, Lili
2023-12-11 16:50       ` Jan Beulich
2023-12-13 10:42         ` Cui, Lili
2024-03-22 10:02     ` Jan Beulich
2024-03-22 10:31       ` Jan Beulich
2024-03-26  2:04         ` Cui, Lili
2024-03-26  7:06           ` Jan Beulich
2024-03-26  7:18             ` Cui, Lili
2024-03-22 10:59       ` Jan Beulich
2024-03-26  8:22         ` Cui, Lili
2024-03-26  9:30           ` Jan Beulich
2024-03-27  2:41             ` Cui, Lili
2023-12-08 14:27   ` Jan Beulich
2023-12-12  5:53     ` Cui, Lili
2023-12-12  8:28       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 7/9] Support APX Push2/Pop2 Cui, Lili
2023-12-11 11:17   ` Jan Beulich
2023-12-15  8:38     ` Cui, Lili
2023-12-15  8:44       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 8/9] Support APX NDD optimized encoding Cui, Lili
2023-12-11 12:27   ` Jan Beulich
2023-12-12  3:18     ` Hu, Lin1
2023-12-12  8:41       ` Jan Beulich
2023-12-13  5:31         ` Hu, Lin1
2023-12-12  8:45       ` Jan Beulich
2023-12-13  6:06         ` Hu, Lin1
2023-12-13  8:19           ` Jan Beulich
2023-12-13  8:34             ` Hu, Lin1
2023-11-24  7:02 ` [PATCH v3 9/9] Support APX JMPABS for disassembler Cui, Lili
2023-11-24  7:09 ` [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Jan Beulich
2023-11-24 11:22   ` Cui, Lili
2023-11-24 12:14     ` Jan Beulich
2023-12-12  2:57 ` Lu, Hongjiu
2023-12-12  8:16 ` Cui, Lili

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0bb5fbcd-f58e-48ad-a5ee-3413b026f903@suse.com \
    --to=jbeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=hongjiu.lu@intel.com \
    --cc=lili.cui@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).