Re: [PATCH v3 6/9] Support APX NDD

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: Jan Beulich <jbeulich@suse.com>
To: "Cui, Lili" <lili.cui@intel.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"Kong, Lingling" <lingling.kong@intel.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: Re: [PATCH v3 6/9] Support APX NDD
Date: Mon, 11 Dec 2023 17:50:12 +0100	[thread overview]
Message-ID: <3ff16131-d959-4ac0-85e2-7f3a7a1b28b3@suse.com> (raw)
In-Reply-To: <SJ0PR11MB560048569C0181D3832F88DF9E8FA@SJ0PR11MB5600.namprd11.prod.outlook.com>

On 11.12.2023 14:36, Cui, Lili wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> --- a/opcodes/i386-opc.tbl
>>> +++ b/opcodes/i386-opc.tbl
>>> @@ -139,9 +139,13 @@
>>>  #define Vsz256 Vsz=VSZ256
>>>  #define Vsz512 Vsz=VSZ512
>>>
>>> +#define DstVVVV VexVVVV=VexVVVV_DST
>>> +
>>>  // The EVEX purpose of StaticRounding appears only together with SAE.
>>> Re-use  // the bit to mark commutative VEX encodings where swapping
>>> the source  // operands may allow to switch from 3-byte to 2-byte VEX
>> encoding.
>>> +// And re-use the bit to mark some NDD insns that swapping the source
>>> +operands // may allow to switch from EVEX encoding to REX2 encoding.
>>>  #define C StaticRounding
>>>
>>>  #define FP 387|287|8087
>>> @@ -288,26 +292,40 @@ std, 0xfd, 0, NoSuf, {}  sti, 0xfb, 0, NoSuf, {}
>>>
>>>  // Arithmetic.
>>> +add, 0x0, APX_F,
>>>
>> +D|C|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|N
>> F, {
>>> +Reg8|Reg16|Reg32|Reg64,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>
>> There is _still_ Byte|Word|Dword|Qword in here (and below), when I think I
>> pointed out more than once before that in new templates such redundancy
>> wants omitting.
>>
>> Since this isn't the first instance of earlier review comments not taken care of,
>> may I please ask that you make reasonably sure that new versions aren't sent
>> out like this?
>>
> 
> This part could indeed be omitted, but I really don't remember you mentioning it on the APX patches.

Already in e.g.
https://sourceware.org/pipermail/binutils/2023-November/130422.html
I pointed out that such earlier comments in e.g.
https://sourceware.org/pipermail/binutils/2023-September/129590.html
were not addressed.

> There are still a lot of redundant Byte|Word|Dword|Qword in the opcode table, APX just added some flags on top of the old ones. Do you mind if I create a patch first to remove the redundant parts of master?

I don't mind you cleaning up first. It's just that normally I wouldn't do
so in a separate patch (one of the reasons being that such non-functional
changes get in the way of using "git blame" or alike when trying to find
the most recent real change to a line), unless it was only a handful of
instances left. Instead I typically do such tidying as lines are touched
anyway. Thing here simply is that new templates shouldn't have such
anomalies anymore.

>>>  add, 0x0, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, {
>>> Reg8|Reg16|Reg32|Reg64,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>> +add, 0x83/0, APX_F,
>>>
>> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|
>> NF, {
>>> +Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
>>> +Reg16|Reg32|Reg64 }
>>>  add, 0x83/0, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
>>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  add,
>> 0x4,
>>> 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
>> Acc|Byte|Word|Dword|Qword }
>>> +add, 0x80/0, APX_F,
>>> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64}
>>>  add, 0x80/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>> Imm8|Imm16|Imm32|Imm32S,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>>
>>>  inc, 0x40, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
>>> +inc, 0xfe/0, APX_F,
>>> +W|Modrm|No_sSuf|CheckOperandSize|DstVVVV|EVex128|EVexMap4|NF,
>>>
>> +{Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64}
>>>  inc, 0xfe/0, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>>
>>> +sub, 0x28, APX_F,
>>>
>> +D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Opti
>> mize|
>>> +NF, { Reg8|Reg16|Reg32|Reg64,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64, }
>>
>> Here and elsewhere, what's Optimize for? It not being there on other
>> templates, it can't be for the EVEX->REX2 optimization? If there are further
>> optimization plans, that's (again) something to mention in the description. Yet
>> better would be if such attributes were added only when respective
>> optimizations are actually introduced. Unlike e.g. NF, which would mean
>> another bulk update if not added right away, new optimizations typically affect
>> only a few templates at a time.
>>
> 
> Optimize is not new.
> 
> sub, 0x28, APX_F, D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4|Optimize|NF, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex, Reg8|Reg16|Reg32|Reg64, }
> sub, 0x28, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, { Reg8|Reg16|Reg32|Reg64, Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex }

Optimize is legitimately there for the legacy template. If the new template
also wants it, there needs to be some reason. Otherwise it is part of the
tranformation to APX/EVEX to drop it.

>>>  sub, 0x28, 0,
>>> D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, {
>>> Reg8|Reg16|Reg32|Reg64,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>> +sub, 0x83/5, APX_F,
>>> +Modrm|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, { Imm8S,
>>> +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
>>> +Reg16|Reg32|Reg64 }
>>>  sub, 0x83/5, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
>>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  sub,
>> 0x2c,
>>> 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
>> Acc|Byte|Word|Dword|Qword }
>>> +sub, 0x80/5, APX_F,
>>> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>>  sub, 0x80/5, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>> Imm8|Imm16|Imm32|Imm32S,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>
>> There are still only 3 new templates here (and also above for add, plus for
>> other similar insns), when ...
>>
>>>  dec, 0x48, No64, No_bSuf|No_sSuf|No_qSuf, { Reg16|Reg32 }
>>> +dec, 0xfe/1, APX_F,
>>> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>>  dec, 0xfe/1, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>>
>>> +sbb, 0x18, APX_F,
>>> +D|W|CheckOperandSize|Modrm|No_sSuf|DstVVVV|EVex128|EVexMap4, {
>>> +Reg8|Reg16|Reg32|Reg64,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>>  sbb, 0x18, 0, D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock, {
>>> Reg8|Reg16|Reg32|Reg64,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>> +sbb, 0x18, APX_F,
>>> +D|W|CheckOperandSize|Modrm|EVex128|EVexMap4|No_sSuf, {
>>> +Reg8|Reg16|Reg32|Reg64,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x }
>>> +sbb, 0x83/3, APX_F,
>>>
>> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4,
>> {
>>> +Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
>>> +Reg16|Reg32|Reg64 }
>>>  sbb, 0x83/3, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
>>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>>> +sbb, 0x83/3, APX_F, Modrm|EVex128|EVexMap4|No_bSuf|No_sSuf,
>> { Imm8S,
>>> +Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }
>>>  sbb, 0x1c, 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
>>> Acc|Byte|Word|Dword|Qword }
>>> +sbb, 0x80/3, APX_F,
>>> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>>  sbb, 0x80/3, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>> Imm8|Imm16|Imm32|Imm32S,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>> +sbb, 0x80/3, APX_F, W|Modrm|EVex128|EVexMap4|No_sSuf, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x }
>>
>> ... there are 6 new templates here. This is again an aspect I had pointed out
>> before. You cannot defer the addition of the other 3 until the NF patch, as you
>> want to make sure that with just this patch in place something both
>>
>>     {evex} sbb %eax, %eax
>>
>> and
>>
>>     {evex} sub %eax, %eax
>>
>> actually assemble, and to EVEX encodings. I can't see how that would work in
>> the latter case without those further templates.
>>
>> The alternative is to also defer adding the 2-operand SBB templates (and any
>> others you add here which don't use DstVVVV).
>>
> 
> I'm having a headache with this, some instructions like sbb don't support NF, originally they were in the 4/9 patch, but their disassemblers are in the NDD patch, and you agreed to put them in the NDD patch.

Right, yet still the overall result wants to be consistent. Hence why I'm
not demanding that you move these templates yet later (which is one
option). Instead I've indicated that moving the others ahead would also
be okay.

Like with any series, you want it to be in a shape where it can be committed
piecemeal. Which is even more important with a release around the corner.
If we end up with just partial APX support in 2.42, that partial support
should be in a shape that's predictable to users.

> Now I really don't know where to move. Moving encoding, decoding, and especially test cases for instructions between patches is cumbersome and I really don't think it makes much sense.

I can see your point, and I'm sorry for the hassle. Part of the problem of
the moving being troublesome is (imo) that many of the patches simply were
(are) doing too many things at a time anyway.

>>>  xor, 0x30, 0,
>>> D|W|CheckOperandSize|Modrm|No_sSuf|HLEPrefixLock|Optimize, {
>>> Reg8|Reg16|Reg32|Reg64,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>> +xor, 0x83/6, APX_F,
>>>
>> +Modrm|CheckOperandSize|No_bSuf|No_sSuf|DstVVVV|EVex128|EVexMap4|
>> NF, {
>>> +Imm8S, Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex,
>>> +Reg16|Reg32|Reg64 }
>>>  xor, 0x83/6, 0, Modrm|No_bSuf|No_sSuf|HLEPrefixLock, { Imm8S,
>>> Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex }  xor,
>> 0x34,
>>> 0, W|No_sSuf, { Imm8|Imm16|Imm32|Imm32S,
>> Acc|Byte|Word|Dword|Qword }
>>> +xor, 0x80/6, APX_F,
>>> +W|Modrm|CheckOperandSize|No_sSuf|DstVVVV|EVex128|EVexMap4|NF, {
>>> +Imm8|Imm16|Imm32|Imm32S,
>>>
>> +Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseInde
>> x,
>>> +Reg8|Reg16|Reg32|Reg64 }
>>>  xor, 0x80/6, 0, W|Modrm|No_sSuf|HLEPrefixLock, {
>>> Imm8|Imm16|Imm32|Imm32S,
>>>
>> Reg8|Reg16|Reg32|Reg64|Byte|Word|Dword|Qword|Unspecified|BaseIndex
>> }
>>>
>>>  // clr with 1 operand is really xor with 2 operands.
>>>  clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, {
>>> Reg8|Reg16|Reg32|Reg64 }
>>
>> Btw., for consistency this may also want accompanying with an EVEX
>> counterpart.
>>
> 
> Do you mean to add an entry like this? It should belong to the previous patch.
> 
> // clr with 1 operand is really xor with 2 operands.
> clr, 0x30, 0, W|Modrm|No_sSuf|RegKludge|Optimize, { Reg8|Reg16|Reg32|Reg64 }
> clr, 0x30, APX_F, W|Modrm|No_sSuf|RegKludge|EVex128|EVexMap4|Optimize, { Reg8|Reg16|Reg32|Reg64 }

Yes, something like this. And possibly indeed not the patch here; the
template simply happened to be in context. Where exactly it wants to
go depends - see above - on where other similar templates are
introduced. Note however that the corresponding XOR templates are
introduced here, just above and still in context.

Jan

next prev parent reply	other threads:[~2023-12-11 16:50 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
2023-11-24  7:02 ` [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Cui, Lili
2023-12-04 16:30   ` Jan Beulich
2023-12-05 13:31     ` Cui, Lili
2023-12-06  7:52       ` Jan Beulich
2023-12-06 12:43         ` Cui, Lili
2023-12-07  9:01           ` Jan Beulich
2023-12-08  3:10             ` Cui, Lili
2023-11-24  7:02 ` [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
2023-12-07 12:38   ` Jan Beulich
2023-12-08 15:21     ` Cui, Lili
2023-12-11  8:34       ` Jan Beulich
2023-12-12 10:44         ` Cui, Lili
2023-12-12 11:16           ` Jan Beulich
2023-12-12 12:32             ` Cui, Lili
2023-12-12 12:39               ` Jan Beulich
2023-12-12 13:15                 ` Cui, Lili
2023-12-12 14:13                   ` Jan Beulich
2023-12-13  7:36                     ` Cui, Lili
2023-12-13  7:48                       ` Jan Beulich
2023-12-12 12:58         ` Cui, Lili
2023-12-12 14:04           ` Jan Beulich
2023-12-13  8:35             ` Cui, Lili
2023-12-13  9:13               ` Jan Beulich
2023-12-07 13:34   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:43       ` Jan Beulich
2023-12-11 11:50   ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 5/9] Add tests for " Cui, Lili
2023-12-07 14:05   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:55       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
2023-12-08 14:12   ` Jan Beulich
2023-12-11 13:36     ` Cui, Lili
2023-12-11 16:50       ` Jan Beulich [this message]
2023-12-13 10:42         ` Cui, Lili
2024-03-22 10:02     ` Jan Beulich
2024-03-22 10:31       ` Jan Beulich
2024-03-26  2:04         ` Cui, Lili
2024-03-26  7:06           ` Jan Beulich
2024-03-26  7:18             ` Cui, Lili
2024-03-22 10:59       ` Jan Beulich
2024-03-26  8:22         ` Cui, Lili
2024-03-26  9:30           ` Jan Beulich
2024-03-27  2:41             ` Cui, Lili
2023-12-08 14:27   ` Jan Beulich
2023-12-12  5:53     ` Cui, Lili
2023-12-12  8:28       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 7/9] Support APX Push2/Pop2 Cui, Lili
2023-12-11 11:17   ` Jan Beulich
2023-12-15  8:38     ` Cui, Lili
2023-12-15  8:44       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 8/9] Support APX NDD optimized encoding Cui, Lili
2023-12-11 12:27   ` Jan Beulich
2023-12-12  3:18     ` Hu, Lin1
2023-12-12  8:41       ` Jan Beulich
2023-12-13  5:31         ` Hu, Lin1
2023-12-12  8:45       ` Jan Beulich
2023-12-13  6:06         ` Hu, Lin1
2023-12-13  8:19           ` Jan Beulich
2023-12-13  8:34             ` Hu, Lin1
2023-11-24  7:02 ` [PATCH v3 9/9] Support APX JMPABS for disassembler Cui, Lili
2023-11-24  7:09 ` [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Jan Beulich
2023-11-24 11:22   ` Cui, Lili
2023-11-24 12:14     ` Jan Beulich
2023-12-12  2:57 ` Lu, Hongjiu
2023-12-12  8:16 ` Cui, Lili

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3ff16131-d959-4ac0-85e2-7f3a7a1b28b3@suse.com \
    --to=jbeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=hongjiu.lu@intel.com \
    --cc=lili.cui@intel.com \
    --cc=lingling.kong@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).