From: "Cui, Lili" <lili.cui@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
"Kong, Lingling" <lingling.kong@intel.com>,
"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH 4/8] Support APX NDD
Date: Sun, 22 Oct 2023 14:05:34 +0000
Message-ID: <SJ0PR11MB5600EAA8556B44606A5908AB9ED9A@SJ0PR11MB5600.namprd11.prod.outlook.com>
In-Reply-To: <9d317289-6d83-3f9b-ef34-af574e798a3f@suse.com>
> > gas/config/tc-i386.c | 80 ++++++++----
> > gas/testsuite/gas/i386/x86-64-apx-ndd.d | 165 ++++++++++++++++++++++++
> > gas/testsuite/gas/i386/x86-64-apx-ndd.s | 156 ++++++++++++++++++++++
> > gas/testsuite/gas/i386/x86-64-pseudos.d | 42 ++++++
> > gas/testsuite/gas/i386/x86-64-pseudos.s | 43 ++++++
> > gas/testsuite/gas/i386/x86-64.exp | 1 +
> > opcodes/i386-dis-evex-prefix.h | 4 +-
> > opcodes/i386-dis-evex-reg.h | 123 ++++++++++++++++++
> > opcodes/i386-dis-evex.h | 124 +++++++++---------
> > opcodes/i386-dis.c | 47 ++++++-
> > opcodes/i386-opc.h | 1 +
> > opcodes/i386-opc.tbl | 67 ++++++++++
> > 12 files changed, 762 insertions(+), 91 deletions(-)
> > create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
> > create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
> >
> > diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
> > index 48916bc3846..381e389bb04 100644
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -2261,8 +2261,9 @@ operand_size_match (const insn_template *t)
> > unsigned int given = i.operands - j - 1;
> >
> > /* For FMA4 and XOP insns VEX.W controls just the first two
> > - register operands. */
> > - if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP))
> > + register operands. And APX insns just swap the first operands. */
> > + if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP)
> > + || (is_cpu (t,CpuAPX_F) && i.operands == 3))
> > given = j < 2 ? 1 - j : j;
>
> In the comment, how about "And APX_F insns just swap the two source
> operands, with the 3rd one being the destination"?
>
Done.
> Is the "i.operands == 3" part of the condition really needed? I.e. are there any
> APX_F insns which can make it here but must not take this path? Afaict
> 2-operand insns are fine to go here, and more-than-3-operand insns don't come
> with the D attribute.
>
You're right; I deleted the "i.operands == 3" check.
> Also (nit) there's a missing blank after the comma.
>
Done.
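For reference, the index mapping that "given = j < 2 ? 1 - j : j" performs can be sketched in isolation. This is only an illustration; swapped_source_index and MAX_SKETCH_OPERANDS are hypothetical names, not gas code.

```c
#include <assert.h>

/* Hypothetical upper bound on the operand count, for this sketch only.  */
#define MAX_SKETCH_OPERANDS 5

/* Map template operand index J to the index of the given operand it is
   compared against when only the first two (source) operands swap.  */
static unsigned int
swapped_source_index (unsigned int j)
{
  assert (j < MAX_SKETCH_OPERANDS);
  return j < 2 ? 1 - j : j;
}
```

Operands 0 and 1 trade places, while operand 2 (the destination of a 3-operand NDD insn) and anything after it map to themselves.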
> > @@ -3876,6 +3877,7 @@ is_any_apx_encoding (void)
> > {
> > return i.rex2
> > || i.rex2_encoding
> > + || i.tm.opcode_space == SPACE_EVEXMAP4
> > || (i.vex.register_specifier
> > && i.vex.register_specifier->reg_flags & RegRex2);
> > }
> > @@ -4204,6 +4206,10 @@ build_legacy_insns_with_apx_encoding (void)
> > }
> >
> > build_evex_insns_with_extend_evex_prefix ();
> > +
> > + /* Encode the NDD bit. */
> > + if (i.vex.register_specifier)
> > + i.vex.bytes[3] |= 0x10;
> > }
> >
> > static void
> > @@ -7383,26 +7389,31 @@ match_template (char mnem_suffix)
> > overlap1 = operand_type_and (operand_types[0], operand_types[1]);
> > if (t->opcode_modifier.d && i.reg_operands == i.operands
> > && !operand_type_all_zero (&overlap1))
> > - switch (i.dir_encoding)
> > - {
> > - case dir_encoding_load:
> > - if (operand_type_check (operand_types[i.operands - 1], anymem)
> > - || t->opcode_modifier.regmem)
> > - goto check_reverse;
> > - break;
> > + {
> > + int isMemOperand = (t->opcode_modifier.vexvvvv
> > + && t->opcode_space == SPACE_EVEXMAP4)
> > + ? i.operands - 2 : i.operands - 1;
>
> "is" in the variable name is properly misleading. What you're determining
> here is which operand you want to _check_ for being the memory operand.
>
Renamed it to MemOperand.
> > +
> As to the condition, the two sides of && may want swapping: In such a
> condition it is generally desirable to have the more restricting part first. Plus
> this may be more neat to express without ?: anyway:
>
> i.operands - 1 - (t->opcode_space == SPACE_EVEXMAP4
>                   && t->opcode_modifier.vexvvvv)
>
> (suitably line wrapped of course).
>
Done.
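As a quick check that the two spellings agree, here is a minimal standalone sketch. The function names and the plain bool parameters are hypothetical stand-ins for the template fields. It relies on && yielding exactly 0 or 1 in C, and assumes at least two operands, which I believe holds on this path.

```c
#include <assert.h>
#include <stdbool.h>

/* Form used in the first version of the patch, with ?:.  */
static unsigned int
mem_operand_ternary (unsigned int operands, bool evexmap4, bool vexvvvv)
{
  assert (operands >= 2);
  return (vexvvvv && evexmap4) ? operands - 2 : operands - 1;
}

/* Suggested form: subtract the 0/1 result of the condition, with the
   more restrictive test (the opcode space) placed first.  */
static unsigned int
mem_operand_arith (unsigned int operands, bool evexmap4, bool vexvvvv)
{
  assert (operands >= 2);
  return operands - 1 - (evexmap4 && vexvvvv);
}
```

For a 3-operand NDD template both return 1, i.e. the operand just before the destination is the one checked for being the memory operand.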
> > + switch (i.dir_encoding)
> > + {
> > + case dir_encoding_load:
> > + if (operand_type_check (operand_types[isMemOperand], anymem)
> > + || t->opcode_modifier.regmem)
> > + goto check_reverse;
> > + break;
> >
> > - case dir_encoding_store:
> > - if (!operand_type_check (operand_types[i.operands - 1], anymem)
> > - && !t->opcode_modifier.regmem)
> > - goto check_reverse;
> > - break;
> > + case dir_encoding_store:
> > + if (!operand_type_check (operand_types[isMemOperand], anymem)
> > + && !t->opcode_modifier.regmem)
> > + goto check_reverse;
> > + break;
> >
> > - case dir_encoding_swap:
> > - goto check_reverse;
> > + case dir_encoding_swap:
> > + goto check_reverse;
> >
> > - case dir_encoding_default:
> > - break;
> > - }
> > + case dir_encoding_default:
> > + break;
> > + }
> > + }
> > /* If we want store form, we skip the current load. */
> > if ((i.dir_encoding == dir_encoding_store
> > || i.dir_encoding == dir_encoding_swap)
> > @@ -7432,11 +7443,13 @@ match_template (char mnem_suffix)
> > continue;
> > /* Try reversing direction of operands. */
> > j = is_cpu (t, CpuFMA4)
> > - || is_cpu (t, CpuXOP) ? 1 : i.operands - 1;
> > + || is_cpu (t, CpuXOP)
> > + || is_cpu (t, CpuAPX_F) ? 1 : i.operands - 1;
> > overlap0 = operand_type_and (i.types[0], operand_types[j]);
> > overlap1 = operand_type_and (i.types[j], operand_types[0]);
> > overlap2 = operand_type_and (i.types[1], operand_types[1]);
> > - gas_assert (t->operands != 3 || !check_register);
> > + gas_assert (t->operands != 3 || !check_register
> > + || is_cpu (t,CpuAPX_F));
>
> Nit: Missing blank again.
Done.
>
> > @@ -7471,6 +7484,12 @@ match_template (char mnem_suffix)
> > found_reverse_match = Opcode_VexW;
> > goto check_operands_345;
> > }
> > + else if (is_cpu (t,CpuAPX_F)
> > + && i.operands == 3)
> > + {
> > + found_reverse_match = Opcode_APX_NDDD;
> > + goto check_operands_345;
> > + }
> > else if (t->opcode_space != SPACE_BASE
> > && (t->opcode_space != SPACE_0F
> > /* MOV to/from CR/DR/TR, as an exception, follow
> > @@ -7636,6 +7655,15 @@ match_template (char mnem_suffix)
> > flipping VEX.W. */
> > i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
> >
> > + j = i.tm.operand_types[0].bitfield.imm8;
> > + i.tm.operand_types[j] = operand_types[j + 1];
> > + i.tm.operand_types[j + 1] = operand_types[j];
> > + break;
>
> I'm not overly happy to see this code getting duplicated. Are there any
> encodings at all which have D and an immediate operand? I don't think so, in
> which case this at least wants simplifying. But read on.
>
> > + case Opcode_APX_NDDD:
> > + /* Only the first two register operands need reversing. */
> > + i.tm.base_opcode ^= 0x2;
>
> I think you mean Opcode_D here?
>
> > j = i.tm.operand_types[0].bitfield.imm8;
> > i.tm.operand_types[j] = operand_types[j + 1];
> > i.tm.operand_types[j + 1] = operand_types[j];
>
> Taking both remarks together, do we need Opcode_APX_NDDD at all? Can't
> you use the ordinary Opcode_D, with
>
> default:
> /* If we found a reverse match we must alter the opcode direction
> bit and clear/flip the regmem modifier one. found_reverse_match
> holds bits to change (different for int & float insns). */
>
> i.tm.base_opcode ^= found_reverse_match;
>
> if (i.tm.opcode_space == SPACE_EVEXMAP4 && i.operands == 3)
> goto swap_first_2;
> ...
> swap_first_2:
> j = i.tm.operand_types[0].bitfield.imm8;
> i.tm.operand_types[j] = operand_types[j + 1];
> i.tm.operand_types[j + 1] = operand_types[j];
> break;
>
> ? (I'm not convinced the i.operands == 3 part of the condition is needed; if at
> all possible it wants omitting.)
>
Your suggestion is indeed better than what I had. It also works without the
"i.operands == 3" check.
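To make the effect of the suggested path concrete, here is a small standalone sketch. It is not the gas code: struct sketch_template and reverse_first_two are hypothetical, operand types are reduced to plain ints, and the imm8 offset j is taken as 0. SKETCH_OPCODE_D stands in for Opcode_D, assumed to be the 0x2 direction bit the patch XORs in.

```c
#include <assert.h>

/* Stand-in for Opcode_D (assumed to be the 0x2 direction bit).  */
#define SKETCH_OPCODE_D 0x2u

/* Heavily reduced, hypothetical view of an insn template.  */
struct sketch_template
{
  unsigned int base_opcode;
  int operand_types[3];
};

/* Flip the bits in REVERSE_BITS and swap the first two operand types,
   leaving the third (destination) operand where it is.  */
static void
reverse_first_two (struct sketch_template *t, const int orig_types[3],
		   unsigned int reverse_bits)
{
  assert (t && orig_types);
  t->base_opcode ^= reverse_bits;
  t->operand_types[0] = orig_types[1];
  t->operand_types[1] = orig_types[0];
}
```

With base opcode 0x01 (ADD r/m, r) the flip yields 0x03 (ADD r, r/m), and only the two source operand types trade places.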
> > @@ -8462,8 +8490,8 @@ process_operands (void)
> > const reg_entry *default_seg = NULL;
> >
> > /* We only need to check those implicit registers for instructions
> > - with 3 operands or less. */
> > - if (i.operands <= 3)
> > + with 4 operands or less. */
> > + if (i.operands <= 4)
> > for (unsigned int j = 0; j < i.operands; j++)
> > if (i.types[j].bitfield.instance != InstanceNone)
> > i.reg_operands--;
>
> How useful is it to keep the outer if() when 4-operand insns now also need
> checking? There are extremely few 5-operand ones ...
>
Deleted the outer if().
> > @@ -8825,6 +8853,9 @@ build_modrm_byte (void)
> > break;
> > if (v >= dest)
> > v = ~0;
> > + if (i.tm.opcode_space == SPACE_EVEXMAP4
> > + && i.tm.opcode_modifier.vexvvvv)
> > + v = dest;
> > if (i.tm.extension_opcode != None)
> > {
> > if (dest != source)
> > @@ -9088,6 +9119,9 @@ build_modrm_byte (void)
> > set_rex_vrex (i.op[op].regs, REX_B, false);
> > }
> >
> > + if (i.tm.opcode_space == SPACE_EVEXMAP4
> > + && i.tm.opcode_modifier.vexvvvv)
> > + dest--;
> > if (op == dest)
> > dest = ~0;
> > if (op == source)
>
> These two changes are at the very least problematic with .insn, whose
> behavior may not change. I'd also prefer if we could get away with just one
> change to the function. Did you consider alternatives? We could re-widen
> VexVVVV, such that the value 2 indicates that the destination is encoded there.
> That then also has no chance of conflicting with .insn.
>
I added value 2 for NDD. If that's OK, I will create another patch to move the
(i.tm.extension_opcode != None) handling into the VexVVVVDEST branch and to use
value 3 instead of SWAP_SOURCES, maybe named VexVVVVSRC1. Alternatively, the
values could just be named VexVVVVOP1, VexVVVVOP2 and VexVVVVOP3. What do you
think?
/* How to encode VEX.vvvv:
   0: VEX.vvvv must be 1111b.
   1: VEX.vvvv encodes one of the register operands.
   2: VEX.vvvv encodes the dest register operand.
 */
#define VexVVVVSRC 1
#define VexVVVVDEST 2
  VexVVVV,

  if (i.tm.opcode_modifier.vexvvvv == VexVVVVDEST)
    {
      v = dest;
      dest--;
    }
  else if (i.tm.opcode_modifier.vexvvvv == VexVVVVSRC)
    {
      for (v = source + 1; v < dest; ++v)
        if (v != reg_slot)
          break;
      if (i.tm.extension_opcode != None)
        {
          if (dest != source)
            v = dest;
          dest = ~0;
        }
      gas_assert (source < dest);
      if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
          && source != op)
        {
          unsigned int tmp = source;
          source = v;
          v = tmp;
        }
    }
  else
    v = ~0;
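A reduced sketch of what the three VexVVVV values select is given below, for illustration only. It assumes all operands are registers, source starts at operand 0, dest at the last operand, and it leaves out reg_slot, extension opcodes and SWAP_SOURCES. The enum, struct and function names are hypothetical.

```c
#include <assert.h>

/* Stand-ins for the VexVVVV values discussed above.  */
enum vvvv_mode { VVVV_NONE = 0, VVVV_SRC = 1, VVVV_DEST = 2 };

/* Marks "no operand is encoded in this field".  */
#define NO_OPERAND (~0u)

struct operand_slots
{
  unsigned int source;	/* First operand encoded via ModRM.  */
  unsigned int dest;	/* Last operand encoded via ModRM.  */
  unsigned int v;	/* Operand encoded in VEX.vvvv.  */
};

static struct operand_slots
assign_slots (unsigned int operands, enum vvvv_mode mode)
{
  struct operand_slots s;

  assert (operands >= 2);
  s.source = 0;
  s.dest = operands - 1;
  s.v = NO_OPERAND;

  switch (mode)
    {
    case VVVV_DEST:
      /* NDD: the destination moves to vvvv, so ModRM covers the
	 operands before it.  */
      s.v = s.dest;
      s.dest--;
      break;
    case VVVV_SRC:
      /* This sketch only models the 3-or-more-operand case.  */
      assert (operands >= 3);
      s.v = s.source + 1;
      break;
    case VVVV_NONE:
      break;
    }
  return s;
}
```

For a 3-operand NDD insn this puts operand 2 in vvvv and leaves operands 0 and 1 for ModRM, which is the intent of the VexVVVVDEST branch.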
> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> > @@ -0,0 +1,156 @@
> > +# Check 64bit APX NDD instructions with evex prefix encoding
> > +
> > + .allow_index_reg
> > + .text
> > +_start:
> > +cmovge 0x90909090(%eax),%edx,%r8d
> > +cmovle 0x90909090(%eax),%edx,%r8d
> > +cmovg 0x90909090(%eax),%edx,%r8d
> > +imul 0x90909(%eax),%edx,%r8d
> > +imul 0x909(%rax,%r31,8),%rdx,%r25
>
> What about imul by immediate? The present spec is quite unclear there:
> The insn page says {ND=ZU} and the table says 0/1 in the ND column.
>
We don't support it yet; it is listed in the RFC as still in progress:
...
2. Support APX ZU -- In progress
3. Support APX CCMP and CTEST -- In progress
...
As for the 0/1 in the ND column, it means ZU can be either 0 or 1. The spec describes it as follows:
IMUL with opcodes 0x69 and 0x6B in map 0 and SETcc instructions
Although these instructions do not support NDD, the EVEX.ND bit is used to control whether its
destination register has its upper bits (namely, bits [63:OSIZE]) zeroed when OSIZE is 8b or 16b.
That is, if EVEX.ND = 1, the upper bits are always zeroed; otherwise, they keep the old values
when OSIZE is 8b or 16b. For these instructions, EVEX.[V4,V3,V2,V1,V0] must be all zero.
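My reading of that paragraph, as a small model of the destination register write. This is a sketch only: write_dest_zu is a hypothetical name, it covers just the 8b and 16b cases the text mentions, and it ignores the legacy high-byte registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model the 64-bit destination after an OSIZE-bit RESULT is written.
   OLD is the previous register value, ND the EVEX.ND bit.  */
static uint64_t
write_dest_zu (uint64_t old, uint64_t result, unsigned int osize, bool nd)
{
  assert (osize == 8 || osize == 16);

  uint64_t mask = (UINT64_C (1) << osize) - 1;
  uint64_t low = result & mask;

  /* ND = 1: bits [63:OSIZE] are zeroed.  ND = 0: they keep their
     old values.  */
  return nd ? low : ((old & ~mask) | low);
}
```

So a 16-bit result of 0xabcd written over 0x1122334455667788 gives 0xabcd with ND = 1 and 0x112233445566abcd with ND = 0.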
> > +.byte 0x62,0xf4,0xfc,0x08,0xff,0xc0 #inc %rax
> > +.byte 0x62,0xf4,0xec,0x08,0xff,0xc0 #bad
>
> As before, please avoid .byte whenever possible. And please have a more
> detailed comment as to what is being encoded, when .byte cannot be avoided.
> Plus, if at all possible, have "bad" tests live in separate testcases from "good"
> ones.
>
These cases were meant to test that inc supports the EVEX format without GPR32;
patch 1/6 of part II will cover that. The first case has been removed and the
second case has been added to x86-64-apx-evex-promoted-bad.s.
> > --- a/opcodes/i386-dis-evex-prefix.h
> > +++ b/opcodes/i386-dis-evex-prefix.h
>
> Once again I'll reply to disassembler changes separately.
>
Ok.
> > --- a/opcodes/i386-opc.h
> > +++ b/opcodes/i386-opc.h
> > @@ -960,6 +960,7 @@ typedef struct insn_template
> > /* The next value is arbitrary, as long as it's non-zero and distinct
> > from all other values above. */
> > #define Opcode_VexW 0xf /* Operand order controlled by VEX.W. */
> > +#define Opcode_APX_NDDD 0x11 /* Direction bit for APX NDD insns. */
>
> The comment talks of a single bit, but the value has two bits set.
> Plus in the code you also don't use this constant as described by the
> comment. Aiui like for Opcode_VexW the value is really arbitrary, just as long
> as it's different from others. In which case I'd rather suggest using e.g. 0xe (if,
> unlike suggested above, Opcode_D cannot be re-used).
>
> Also I don't think there's a need for three D-s in the name.
>
Deleted Opcode_APX_NDDD.
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
>
> Comments given on the earlier patch apply here (and elsewhere) as well.
>
> Jan