From: "Cui, Lili" <lili.cui@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
Date: Fri, 8 Dec 2023 15:21:14 +0000
Message-ID: <SJ0PR11MB56007B278C5F7522A3F288839E8AA@SJ0PR11MB5600.namprd11.prod.outlook.com>
In-Reply-To: <546c8890-0526-49a3-8310-319358bf55c2@suse.com>
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, December 7, 2023 8:39 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
>
> On 24.11.2023 08:02, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -409,6 +409,9 @@ struct _i386_insn
> > /* Compressed disp8*N attribute. */
> > unsigned int memshift;
> >
> > + /* No CSPAZO flags update.*/
> > + bool has_nf;
>
> As before I don't see the point in adding this field when it's not used in the
> change. Note that this is unrelated to the introduction of the NF attribute right
> here, which has a reason.
>
Moved.
> > @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
> >
> > /* Dual VEX/EVEX templates need stripping one of the possible variants. */
> > if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> > - {
> > - if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> > - || maybe_cpu (t, CpuFMA))
> > - && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> > + {
> > + if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> > + || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> > + || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> > + || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> > {
> > if (need_evex_encoding ())
>
> There are several issues here:
> - Why did you need to change (to the worse) the original code?
> - Why did you not model the addition after that original code?
> - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512
> insn can be VEX-encoded?
I don't understand what you mean. We do have this combination:
kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> - If these new macros are really needed for whatever reason, they shouldn't
> be added to opcodes/i386-opc.h when they're useful only in the assembler.
> - Style requires a blank before the opening parenthesis in function
> invocations (which also covers function-like macro invocations).
>
> I think I asked before: How is it that you get away without altering
> cpu_flags_match(), containing related and quite similar logic?
>
The original logic, ( ... || ... ) && ( ... || ... ), lets any term in the first group combine with any term in the second group. I think that is inaccurate, so I list each identified combination individually.
I just found that cpu_flags_match() has similar logic. I think the following is the only code there related to CPUID alerts, but none of our combinations involve cpuavx.
if (all.bitfield.cpuavx)
{
/* We need to check SSE2AVX with AVX. */
if (!t->opcode_modifier.sse2avx
|| (sse2avx && !i.prefix[DATA_PREFIX]))
match |= CPU_FLAGS_ARCH_MATCH;
}
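
To illustrate the point about arbitrary combinations, here is a minimal standalone sketch. The feature bits and function names are hypothetical stand-ins, not the real gas Cpu* flags, and I am assuming that AVX512F(x) / AVX512VL(x) mean "x together with AVX512F / AVX512VL". The cross-product form accepts FMA together with AVX512VL, while the enumerated form does not, because that pair is not listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bits; these are NOT the real gas Cpu* enumerators.  */
enum
{
  F_AVX = 1 << 0,
  F_AVX2 = 1 << 1,
  F_FMA = 1 << 2,
  F_AVX512F = 1 << 3,
  F_AVX512VL = 1 << 4
};

/* Cross-product form: any bit of the first group combined with any bit
   of the second group matches.  */
bool
match_cross_product (unsigned int cpu)
{
  return (cpu & (F_AVX | F_AVX2 | F_FMA)) != 0
	 && (cpu & (F_AVX512F | F_AVX512VL)) != 0;
}

/* Enumerated form: only the explicitly listed pairs match.  */
bool
match_enumerated (unsigned int cpu)
{
  static const unsigned int pairs[] =
    {
      F_AVX | F_AVX512F, F_AVX2 | F_AVX512F, F_FMA | F_AVX512F,
      F_AVX | F_AVX512VL, F_AVX2 | F_AVX512VL
    };
  size_t n;

  for (n = 0; n < sizeof (pairs) / sizeof (pairs[0]); n++)
    if ((cpu & pairs[n]) == pairs[n])
      return true;
  return false;
}

/* FMA + AVX512VL is accepted by the first form only.  */
void
check_examples (void)
{
  assert (match_cross_product (F_FMA | F_AVX512VL));
  assert (!match_enumerated (F_FMA | F_AVX512VL));
  assert (match_enumerated (F_AVX | F_AVX512F));
}
```

Whether the extra pair matters in practice depends on which templates actually exist; the sketch only shows that the two forms are not equivalent.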
> > @@ -3873,6 +3877,14 @@ is_any_vex_encoding (const insn_template *t)
> > return t->opcode_modifier.vex || t->opcode_modifier.evex;
> > }
> >
> > +static INLINE bool
> > +is_apx_evex_encoding (void)
> > +{
> > + return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> > + || (i.vex.register_specifier
> > + && i.vex.register_specifier->reg_flags & RegRex2);
> > +}
>
> If you want this to be a function despite being used just once, you'll need to
> add a comment mentioning the constraint when calling it (or else the use of
> i.rex2 in particular is confusing). I'm sure I commented on this before, and I
> thought such a comment had already appeared.
>
I also had the impression that it had been added. Anyway, I will add it:
+/* We can use this function only when the current encoding is evex. */
static INLINE bool
is_apx_evex_encoding (void)
{
> > @@ -5655,17 +5693,17 @@ md_assemble (char *line)
> > instruction already has a prefix, we need to convert old
> > registers to new ones. */
> >
> > - if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> > - && (i.op[0].regs->reg_flags & RegRex64) != 0)
> > - || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> > - && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > - || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > - || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > - && (i.rex != 0 || i.rex2 != 0)))
> > + if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> > + && (i.op[0].regs->reg_flags & RegRex64) != 0)
> > + || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> > + && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > + || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > + || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > + && (i.rex != 0 || i.rex2 != 0))))
>
> I'm having trouble spotting the change here: There's an outer pair of
> parentheses being added, but that's for no reason unless there's another
> change well hidden. Please clarify.
>
Removed.
> > {
> > int x;
> >
> > - if (!i.rex2)
> > + if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
> > i.rex |= REX_OPCODE;
>
> Why the change to is_apx_rex2_encoding()? If that's wanted / needed here,
> shouldn't that be put in place by the earlier patch?
>
Moved to the corresponding patch.
> > @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry *r)
> > if (!cpu_arch_flags.bitfield.cpuapx_f
> > || flag_code != CODE_64BIT)
> > return false;
> > +
> > + /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX.
> > + For the later install_template function. */
> > + if (current_templates->start->opcode_modifier.vex
> > + && current_templates->start->opcode_modifier.evex)
> > + i.vec_encoding = vex_encoding_evex;
>
> I'm afraid I don't understand the 2nd sentence of the comment. This may be
> related to my question regarding cpu_flags_match() further up.
>
> The first sentence isn't quite correct either - you don't mark any template here
> (and you can't, because we don't even know yet which template we're going
> to use).
>
> Finally - do you really need the .evex check here? (I won't exclude that this
> yields a better diagnostic in certain cases, but this wants clarifying if so.)
>
If you look at install_template(), you'll see that we need to know whether the current encoding is EVEX before that function runs. We need to check opcode_modifier.evex here; it fixes issues caused by the merge of the VEX and EVEX templates.
if (t->opcode_modifier.vex && t->opcode_modifier.evex)
{
if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
|| AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
|| APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
|| APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
{
if (need_evex_encoding ())
{
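
The ordering I mean can be sketched with a toy model. All names below (toy_insn, toy_check_register, toy_install_template) are hypothetical and heavily simplified; this is not the actual gas code, only the two-step idea: register parsing records that EVEX is required, and the later template installation uses that to strip the VEX variant of a dual template.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the assembler state; all names are hypothetical.  */
enum toy_encoding { ENC_DEFAULT, ENC_EVEX };

struct toy_template
{
  bool vex;
  bool evex;
};

struct toy_insn
{
  enum toy_encoding vec_encoding;
  struct toy_template tm;
};

/* Step 1: while parsing operands, an extended GPR (r16-r31) records
   that EVEX is required, but only when the candidate template can be
   encoded either way.  */
void
toy_check_register (struct toy_insn *insn, const struct toy_template *cand,
		    bool is_egpr)
{
  if (is_egpr && cand->vex && cand->evex)
    insn->vec_encoding = ENC_EVEX;
}

/* Step 2: once a template is chosen, a dual VEX/EVEX template has one
   of its two variants stripped, based on what step 1 recorded.  */
void
toy_install_template (struct toy_insn *insn, const struct toy_template *t)
{
  insn->tm = *t;
  if (t->vex && t->evex)
    {
      if (insn->vec_encoding == ENC_EVEX)
	insn->tm.vex = false;
      else
	insn->tm.evex = false;
    }
}
```

In this model, without step 1 a dual template used with an extended GPR would fall through to the VEX variant in step 2, which cannot encode the register.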
> > --- a/gas/testsuite/gas/i386/x86-64.exp
> > +++ b/gas/testsuite/gas/i386/x86-64.exp
> > @@ -250,7 +250,7 @@ run_dump_test "x86-64-sse-noavx"
> > run_dump_test "x86-64-movbe"
> > run_dump_test "x86-64-movbe-intel"
> > run_dump_test "x86-64-movbe-suffix"
> > -run_list_test "x86-64-inval-movbe" "-al"
> > +run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -al"
>
> I can see why you add the -march=, as we've been through this before.
> But why the -I ?
>
Removed; it is redundant.
> > @@ -896,7 +897,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
> > <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
> > load:Load:0, store:Store:0, +
> > vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> > - rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> > + rex:REX:x64, rex2:REX2:APX_F, nooptimize:NoOptimize:0>
>
> This change wants to go into the earlier patch?
>
Done.
> > @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
> >
> > invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
> > invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> > +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
> > invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
> > invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> > +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
>
> Seeing these: Are there any Map4 encodings which aren't EVex128? If not
> (and if you're also not hiddenly aware of some appearing in the near future),
> please consider making EVexMap4 include this right away. Even if in the longer
> run other encodings appear, it'll then be easy to simply replace all the
> EVexMap4 uses in a purely mechanical way. Until then shorter template lines
> are preferable.
>
Would you mind defining it this way? #define EVex128 comes after this point, so I spell out EVex=EVEX128 here instead of moving that definition, considering that you don't like unnecessary changes.
+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128
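
As a rough illustration of the effect, assuming the table macros expand like ordinary C preprocessor macros: the sketch below stringifies a template attribute list after expansion, so a line that names only EVexMap4 still carries the EVEX128 attribute. The STR/STR_ helpers and the attrs_include() function are hypothetical, added only for this demonstration.

```c
#include <stdbool.h>
#include <string.h>

/* Proposed definition from above: the opcode space plus EVex128.  */
#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128

/* Hypothetical helpers: expand the argument first, then stringify it.  */
#define STR_(x) #x
#define STR(x) STR_ (x)

/* A template attribute list that no longer spells out EVex128.  */
const char *const invept_attrs = STR (Modrm|NoSuf|EVexMap4);

/* Return true when the expanded attribute list mentions NAME.  */
bool
attrs_include (const char *name)
{
  return strstr (invept_attrs, name) != NULL;
}
```

With this, attrs_include ("EVEX128") and attrs_include ("SPACE_EVEXMAP4") both hold, so the template lines can drop the separate EVex128.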
> > @@ -1437,7 +1443,6 @@ xgetbv, 0xf01d0, Xsave, NoSuf, {}
> > xsetbv, 0xf01d1, Xsave, NoSuf, {}
> >
> > // xsaveopt
> > -
> > xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
> > xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
>
> Iirc the earlier patch added that blank line. Why would you do such back and
> forth?
>
Done.
> > @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
> >
> > // BMI2 instructions.
> >
> > -bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> > -mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
> > -sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> > -shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> > -shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> > +bzhi, 0xf5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>
> Hmm, I had specifically suggested a pre-processor macro to use in place of the
> open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here
> and below)?
>
There are many different combinations, and each one appears relatively few times, so I think adding a #define for each combination would be a bit wasteful.
Thanks,
Lili.