RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix

public inbox for binutils@sourceware.org

From: "Cui, Lili" <lili.cui@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
Date: Fri, 8 Dec 2023 15:21:14 +0000
Message-ID: <SJ0PR11MB56007B278C5F7522A3F288839E8AA@SJ0PR11MB5600.namprd11.prod.outlook.com>
In-Reply-To: <546c8890-0526-49a3-8310-319358bf55c2@suse.com>



> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, December 7, 2023 8:39 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
> 
> On 24.11.2023 08:02, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -409,6 +409,9 @@ struct _i386_insn
> >      /* Compressed disp8*N attribute.  */
> >      unsigned int memshift;
> >
> > +    /* No CSPAZO flags update.*/
> > +    bool has_nf;
> 
> As before I don't see the point in adding this field when it's not used in the
> change. Note that this is unrelated to the introduction of the NF attribute right
> here, which has a reason.
> 

Moved.

> > @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
> >
> >    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
> >    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> > -  {
> > -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> > -	   || maybe_cpu (t, CpuFMA))
> > -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> > +    {
> > +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> > +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> > +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> > +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >  	{
> >  	  if (need_evex_encoding ())
> 
> There are several issues here:
> - Why did you need to change (to the worse) the original code?
> - Why did you not model the addition after that original code?
> - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512
> insn can be VEX-encoded?

I don't understand what you mean; we do have this combination:

kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
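To make the combination concrete, here is a hedged sketch of how a template CPU expression such as AVX512BW&(AVX512BW|APX_F) could be evaluated. It assumes the usual reading: every feature outside the parentheses is required, and at least one feature inside them must be present. kmov is an AVX512BW instruction that is nonetheless VEX-encoded, which is why an APX_F(CpuAVX512BW) pairing can matter for a dual VEX/EVEX template. The bit layout, struct cpu_expr and pick_encoding() are illustrative stand-ins, not gas code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits; not the real i386_cpu_flags layout.  */
enum { CPU_AVX512BW = 1 << 0, CPU_APX_F = 1 << 1 };

/* Assumed model of a template CPU expression "ALL&(ANY|ANY...)".  */
struct cpu_expr
{
  unsigned int all;  /* every one of these features must be enabled */
  unsigned int any;  /* at least one of these must be enabled */
};

enum encoding { ENC_NONE, ENC_VEX, ENC_EVEX };

/* kmov<dq>: AVX512BW&(AVX512BW|APX_F).  */
static const struct cpu_expr kmov_expr
  = { CPU_AVX512BW, CPU_AVX512BW | CPU_APX_F };

/* Choose an encoding under the assumed reading: the VEX form needs only
   AVX512BW, while the EVEX form (required once an eGPR operand is used)
   additionally needs APX_F.  */
static enum encoding
pick_encoding (struct cpu_expr e, unsigned int enabled, bool uses_egpr)
{
  if ((enabled & e.all) != e.all || (enabled & e.any) == 0)
    return ENC_NONE;
  if (!uses_egpr)
    return ENC_VEX;
  return (enabled & CPU_APX_F) != 0 ? ENC_EVEX : ENC_NONE;
}
```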

> - If these new macros are really needed for whatever reason, they shouldn't
>   be added to opcodes/i386-opc.h when they're useful only in the assembler.
> - Style requires a blank before the opening parenthesis in function
>   invocations (which also covers function-like macro invocations).
> 
> I think I asked before: How is it that you get away without altering
> cpu_flags_match(), containing related and quite similar logic?
> 

With the original logic, (... || ...) && (... || ...), any condition in the first group can be paired with any condition in the second group, which I think is inaccurate. So I list each identified combination explicitly, one by one.
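The difference between the grouped and the enumerated conditions can be shown with a small model. It assumes that each macro such as AVX512F(CpuAVX) tests that both features appear in the template's CPU set, because the macro definitions are not shown in the patch. It also uses plain bitmasks in place of maybe_cpu(). All names here are hypothetical. In this model the grouped form also accepts the FMA + AVX512VL pairing, which the enumerated list leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-bit-per-feature stand-in for a template's CPU set.  */
enum feature
{
  F_AVX = 1 << 0,
  F_AVX2 = 1 << 1,
  F_FMA = 1 << 2,
  F_AVX512F = 1 << 3,
  F_AVX512VL = 1 << 4,
};

/* Stand-in for maybe_cpu (t, f).  */
static bool
maybe (unsigned int set, unsigned int f)
{
  return (set & f) != 0;
}

/* Grouped form: every left-hand feature pairs with every right-hand one.  */
static bool
grouped_match (unsigned int set)
{
  return (maybe (set, F_AVX) || maybe (set, F_AVX2) || maybe (set, F_FMA))
	 && (maybe (set, F_AVX512F) || maybe (set, F_AVX512VL));
}

/* Assumed meaning of macros like AVX512F(CpuAVX): both features present.  */
static bool
pair (unsigned int set, unsigned int a, unsigned int b)
{
  return maybe (set, a) && maybe (set, b);
}

/* Enumerated form, mirroring the AVX512F()/AVX512VL() terms in the patch.  */
static bool
enumerated_match (unsigned int set)
{
  return pair (set, F_AVX, F_AVX512F) || pair (set, F_AVX2, F_AVX512F)
	 || pair (set, F_FMA, F_AVX512F) || pair (set, F_AVX, F_AVX512VL)
	 || pair (set, F_AVX2, F_AVX512VL);
}
```

Whether any real template carries the FMA + AVX512VL pairing is not stated in the thread; the sketch only shows where the two formulations diverge.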

I just found that cpu_flags_match() has similar logic. I think the following is the only related CPUID check there, but none of our combinations involve cpuavx:

          if (all.bitfield.cpuavx)
            {
              /* We need to check SSE2AVX with AVX.  */
              if (!t->opcode_modifier.sse2avx
                  || (sse2avx && !i.prefix[DATA_PREFIX]))
                match |= CPU_FLAGS_ARCH_MATCH;
            }

> > @@ -3873,6 +3877,14 @@ is_any_vex_encoding (const insn_template *t)
> >    return t->opcode_modifier.vex || t->opcode_modifier.evex;
> >  }
> >
> > +static INLINE bool
> > +is_apx_evex_encoding (void)
> > +{
> > +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> > +    || (i.vex.register_specifier
> > +	&& i.vex.register_specifier->reg_flags & RegRex2);
> > +}
> 
> If you want this to be a function despite being used just once, you'll need to
> add a comment mentioning the constraint when calling it (or else the use of
> i.rex2 in particular is confusing). I'm sure I commented on this before, and I
> thought such a comment had already appeared.
> 

I also had the impression that it had already been added; anyway, I will add it:

+/* We can use this function only when the current encoding is evex.  */
 static INLINE bool
 is_apx_evex_encoding (void)
 {
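A reduced model of the predicate, with its precondition made explicit, may help here. struct insn_state, its is_evex field, the evex_state() helper and the numeric flag values are stand-ins invented for illustration. Only the three-way test mirrors the quoted patch, and the assert() plays the role of the new comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the flag and space values are arbitrary.  */
enum { RegRex2 = 1 << 0 };
enum { SPACE_EVEXMAP4 = 4 };

struct reg_entry
{
  unsigned int reg_flags;
};

/* Reduced model of the fields of the global "i" that the predicate reads,
   plus an is_evex flag that the real code does not have.  */
struct insn_state
{
  unsigned int rex2;
  int opcode_space;
  const struct reg_entry *register_specifier;
  bool is_evex;
};

static const struct reg_entry sample_egpr = { RegRex2 };  /* e.g. r16 */
static const struct reg_entry sample_legacy = { 0 };      /* e.g. rax */

/* Build a state for which EVEX encoding has already been chosen.  */
static struct insn_state
evex_state (unsigned int rex2, int space, const struct reg_entry *spec)
{
  struct insn_state s = { rex2, space, spec, true };
  return s;
}

/* Only valid once EVEX encoding has been chosen; the assert stands in
   for the comment added above the real function.  */
static bool
is_apx_evex_encoding (struct insn_state s)
{
  assert (s.is_evex);
  return s.rex2 != 0 || s.opcode_space == SPACE_EVEXMAP4
	 || (s.register_specifier != NULL
	     && (s.register_specifier->reg_flags & RegRex2) != 0);
}
```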

> > @@ -5655,17 +5693,17 @@ md_assemble (char *line)
> >       instruction already has a prefix, we need to convert old
> >       registers to new ones.  */
> >
> > -  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> > -       && (i.op[0].regs->reg_flags & RegRex64) != 0)
> > -      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> > -	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > -      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > -	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && (i.rex != 0 || i.rex2 != 0)))
> > +  if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> > +	&& (i.op[0].regs->reg_flags & RegRex64) != 0)
> > +       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> > +	   && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > +       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > +	    || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > +	   && (i.rex != 0 || i.rex2 != 0))))
> 
> I'm having trouble spotting the change here: There's an outer pair of
> parentheses being added, but that's for no reason unless there's another
> change well hidden. Please clarify.
> 

Removed.

> >      {
> >        int x;
> >
> > -      if (!i.rex2)
> > +      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
> >  	i.rex |= REX_OPCODE;
> 
> Why the change to is_apx_rex2_encoding()? If that's wanted / needed here,
> shouldn't that be put in place by the earlier patch?
>

Moved to the corresponding patch.

> > @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry *r)
> >        if (!cpu_arch_flags.bitfield.cpuapx_f
> >  	  || flag_code != CODE_64BIT)
> >  	return false;
> > +
> > +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX.
> > +	 For the later install_template function.  */
> > +      if (current_templates->start->opcode_modifier.vex
> > +	  && current_templates->start->opcode_modifier.evex)
> > +	i.vec_encoding = vex_encoding_evex;
> 
> I'm afraid I don't understand the 2nd sentence of the comment. This may be
> related to my question regarding cpu_flags_match() further up.
> 
> The first sentence isn't quite correct either - you don't mark any template here
> (and you can't, because we don't even know yet which template we're going
> to use).
> 
> Finally - do you really need the .evex check here? (I won't exclude that this
> yields a better diagnostic in certain cases, but this wants clarifying if so.)
> 

If you look at install_template(), you'll see that we need to know whether the current encoding is EVEX by the time that function runs. The opcode_modifier.evex check is needed here; it fixes issues caused by merging the VEX and EVEX templates.

  if (t->opcode_modifier.vex && t->opcode_modifier.evex)
    {
      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
          || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
          || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
          || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
        {
          if (need_evex_encoding ())
            {
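The interaction described above can be modelled roughly as follows. The model assumes that install_template() keeps the EVEX half of a dual template only when EVEX encoding has been requested; the real function also checks the CPU-feature combinations shown. The types and the helpers note_egpr_operand() and keeps_evex_form() are hypothetical. The point is that VEX cannot encode r16-r31, so the EVEX request has to be recorded before a template is installed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins modelled on the names in the quoted patch.  */
enum vec_encoding { vex_encoding_default, vex_encoding_vex, vex_encoding_evex };

struct template_mods
{
  bool vex;
  bool evex;
};

static const struct template_mods dual_tmpl = { true, true };
static const struct template_mods vex_only_tmpl = { true, false };

/* check_register() side: an eGPR operand was parsed.  If the first
   candidate template is a dual VEX/EVEX one, request EVEX, because VEX
   has no way to encode r16-r31.  */
static enum vec_encoding
note_egpr_operand (struct template_mods first, enum vec_encoding cur)
{
  if (first.vex && first.evex)
    return vex_encoding_evex;
  return cur;
}

/* Simplified install_template() side: a dual template keeps its EVEX
   variant only if EVEX was requested; otherwise the VEX one is kept.  */
static bool
keeps_evex_form (struct template_mods t, enum vec_encoding enc)
{
  if (t.vex && t.evex)
    return enc == vex_encoding_evex;
  return t.evex;
}
```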

> > --- a/gas/testsuite/gas/i386/x86-64.exp
> > +++ b/gas/testsuite/gas/i386/x86-64.exp
> > @@ -250,7 +250,7 @@ run_dump_test "x86-64-sse-noavx"
> >  run_dump_test "x86-64-movbe"
> >  run_dump_test "x86-64-movbe-intel"
> >  run_dump_test "x86-64-movbe-suffix"
> > -run_list_test "x86-64-inval-movbe" "-al"
> > +run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -al"
> 
> I can see why you add the -march=, as we've been through this before.
> But why the -I ?
> 

Removed; it is redundant.

> > @@ -896,7 +897,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
> > <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
> >                        load:Load:0, store:Store:0, +
> >                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> > -                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> > +                      rex:REX:x64, rex2:REX2:APX_F, nooptimize:NoOptimize:0>
> 
> This change wants to go into the earlier patch?
> 

Done.

> > @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
> >
> >  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
> >  invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> > +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
> >  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
> >  invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> > +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
> 
> Seeing these: Are there any Map4 encodings which aren't EVex128? If not
> (and if you're also not hiddenly aware of some appearing in the near future),
> please consider making EVexMap4 include this right away. Even if in the longer
> run other encodings appear, it'll then be easy to simply replace all the
> EVexMap4 uses in a purely mechanical way. Until then shorter template lines
> are preferable.
> 

Would you mind if I define it this way? I spelled out EVex=EVEX128 because the #define of EVex128 comes after this one, and moving it would be the kind of unnecessary change you don't like.

+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128
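For reference, this is how a template line using the proposed macro would expand, checked by stringifying through the C preprocessor. The sketch assumes that EVex128 is spelled EVex=EVEX128 in the table, as the proposal implies. EVexMap4_alt is a hypothetical variant. It shows that in standard C preprocessing a macro body may name a macro that is defined further down, because replacement lists are rescanned at the point of use. Whether that also holds for the way i386-opc.tbl is processed is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Two-level stringification so that the argument is macro-expanded first.  */
#define STR(x) #x
#define XSTR(x) STR (x)

/* The spelled-out definition proposed above.  */
#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128

/* Hypothetical alternative that names EVex128 before it is defined.  */
#define EVexMap4_alt OpcodeSpace=SPACE_EVEXMAP4|EVex128

/* Assumed existing definition, placed later on purpose.  */
#define EVex128 EVex=EVEX128

/* A shortened invept attribute list; tokens are stringified, not evaluated.  */
static const char *const invept_attrs = XSTR (Modrm|NoSuf|EVexMap4);

/* Compare two strings, ignoring any blanks the preprocessor inserts.  */
static bool
same_tokens (const char *a, const char *b)
{
  for (;;)
    {
      while (*a == ' ')
	a++;
      while (*b == ' ')
	b++;
      if (*a != *b)
	return false;
      if (*a == '\0')
	return true;
      a++;
      b++;
    }
}
```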

> > @@ -1437,7 +1443,6 @@ xgetbv, 0xf01d0, Xsave, NoSuf, {}
> >  xsetbv, 0xf01d1, Xsave, NoSuf, {}
> >
> >  // xsaveopt
> > -
> >  xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
> >  xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
> 
> Iirc the earlier patch added that blank line. Why would you do such back and
> forth?
> 

Done.

> > @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
> >
> >  // BMI2 instructions.
> >
> > -bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> > -mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
> > -sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> > -shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> > -shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> > +bzhi, 0xf5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> 
> Hmm, I had specifically suggested a pre-processor macro to use in place of the
> open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here
> and below)?
> 

There are many different combinations, and each one appears only a few times, so adding a #define for every combination feels a bit wasteful to me.
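As an illustration of the alternative under discussion, a single function-like macro could take the feature as a parameter. One #define would then serve every combination of the X&(X|APX_F) shape, rather than one per combination. The macro name CPU_OR_APX_F is hypothetical, and the expansion is checked by stringification only, so nothing is evaluated.

```c
#include <assert.h>
#include <stdbool.h>

/* Two-level stringification so that the argument is macro-expanded first.  */
#define STR(x) #x
#define XSTR(x) STR (x)

/* Hypothetical macro: the feature is a parameter, so one definition
   covers every "X&(X|APX_F)" combination.  */
#define CPU_OR_APX_F(cpu) cpu&(cpu|APX_F)

/* Compare two strings, ignoring any blanks the preprocessor inserts.  */
static bool
same_tokens (const char *a, const char *b)
{
  for (;;)
    {
      while (*a == ' ')
	a++;
      while (*b == ' ')
	b++;
      if (*a != *b)
	return false;
      if (*a == '\0')
	return true;
      a++;
      b++;
    }
}
```

With such a macro, the bzhi line above would read `bzhi, 0xf5, CPU_OR_APX_F(BMI2), ...`, and the kmov line would use `CPU_OR_APX_F(AVX512BW)`.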

Thanks,
Lili.

Thread overview: 69+ messages
2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
2023-11-24  7:02 ` [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Cui, Lili
2023-12-04 16:30   ` Jan Beulich
2023-12-05 13:31     ` Cui, Lili
2023-12-06  7:52       ` Jan Beulich
2023-12-06 12:43         ` Cui, Lili
2023-12-07  9:01           ` Jan Beulich
2023-12-08  3:10             ` Cui, Lili
2023-11-24  7:02 ` [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
2023-12-07 12:38   ` Jan Beulich
2023-12-08 15:21     ` Cui, Lili [this message]
2023-12-11  8:34       ` Jan Beulich
2023-12-12 10:44         ` Cui, Lili
2023-12-12 11:16           ` Jan Beulich
2023-12-12 12:32             ` Cui, Lili
2023-12-12 12:39               ` Jan Beulich
2023-12-12 13:15                 ` Cui, Lili
2023-12-12 14:13                   ` Jan Beulich
2023-12-13  7:36                     ` Cui, Lili
2023-12-13  7:48                       ` Jan Beulich
2023-12-12 12:58         ` Cui, Lili
2023-12-12 14:04           ` Jan Beulich
2023-12-13  8:35             ` Cui, Lili
2023-12-13  9:13               ` Jan Beulich
2023-12-07 13:34   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:43       ` Jan Beulich
2023-12-11 11:50   ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 5/9] Add tests for " Cui, Lili
2023-12-07 14:05   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:55       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
2023-12-08 14:12   ` Jan Beulich
2023-12-11 13:36     ` Cui, Lili
2023-12-11 16:50       ` Jan Beulich
2023-12-13 10:42         ` Cui, Lili
2024-03-22 10:02     ` Jan Beulich
2024-03-22 10:31       ` Jan Beulich
2024-03-26  2:04         ` Cui, Lili
2024-03-26  7:06           ` Jan Beulich
2024-03-26  7:18             ` Cui, Lili
2024-03-22 10:59       ` Jan Beulich
2024-03-26  8:22         ` Cui, Lili
2024-03-26  9:30           ` Jan Beulich
2024-03-27  2:41             ` Cui, Lili
2023-12-08 14:27   ` Jan Beulich
2023-12-12  5:53     ` Cui, Lili
2023-12-12  8:28       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 7/9] Support APX Push2/Pop2 Cui, Lili
2023-12-11 11:17   ` Jan Beulich
2023-12-15  8:38     ` Cui, Lili
2023-12-15  8:44       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 8/9] Support APX NDD optimized encoding Cui, Lili
2023-12-11 12:27   ` Jan Beulich
2023-12-12  3:18     ` Hu, Lin1
2023-12-12  8:41       ` Jan Beulich
2023-12-13  5:31         ` Hu, Lin1
2023-12-12  8:45       ` Jan Beulich
2023-12-13  6:06         ` Hu, Lin1
2023-12-13  8:19           ` Jan Beulich
2023-12-13  8:34             ` Hu, Lin1
2023-11-24  7:02 ` [PATCH v3 9/9] Support APX JMPABS for disassembler Cui, Lili
2023-11-24  7:09 ` [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Jan Beulich
2023-11-24 11:22   ` Cui, Lili
2023-11-24 12:14     ` Jan Beulich
2023-12-12  2:57 ` Lu, Hongjiu
2023-12-12  8:16 ` Cui, Lili
