Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix

public inbox for binutils@sourceware.org

From: Jan Beulich <jbeulich@suse.com>
To: "Cui, Lili" <lili.cui@intel.com>
Cc: hongjiu.lu@intel.com, binutils@sourceware.org
Subject: Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
Date: Thu, 7 Dec 2023 13:38:54 +0100
Message-ID: <546c8890-0526-49a3-8310-319358bf55c2@suse.com>
In-Reply-To: <20231124070213.3886483-4-lili.cui@intel.com>

On 24.11.2023 08:02, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -409,6 +409,9 @@ struct _i386_insn
>      /* Compressed disp8*N attribute.  */
>      unsigned int memshift;
>  
> +    /* No CSPAZO flags update.*/
> +    bool has_nf;

As before I don't see the point in adding this field when it's not used
in the change. Note that this is unrelated to the introduction of the NF
attribute right here, which has a reason.

> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>  
>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> -  {
> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> -	   || maybe_cpu (t, CpuFMA))
> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> +    {
> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>  	{
>  	  if (need_evex_encoding ())

There are several issues here:
- Why did you need to change (for the worse) the original code?
- Why did you not model the addition after that original code?
- How come APX_F (CpuAVX512*) constructs appear here, when no AVX512 insn
  can be VEX-encoded?
- If these new macros are really needed for whatever reason, they shouldn't
  be added to opcodes/i386-opc.h when they're useful only in the assembler.
- Style requires a blank before the opening parenthesis in function
  invocations (which also covers function-like macro invocations).
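For illustration only, here is a minimal sketch of what "modelling the addition after the original code" could look like, written with a blank before every opening parenthesis. The enum, the struct and the function form of maybe_cpu () are hypothetical stand-ins, not the real gas definitions, and the APX_F clause (shown for BMI2 only) is an assumption about the intended shape rather than the change that was eventually made.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for gas internals; not the real definitions.  */
enum cpu_feature
{
  CpuAVX, CpuAVX2, CpuFMA, CpuAVX512F, CpuAVX512VL,
  CpuBMI2, CpuAPX_F, CpuFeatureCount
};

struct insn_template_sketch
{
  bool cpu[CpuFeatureCount];
};

/* Stand-in for maybe_cpu (): true if template T lists feature F.  */
static bool
maybe_cpu (const struct insn_template_sketch *t, enum cpu_feature f)
{
  return t->cpu[f];
}

/* True if T needs one of its two encodings stripped.  The first clause
   mirrors the original AVX/AVX512 test; the second is an assumed APX_F
   clause written in the same shape, shown for BMI2 only.  */
static bool
needs_variant_stripping (const struct insn_template_sketch *t)
{
  return ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
           || maybe_cpu (t, CpuFMA))
          && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
         || (maybe_cpu (t, CpuBMI2) && maybe_cpu (t, CpuAPX_F));
}
```

Note that no AVX512 feature is paired with APX_F in this sketch, in line with the third point above.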

I think I asked before: How is it that you get away without altering
cpu_flags_match(), containing related and quite similar logic?

> @@ -3873,6 +3877,14 @@ is_any_vex_encoding (const insn_template *t)
>    return t->opcode_modifier.vex || t->opcode_modifier.evex;
>  }
>  
> +static INLINE bool
> +is_apx_evex_encoding (void)
> +{
> +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> +    || (i.vex.register_specifier
> +	&& i.vex.register_specifier->reg_flags & RegRex2);
> +}

If you want this to be a function despite being used just once, you'll
need to add a comment mentioning the constraint when calling it (or
else the use of i.rex2 in particular is confusing). I'm sure I commented
on this before, and I thought such a comment had already appeared.

> @@ -5655,17 +5693,17 @@ md_assemble (char *line)
>       instruction already has a prefix, we need to convert old
>       registers to new ones.  */
>  
> -  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> -       && (i.op[0].regs->reg_flags & RegRex64) != 0)
> -      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> -	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> -      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> -	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && (i.rex != 0 || i.rex2 != 0)))
> +  if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> +	&& (i.op[0].regs->reg_flags & RegRex64) != 0)
> +       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> +	   && (i.op[1].regs->reg_flags & RegRex64) != 0)
> +       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> +	    || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> +	   && (i.rex != 0 || i.rex2 != 0))))

I'm having trouble spotting the change here: There's an outer pair of
parentheses being added, but that's for no reason unless there's another
change well hidden. Please clarify.
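The point about the added parentheses can be checked mechanically. The sketch below reduces the condition to six hypothetical boolean inputs (bN: operand N is a byte register, rN: that register has RegRex64 set, rex / rex2: the respective prefix state is non-zero) and counts the input combinations on which the two forms disagree. It is a simplified model of the quoted hunk, not the gas code itself.

```c
#include <stdbool.h>

/* Condition as written before the patch, on simplified inputs.  */
static bool
cond_before (bool b0, bool r0, bool b1, bool r1, bool rex, bool rex2)
{
  return (b0 && r0) || (b1 && r1) || ((b0 || b1) && (rex || rex2));
}

/* Same condition with the extra outer pair of parentheses.  */
static bool
cond_after (bool b0, bool r0, bool b1, bool r1, bool rex, bool rex2)
{
  return ((b0 && r0) || (b1 && r1) || ((b0 || b1) && (rex || rex2)));
}

/* Count the input combinations (out of 64) on which the forms differ.  */
static int
count_mismatches (void)
{
  int mismatches = 0;

  for (unsigned int bits = 0; bits < 64; bits++)
    {
      bool b0 = bits & 1, r0 = bits & 2, b1 = bits & 4;
      bool r1 = bits & 8, rex = bits & 16, rex2 = bits & 32;

      if (cond_before (b0, r0, b1, r1, rex, rex2)
          != cond_after (b0, r0, b1, r1, rex, rex2))
        mismatches++;
    }
  return mismatches;
}
```

Since wrapping a whole expression in one more pair of parentheses cannot change its value, count_mismatches () finds no differing combination.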

>      {
>        int x;
>  
> -      if (!i.rex2)
> +      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
>  	i.rex |= REX_OPCODE;

Why the change to is_apx_rex2_encoding()? If that's wanted / needed
here, shouldn't that be put in place by the earlier patch?

> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry *r)
>        if (!cpu_arch_flags.bitfield.cpuapx_f
>  	  || flag_code != CODE_64BIT)
>  	return false;
> +
> +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX.
> +	 For the later install_template function.  */
> +      if (current_templates->start->opcode_modifier.vex
> +	  && current_templates->start->opcode_modifier.evex)
> +	i.vec_encoding = vex_encoding_evex;

I'm afraid I don't understand the 2nd sentence of the comment. This may
be related to my question regarding cpu_flags_match() further up.

The first sentence isn't quite correct either - you don't mark any
template here (and you can't, because we don't even know yet which
template we're going to use).

Finally - do you really need the .evex check here? (I won't exclude
that this yields a better diagnostic in certain cases, but this wants
clarifying if so.)

> --- a/gas/testsuite/gas/i386/x86-64.exp
> +++ b/gas/testsuite/gas/i386/x86-64.exp
> @@ -250,7 +250,7 @@ run_dump_test "x86-64-sse-noavx"
>  run_dump_test "x86-64-movbe"
>  run_dump_test "x86-64-movbe-intel"
>  run_dump_test "x86-64-movbe-suffix"
> -run_list_test "x86-64-inval-movbe" "-al"
> +run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -al"

I can see why you add the -march=, as we've been through this before.
But why the -I?

> @@ -896,7 +897,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
>  <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
>                        load:Load:0, store:Store:0, +
>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> -                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> +                      rex:REX:x64, rex2:REX2:APX_F, nooptimize:NoOptimize:0>

Shouldn't this change go into the earlier patch?

> @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
>  
>  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>  invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
>  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>  invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }

Seeing these: Are there any Map4 encodings which aren't EVex128? If not (and
if you're also not aware of any that are due to appear in the near future),
please consider making EVexMap4 include this right away. Even if other
encodings appear in the longer run, it'll then be easy to replace all
the EVexMap4 uses in a purely mechanical way. Until then, shorter template
lines are preferable.
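As a rough sketch of the idea, the Map4 shorthand could carry the vector-length attribute itself. The two #define bodies below are assumptions about how such shorthands are spelled (only the names EVex128, EVexMap4 and SPACE_EVEXMAP4 occur in the patch), and the stringification helper exists only so the expansion can be inspected from C.

```c
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_ (x)

/* Assumed spelling of the existing shorthand; not confirmed here.  */
#define EVex128 EVex=EVEX128
/* Suggested form: the Map4 shorthand includes EVex128 itself.  */
#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex128

/* Attribute list of the new invept template, with EVex128 no longer
   spelled out on the template line.  */
static const char *
invept_attributes (void)
{
  return STRINGIFY (Modrm|NoSuf|EVexMap4);
}
```

Should a Map4 encoding with a different vector length appear later, only the shorthand and its users would need touching.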

> @@ -1437,7 +1443,6 @@ xgetbv, 0xf01d0, Xsave, NoSuf, {}
>  xsetbv, 0xf01d1, Xsave, NoSuf, {}
>  
>  // xsaveopt
> -
>  xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
>  xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }

IIRC the earlier patch added that blank line. Why would you go back and
forth like this?

> @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
>  
>  // BMI2 instructions.
>  
> -bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> -pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> -pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> -rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
> -sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> +bzhi, 0xf5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }

Hmm, I had specifically suggested a pre-processor macro to use in place of the
open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here and
below)?
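A sketch of what such a macro could look like follows. The name WITH_APX_F is hypothetical, since the macro suggested in the earlier review is not quoted in this message, and the stringification helper is there only to make the expansion visible from C.

```c
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_ (x)

/* Hypothetical helper name: expands to the open-coded CPU expression.  */
#define WITH_APX_F(x) x&(x|APX_F)

/* CPU requirement of the bzhi template, written via the helper.  */
static const char *
bzhi_cpu_requirement (void)
{
  return STRINGIFY (WITH_APX_F (BMI2));
}
```

A template line would then name the base feature once, with the repeated BMI2&(BMI2|APX_F) pattern left to the preprocessor.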

Jan
