RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: "Cui, Lili" <lili.cui@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"ccoutant@gmail.com" <ccoutant@gmail.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
Date: Mon, 6 Nov 2023 15:20:39 +0000	[thread overview]
Message-ID: <SJ0PR11MB560042CC079FE35596CD34E49EAAA@SJ0PR11MB5600.namprd11.prod.outlook.com> (raw)
In-Reply-To: <9373c79d-85bb-15bd-4501-7634687d9a8e@suse.com>

> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> 
> On 02.11.2023 12:29, Cui, Lili wrote:
> > @@ -406,6 +409,11 @@ struct _i386_insn
> >      /* Compressed disp8*N attribute.  */
> >      unsigned int memshift;
> >
> > +    /* No CSPAZO flags update.*/
> > +    bool has_nf;
> > +
> > +    bool has_zero_upper;
> > +
> 
> Can both please be introduced when they're needed, not randomly ahead of
> time?
 
Moved has_nf to patch 2/8 and deleted has_zero_upper.

> > @@ -2375,6 +2388,9 @@ register_number (const reg_entry *r)
> >    if (r->reg_flags & RegRex)
> >      nr += 8;
> >
> > +  if (r->reg_flags & RegRex2)
> > +    nr += 16;
> > +
> >    if (r->reg_flags & RegVRex)
> >      nr += 16;
> 
> Perhaps fold to
> 
>     if (r->reg_flags & (RegVRex | RegRex2))
>       nr += 16;
> 
> ? Irrespective an assertion may be worthwhile that both flags aren't set at the
> same time?

Done.

> 
> > @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
> >      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >
> > +/* Build (2 bytes) rex2 prefix.
> > +   | D5h |
> > +   | m | R4 X4 B4 | W R X B |
> > +*/
> > +static void
> > +build_rex2_prefix (void)
> > +{
> > +  i.vex.length = 2;
> > +  i.vex.bytes[0] = 0xd5;
> > +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> > +		    | (i.rex2 << 4) | i.rex);
> > +}
> 
> I may have asked on v1 already: For emitting REX we don't resort to (ab)using
> i.vex. Is that really necessary? (If so, a comment next to the field declaration
> may be warranted.)
> 
Added comment for it.

  /* For the W R X B bits, the variables of rex prefix will be reused.  */
  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
                    | (i.rex2 << 4) | i.rex);

> Speaking of v1: Can you please make sure you have correct version tags on
> submissions of updated patch versions?
> 
I used git to send all the patches at once( git send-email  --cover-letter --annotate  --to="..." -8), which only has the opportunity to change the version of the cover letter patch. To change the version of each patch, I can send them one by one next time. By the way, do you have a better way? Or how did you modify them? Thanks.

> > @@ -4423,12 +4460,16 @@ optimize_encoding (void)
> >  	  i.suffix = 0;
> >  	  /* Convert to byte registers.  */
> >  	  if (i.types[1].bitfield.word)
> > -	    j = 16;
> > -	  else if (i.types[1].bitfield.dword)
> > +	    /* There are 32 8-bit registers.  */
> 
> Please make sure comments are actually correct. With your additions there
> are 40 8-bit registers; prior to that there were 24. The j += 8 further down deal
> with that difference, and the comment here (if one is to be added) wants to
> tell the full truth.
> 

Done.

> > @@ -5278,6 +5319,9 @@ md_assemble (char *line)
> >  	case register_type_mismatch:
> >  	  err_msg = _("register type mismatch");
> >  	  break;
> > +	case register_type_of_address_mismatch:
> > +	  err_msg = _("register type of address mismatch");
> > +	  break;
> 
> I have a concern with wording / naming here: If I saw this in an error message,
> I wouldn't know what is meant. Maybe something along the lines of "cannot
> use an extended GPR for addressing"? And then the enumerator suitabley
> renamed as well?
> 
 Changed to  

+       case unsupported_EGPR_for_addressing:
+         err_msg = _("unsupported EGPR for addressing");
+         break;

> > @@ -5578,7 +5625,7 @@ md_assemble (char *line)
> >        as_warn (_("translating to `%sp'"), insn_name (&i.tm));
> >      }
> >
> > -  if (is_any_vex_encoding (&i.tm))
> > + if (is_any_vex_encoding (&i.tm))
> >      {
> 
> Stray change, breaking indentation?
> 

Done.

> > @@ -5594,6 +5641,13 @@ md_assemble (char *line)
> >  	  return;
> >  	}
> >
> > +      /* Check for explicit REX2 prefix.  */
> > +      if (i.rex2 || i.rex2_encoding)
> 
> This open-codes is_any_apx_rex2_encoding(). But read on.
> 
> > +	{
> > +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> 
> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is what case
> the i.rex2 check above is intended to cover. Error message comment, and
> condition want to reflect that.
> 

Removed i.rex2 and keep i.rex2_encoding here. Added one invalid testcase for it.

        {rex} vmovaps %xmm7,%xmm2
        {rex} vmovaps %xmm17,%xmm2
        {rex} rorx $7,%eax,%ebx
+       {rex2} vmovaps %xmm7,%xmm2

> > @@ -5633,11 +5687,11 @@ md_assemble (char *line)
> >  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> >        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> >  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && i.rex != 0))
> > +	  && (i.rex != 0 || i.rex2 != 0)))
> >      {
> >        int x;
> > -
> > -      i.rex |= REX_OPCODE;
> 
> Please don't remove blank lines like this.
> 

Done.

> > @@ -5647,9 +5701,11 @@ md_assemble (char *line)
> >  	      gas_assert (!(i.op[x].regs->reg_flags & RegRex));
> >  	      /* In case it is "hi" register, give up.  */
> >  	      if (i.op[x].regs->reg_num > 3)
> > -		as_bad (_("can't encode register '%s%s' in an "
> > -			  "instruction requiring REX prefix."),
> > -			register_prefix, i.op[x].regs->reg_name);
> > +		{
> > +		  as_bad (_("can't encode register '%s%s' in an "
> > +			    "instruction requiring REX/REX2 prefix."),
> > +			  register_prefix, i.op[x].regs->reg_name);
> > +		}
> 
> There's no need to introduce braces here. Without doing so this will also be
> less of a change.
> 

Done.

> > @@ -6989,6 +7056,44 @@ VEX_check_encoding (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Check if Egprs operands are valid for the instruction.  */
> > +
> > +static int
> > +check_EgprOperands (const insn_template *t) {
> > +  if (t->opcode_modifier.noegpr)
> > +    {
> 
> This scope effectively covers the entire function. Did you consider
> 
>   if (!t->opcode_modifier.noegpr)
>     return 0;
> 
> to aid readability?
> 

Done.

> > +      for (unsigned int op = 0; op < i.operands; op++)
> > +	{
> > +	  if (i.types[op].bitfield.class != Reg
> > +	      /* Special case for (%dx) while doing input/output op */
> > +	      || i.input_output_operand)
> 
> Why is this needed? The register table entry for %dx ...
> 
> > +	    continue;
> > +
> > +	  if (i.op[op].regs->reg_flags & RegRex2)
> 
> ... doesn't have this bit set anyway.
> 

For this special case i.op is empty, we need continue, otherwise r i.op[op].regs->reg_flags  will cause segment fault.

> > +	    {
> > +	      i.error = register_type_mismatch;
> > +	      return 1;
> > +	    }
> > +	}
> > +
> > +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> > +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> > +	{
> > +	  i.error = register_type_of_address_mismatch;
> > +	  return 1;
> > +	}
> > +
> > +      /* Check pseudo prefix {rex2} are valid.  */
> > +      if (i.rex2_encoding)
> > +	{
> > +	  i.error = invalid_pseudo_prefix;
> > +	  return 1;
> > +	}
> 
> Further up in md_assemble() {rex} or {rex2} is simply ignored when wrong to
> apply. Why would an inapplicable {rex2} be treated as an error here? This
> would then also ...
> 
> > @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
> >        /* Do not verify operands when there are none.  */
> >        if (!t->operands)
> >  	{
> > -	  if (VEX_check_encoding (t))
> > +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
> >  	    {
> >  	      specific_error = progress (i.error);
> >  	      continue;
> 
> ... eliminate the need for this change, which is kind of bogus anyway:
> There are no operands here, so calling a function of the given name is at least
> suspicious.
> 

We have these tests and I'm confused whether to remove them or not.

+       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
+       {rex2} wrmsr
+       {rex2} rdtsc
+       {rex2} rdmsr
+       {rex2} sysenter
+       {rex2} sysexitl
+       {rex2} rdpmc

> > @@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry *r)
> >  	i.vec_encoding = vex_encoding_error;
> >      }
> >
> > +  if (r->reg_flags & RegRex2)
> > +    {
> > +      if (!cpu_arch_flags.bitfield.cpuapx_f
> > +	  || flag_code != CODE_64BIT)
> > +	return false;
> > +    }
> 
> Please fold the two if()s into one (unless of course you know that the outer
> one is going to be extended in a subsequent patch).
> 

Yes, other code will be added in the outer if with patch2/8.

> > --- a/gas/doc/c-i386.texi
> > +++ b/gas/doc/c-i386.texi
> > @@ -216,6 +216,7 @@ accept various extension mnemonics.  For example,
> > @code{avx10.1/512},  @code{avx10.1/256},  @code{avx10.1/128},
> > +@code{apx},
> >  @code{amx_int8},
> >  @code{amx_bf16},
> >  @code{amx_fp16},
> > @@ -1662,7 +1663,7 @@ supported on the CPU specified.  The choices for
> @var{cpu_type} are:
> >  @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab
> > @samp{.cx16}  @item @samp{.padlock} @tab @samp{.clzero} @tab
> > @samp{.mwaitx} @tab @samp{.rdpru}  @item @samp{.mcommit} @tab
> > @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb} -@item
> > @samp{.tlbsync}
> > +@item @samp{.tlbsync} @tab @samp{.apx}
> >  @end multitable
> 
> DYM apx_f in both cases?
> 
> Also don't you need to also mention {rex2} somewhere in this file?
> 

Done.

> > --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> > +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> > @@ -11,11 +11,11 @@ Disassembly of section .text:
> >  [ 	]*[a-f0-9]+:	37                   	\(bad\)
> >
> >  0+1 <aad0>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
> >
> >  0+3 <aad1>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
> >
> >  0+5 <aam0>:
> > --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> > +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> > @@ -11,11 +11,11 @@ Disassembly of section .text:
> >  [ 	]*[a-f0-9]+:	37                   	\(bad\)
> >
> >  0+1 <aad0>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
> >
> >  0+3 <aad1>:
> > -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> > +[ 	]*[a-f0-9]+:	d5                   	rex2
> >  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
> >
> >  0+5 <aam0>:
> 
> These expectations match the ones of the same test in the parent directory.
> Hence instead of adjusting each in both places, please have the ones here
> reference the parent directory files.
> 

They are used to test illegal opcodes for x86-64. Since D5 now makes sense, these two test cases were removed.

# All the followings are illegal opcodes for x86-64.
aad0:
        aad
aad1:
        aad $2

> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> 
> As before I'll look at the disassembler changes separately. This patch is simply
> too big.
> 

Ok

> > @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
> >    return elem_size;
> >  }
> >
> > +static bool
> > +if_entry_needs_special_handle (const unsigned long long opcode, unsigned
> int space,
> > +			       const char *cpu_flags)
> > +{
> > +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers
> > +#UD.  */
> > +  if (strcmp (cpu_flags, "XSAVES") >= 0
> > +      || strcmp (cpu_flags, "XSAVEC") >= 0
> > +      || strcmp (cpu_flags, "Xsave") >= 0
> > +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> 
> Upon further thought for these (and maybe even ...
> 
> > +      || !strcmp (cpu_flags, "3dnow")
> > +      || !strcmp (cpu_flags, "3dnowA"))
> 
> ... for these, but see also below) it might be better to add the attribute right in
> the opcode table.
> 
> As to the 3dnow insns - I think I'd like to revise my earlier suggestion to also
> tag those. Like e.g. FPU insns they're pretty normal GPR-wise, so allowing them
> to be used like that would appear only consistent. Otherwise, if we were
> concerned of AMD extensions in general, SSE4a insns (and maybe further
> ones) would also need excluding. (Additionally recall that there's an overlap
> between 3dnowa and SSE, which would result in another [apparent]
> inconsistency when excluding 3dnow insns here.)
> 

I see, for example  I think I need to split this table into two parts, one is for SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }

Thanks,
Lili.

next prev parent reply	other threads:[~2023-11-06 15:20 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
2023-11-02 17:05   ` Jan Beulich
2023-11-03  6:20     ` Cui, Lili
2023-11-03 13:05     ` Jan Beulich
2023-11-03 14:19   ` Jan Beulich
2023-11-06 15:20     ` Cui, Lili [this message]
2023-11-06 16:08       ` Jan Beulich
2023-11-07  8:16         ` Cui, Lili
2023-11-07 10:43           ` Jan Beulich
2023-11-07 15:31             ` Cui, Lili
2023-11-07 15:43               ` Jan Beulich
2023-11-07 15:53                 ` Cui, Lili
2023-11-06 15:02   ` Jan Beulich
2023-11-07  8:06     ` Cui, Lili
2023-11-07 10:20       ` Jan Beulich
2023-11-07 14:32         ` Cui, Lili
2023-11-07 15:08           ` Jan Beulich
2023-11-06 15:39   ` Jan Beulich
2023-11-09  8:02     ` Cui, Lili
2023-11-09 10:52       ` Jan Beulich
2023-11-09 13:27         ` Cui, Lili
2023-11-09 15:22           ` Jan Beulich
2023-11-10  7:11             ` Cui, Lili
2023-11-10  9:14               ` Jan Beulich
2023-11-10  9:21                 ` Jan Beulich
2023-11-10 12:38                   ` Cui, Lili
2023-12-14 10:13                   ` Cui, Lili
2023-12-18 15:24                     ` Jan Beulich
2023-12-18 16:23                       ` H.J. Lu
2023-11-10  9:47                 ` Cui, Lili
2023-11-10  9:57                   ` Jan Beulich
2023-11-10 12:05                     ` Cui, Lili
2023-11-10 12:35                       ` Jan Beulich
2023-11-13  0:18                         ` Cui, Lili
2023-11-02 11:29 ` [PATCH 2/8] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-02 11:29 ` [PATCH 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
2023-11-02 11:29 ` [PATCH 4/8] Add tests for " Cui, Lili
2023-11-08  9:11   ` Jan Beulich
2023-11-15 14:56     ` Cui, Lili
2023-11-16  9:17       ` Jan Beulich
2023-11-16 15:34     ` Cui, Lili
2023-11-16 16:50       ` Jan Beulich
2023-11-17 12:42         ` Cui, Lili
2023-11-17 14:38           ` Jan Beulich
2023-11-22 13:40             ` Cui, Lili
2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
2023-11-08 10:39   ` Jan Beulich
2023-11-20  1:19     ` Cui, Lili
2023-11-08 11:13   ` Jan Beulich
2023-11-20 12:36     ` Cui, Lili
2023-11-20 16:33       ` Jan Beulich
2023-11-22  7:46         ` Cui, Lili
2023-11-22  8:47           ` Jan Beulich
2023-11-22 10:45             ` Cui, Lili
2023-11-23 10:57               ` Jan Beulich
2023-11-23 12:14                 ` Cui, Lili
2023-11-24  6:56                 ` [PATCH v3 0/9] Support Intel APX EGPR Cui, Lili
2023-12-07  8:17                   ` Cui, Lili
2023-12-07  8:33                     ` Cui, Lili
2023-11-09  9:37   ` [PATCH 5/8] Support APX NDD Jan Beulich
2023-11-20  1:33     ` Cui, Lili
2023-11-20  8:19       ` Jan Beulich
2023-11-20 12:54         ` Cui, Lili
2023-11-20 16:43           ` Jan Beulich
2023-11-02 11:29 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
2023-11-08 11:44   ` Jan Beulich
2023-11-08 12:52     ` Jan Beulich
2023-11-22  5:48     ` Cui, Lili
2023-11-22  8:53       ` Jan Beulich
2023-11-22 12:26         ` Cui, Lili
2023-11-09  9:57   ` Jan Beulich
2023-11-02 11:29 ` [PATCH 7/8] Support APX NDD optimized encoding Cui, Lili
2023-11-09 10:36   ` Jan Beulich
2023-11-10  5:43     ` Hu, Lin1
2023-11-10  9:54       ` Jan Beulich
2023-11-14  2:28         ` Hu, Lin1
2023-11-14 10:50           ` Jan Beulich
2023-11-15  2:52             ` Hu, Lin1
2023-11-15  8:57               ` Jan Beulich
2023-11-15  2:59             ` [PATCH][v3] " Hu, Lin1
2023-11-15  9:34               ` Jan Beulich
2023-11-17  7:24                 ` Hu, Lin1
2023-11-17  9:47                   ` Jan Beulich
2023-11-20  3:28                     ` Hu, Lin1
2023-11-20  8:34                       ` Jan Beulich
2023-11-14  2:58         ` [PATCH 1/2] Reorder APX insns in i386.tbl Hu, Lin1
2023-11-14 11:20           ` Jan Beulich
2023-11-15  1:49             ` Hu, Lin1
2023-11-15  8:52               ` Jan Beulich
2023-11-17  3:27                 ` Hu, Lin1
2023-11-02 11:29 ` [PATCH 8/8] Support APX JMPABS Cui, Lili
2023-11-09 12:59   ` Jan Beulich
2023-11-14  3:26     ` Hu, Lin1
2023-11-14 11:15       ` Jan Beulich
2023-11-24  5:40         ` Hu, Lin1
2023-11-24  7:21           ` Jan Beulich
2023-11-27  2:16             ` Hu, Lin1
2023-11-27  8:03               ` Jan Beulich
2023-11-27  8:46                 ` Hu, Lin1
2023-11-27  8:54                   ` Jan Beulich
2023-11-27  9:03                     ` Hu, Lin1
2023-11-27 10:32                       ` Jan Beulich
2023-12-04  7:33                         ` Hu, Lin1
2023-11-02 13:22 ` [PATCH v2 0/8] Support Intel APX EGPR Jan Beulich
2023-11-03 16:42   ` Cui, Lili
2023-11-06  7:30     ` Jan Beulich
2023-11-06 14:20       ` Cui, Lili
2023-11-06 14:44         ` Jan Beulich
2023-11-06 16:03           ` Cui, Lili
2023-11-06 16:10             ` Jan Beulich
2023-11-07  1:53               ` Cui, Lili
2023-11-07 10:11                 ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2023-09-19 15:25 [PATCH 0/8] [RFC] " Cui, Lili
2023-09-19 15:25 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
2023-09-21 15:27   ` Jan Beulich
2023-09-27 15:57     ` Cui, Lili
2023-09-21 15:51   ` Jan Beulich
2023-09-27 15:59     ` Cui, Lili
2023-09-28  8:02       ` Jan Beulich
2023-10-07  3:27         ` Cui, Lili

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SJ0PR11MB560042CC079FE35596CD34E49EAAA@SJ0PR11MB5600.namprd11.prod.outlook.com \
    --to=lili.cui@intel.com \
    --cc=JBeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=ccoutant@gmail.com \
    --cc=hongjiu.lu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).