public inbox for binutils@sourceware.org
From: "Cui, Lili" <lili.cui@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"ccoutant@gmail.com" <ccoutant@gmail.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH 1/8] Support APX GPR32 with rex2 prefix
Date: Tue, 7 Nov 2023 08:16:29 +0000
Message-ID: <SJ0PR11MB5600787DF341BF8F3BD1CFA19EA9A@SJ0PR11MB5600.namprd11.prod.outlook.com>
In-Reply-To: <9a6b01c3-4dd4-731e-d677-e19d092b16c4@suse.com>

> >> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
> >>
> >> On 02.11.2023 12:29, Cui, Lili wrote:
> >>> @@ -406,6 +409,11 @@ struct _i386_insn
> >>>      /* Compressed disp8*N attribute.  */
> >>>      unsigned int memshift;
> >>>
> >>> +    /* No CSPAZO flags update.*/
> >>> +    bool has_nf;
> >>> +
> >>> +    bool has_zero_upper;
> >>> +
> >>
> >> Can both please be introduced when they're needed, not randomly ahead
> >> of time?
> >
> > Moved has_nf to patch 2/8 and deleted has_zero_upper.
> 
> Patch 2/8? Not in this series then, I suppose?
> 
It should be patch 3/8. 😊

> >>> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
> >>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >>>
> >>> +/* Build (2 bytes) rex2 prefix.
> >>> +   | D5h |
> >>> +   | m | R4 X4 B4 | W R X B |
> >>> +*/
> >>> +static void
> >>> +build_rex2_prefix (void)
> >>> +{
> >>> +  i.vex.length = 2;
> >>> +  i.vex.bytes[0] = 0xd5;
> >>> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> >>> +		    | (i.rex2 << 4) | i.rex);
> >>> +}
> >>
> >> I may have asked on v1 already: For emitting REX we don't resort to
> >> (ab)using i.vex. Is that really necessary? (If so, a comment next to
> >> the field declaration may be warranted.)
> >>
> > Added comment for it.
> >
> >   /* For the W R X B bits, the variables of rex prefix will be reused.  */
> >   i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> >                     | (i.rex2 << 4) | i.rex);
> 
> How does the comment relate to the (ab)use of i.vex?
>
Ah, you mean i.vex, not i.rex. At first I thought REX2 should have its own variable, but in the output_insn function REX2 gets the same special handling of i.tm.opcode_space as VEX. Reusing i.vex avoids some ugly duplicated code.
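
To make the byte layout from the quoted build_rex2_prefix() concrete, here is a minimal standalone sketch. The struct and function names are hypothetical and only stand in for the assembler state (i.tm.opcode_space, i.rex2, i.rex); it assumes opcode_space is 0 for the legacy map and 1 for the 0x0f map, so that a single bit is enough for "m".

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields the patch reads from `i'.  */
struct rex2_fields
{
  unsigned int opcode_space;	/* Assumed: 0 = legacy map, 1 = 0x0f map.  */
  unsigned int rex2;		/* R4 X4 B4, three bits.  */
  unsigned int rex;		/* W R X B, four bits.  */
};

/* Fill OUT[0..1] with the two-byte REX2 prefix:
   | D5h |
   | m | R4 X4 B4 | W R X B |  */
static void
encode_rex2 (const struct rex2_fields *f, uint8_t out[2])
{
  out[0] = 0xd5;
  out[1] = (uint8_t) (((f->opcode_space & 1) << 7)
		      | ((f->rex2 & 7) << 4)
		      | (f->rex & 0xf));
}
```

For example, with the "m" bit set, only B4 set in the upper group and only W set in the lower group, the second byte comes out as 0x80 | 0x10 | 0x08.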

> >> Speaking of v1: Can you please make sure you have correct version
> >> tags on submissions of updated patch versions?
> >>
> > I used git to send all the patches at once (git send-email --cover-letter
> > --annotate --to="..." -8), which only has the opportunity to change the
> > version of the cover letter patch. To change the version of each patch, I can
> > send them one by one next time. By the way, do you have a better way? Or how
> > did you modify them? Thanks.
> 
> Well, personally I don't use git to send patches. But I know people send series
> with proper version tags throughout, all the time.
> 
OK, I will pay attention to it.

> >>> @@ -5278,6 +5319,9 @@ md_assemble (char *line)
> >>>  	case register_type_mismatch:
> >>>  	  err_msg = _("register type mismatch");
> >>>  	  break;
> >>> +	case register_type_of_address_mismatch:
> >>> +	  err_msg = _("register type of address mismatch");
> >>> +	  break;
> >>
> >> I have a concern with wording / naming here: If I saw this in an
> >> error message, I wouldn't know what is meant. Maybe something along
> >> the lines of "cannot use an extended GPR for addressing"? And then
> >> the enumerator suitably renamed as well?
> >>
> >  Changed to
> >
> > +       case unsupported_EGPR_for_addressing:
> > +         err_msg = _("unsupported EGPR for addressing");
> > +         break;
> 
> May I suggest "extended GPR" in the message text (the enumerator is fine to
> have EGPR)?
> 

Sure.

> >>> @@ -5594,6 +5641,13 @@ md_assemble (char *line)
> >>>  	  return;
> >>>  	}
> >>>
> >>> +      /* Check for explicit REX2 prefix.  */
> >>> +      if (i.rex2 || i.rex2_encoding)
> >>
> >> This open-codes is_any_apx_rex2_encoding(). But read on.
> >>
> >>> +	{
> >>> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> >>
> >> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is
> >> what case the i.rex2 check above is intended to cover. Error message
> >> comment, and condition want to reflect that.
> >>
> >
> > Removed i.rex2 and keep i.rex2_encoding here. Added one invalid testcase
> > for it.
> >
> >         {rex} vmovaps %xmm7,%xmm2
> >         {rex} vmovaps %xmm17,%xmm2
> >         {rex} rorx $7,%eax,%ebx
> > +       {rex2} vmovaps %xmm7,%xmm2
> 
> Right, but please see my "optional vs required" comment in the pseudo-prefix
> related patch I did send earlier today. I question the correctness of the {rex}
> related check here, which would then extend to the {rex2} one as well.
> 

A REX byte that is immediately followed by a legacy prefix byte (LOCK, REPE, REPNE, OSIZE override, ASIZE override, or segment overrides) or by another REX byte is ignored and behaves as if it did not exist (except for contributing to the instruction length).
In this case, though, I think the check is correct.
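
As an illustration of the rule I quoted, here is a small standalone sketch (not binutils code; all function names are made up for this mail) that decides whether a REX byte would be ignored based on the byte that follows it. It assumes 64-bit mode, where 0x40..0x4f are REX prefixes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if BYTE is one of the legacy prefixes listed above.  */
static bool
is_legacy_prefix (uint8_t byte)
{
  switch (byte)
    {
    case 0xf0:			/* LOCK */
    case 0xf2:			/* REPNE */
    case 0xf3:			/* REPE */
    case 0x66:			/* OSIZE override */
    case 0x67:			/* ASIZE override */
    case 0x26: case 0x2e:	/* ES, CS segment overrides */
    case 0x36: case 0x3e:	/* SS, DS segment overrides */
    case 0x64: case 0x65:	/* FS, GS segment overrides */
      return true;
    default:
      return false;
    }
}

/* True if BYTE is a REX prefix (64-bit mode assumed).  */
static bool
is_rex (uint8_t byte)
{
  return (byte & 0xf0) == 0x40;
}

/* True if CODE[0] is a REX byte that is ignored because CODE[1] is a
   legacy prefix or another REX byte.  */
static bool
rex_is_ignored (const uint8_t *code, size_t len)
{
  return len >= 2
	 && is_rex (code[0])
	 && (is_legacy_prefix (code[1]) || is_rex (code[1]));
}
```

So a sequence such as 48 66 90 has its REX.W dropped, while 48 89 c3 keeps it.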

> >>> +      for (unsigned int op = 0; op < i.operands; op++)
> >>> +	{
> >>> +	  if (i.types[op].bitfield.class != Reg
> >>> +	      /* Special case for (%dx) while doing input/output op */
> >>> +	      || i.input_output_operand)
> >>
> >> Why is this needed? The register table entry for %dx ...
> >>
> >>> +	    continue;
> >>> +
> >>> +	  if (i.op[op].regs->reg_flags & RegRex2)
> >>
> >> ... doesn't have this bit set anyway.
> >>
> >
> > For this special case i.op is empty, so we need to continue; otherwise
> > i.op[op].regs->reg_flags will cause a segmentation fault.
> 
> I vaguely recall commenting on this anomaly to H.J. - perhaps time to fix that
> properly (in a separate patch), to not leave this trap open any longer?
> (Otherwise at least a comment is needed here.)
>

H.J. is on vacation, so I will try to fix this.

> >>> +	    {
> >>> +	      i.error = register_type_mismatch;
> >>> +	      return 1;
> >>> +	    }
> >>> +	}
> >>> +
> >>> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> >>> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> >>> +	{
> >>> +	  i.error = register_type_of_address_mismatch;
> >>> +	  return 1;
> >>> +	}
> >>> +
> >>> +      /* Check pseudo prefix {rex2} are valid.  */
> >>> +      if (i.rex2_encoding)
> >>> +	{
> >>> +	  i.error = invalid_pseudo_prefix;
> >>> +	  return 1;
> >>> +	}
> >>
> >> Further up in md_assemble() {rex} or {rex2} is simply ignored when
> >> wrong to apply. Why would an inapplicable {rex2} be treated as an
> >> error here? This would then also ...
> >>
> >>> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
> >>>        /* Do not verify operands when there are none.  */
> >>>        if (!t->operands)
> >>>  	{
> >>> -	  if (VEX_check_encoding (t))
> >>> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
> >>>  	    {
> >>>  	      specific_error = progress (i.error);
> >>>  	      continue;
> >>
> >> ... eliminate the need for this change, which is kind of bogus anyway:
> >> There are no operands here, so calling a function of the given name
> >> is at least suspicious.
> >>
> >
> > We have these tests and I'm confused whether to remove them or not.
> >
> > +       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
> > +       {rex2} wrmsr
> > +       {rex2} rdtsc
> > +       {rex2} rdmsr
> > +       {rex2} sysenter
> > +       {rex2} sysexitl
> > +       {rex2} rdpmc
> 
> They should all stay. But as to my comment: There's no use of any eGPR here. If
> you want to abuse that function and if there's no better descriptive name for it,
> then once again at least a comment is needed.
> (Considering this, the attribute's name NoEgpr is probably also misleading in
> the cases here, i.e. when there are no operands. Hence, if not to be renamed,
> requires yet another comment in i386-opc.h.)
> 
This question confused me as well. Some instructions only support the Acc register, yet we need to add NoEgpr to them, which seems a bit strange. If we used NoRex2 instead, it would not fit the VEX and EVEX instructions either. So I will add comments for now.

+         /* When there are no operands, we still need to use the
+            check_EgprOperands function to check whether {rex2} is valid.  */
          if (VEX_check_encoding (t) || check_EgprOperands (t))

-  /* egprs (r16-r31) on instruction illegal.  */
+  /* egprs (r16-r31) on instruction illegal. We also use it to judge
+     whether the instruction supports pseudo-prefix {rex2}.  */
   NoEgpr,
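
To show why the check still matters when there are no operands, here is a simplified standalone model of the flow in the quoted check_EgprOperands hunk. The struct, enum and function names are hypothetical, and the early return for templates without NoEgpr is my assumption about the part of the function that is not quoted above.

```c
#include <stdbool.h>

enum check_error
{
  check_ok,
  err_register_type_mismatch,
  err_unsupported_egpr_for_addressing,
  err_invalid_pseudo_prefix
};

/* Hypothetical, flattened view of the state the real function reads.  */
struct insn_sketch
{
  bool template_no_egpr;	/* Template carries NoEgpr.  */
  unsigned int operands;	/* Number of operands in use.  */
  bool operand_is_egpr[5];	/* Operand is one of r16-r31.  */
  bool base_is_egpr;		/* Base register is an eGPR.  */
  bool index_is_egpr;		/* Index register is an eGPR.  */
  bool rex2_encoding;		/* {rex2} pseudo prefix was seen.  */
};

static enum check_error
check_egpr_sketch (const struct insn_sketch *x)
{
  /* Assumed: nothing to check unless the template is tagged NoEgpr.  */
  if (!x->template_no_egpr)
    return check_ok;

  for (unsigned int op = 0; op < x->operands; op++)
    if (x->operand_is_egpr[op])
      return err_register_type_mismatch;

  if (x->base_is_egpr || x->index_is_egpr)
    return err_unsupported_egpr_for_addressing;

  /* This part fires even with zero operands, e.g. for "{rex2} wrmsr".  */
  if (x->rex2_encoding)
    return err_invalid_pseudo_prefix;

  return check_ok;
}
```

With operands == 0 the loop and the addressing check are skipped, so the only thing left is the {rex2} test, which is what the new comment tries to say.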

> >>> @@ -14131,6 +14258,13 @@ static bool check_register (const reg_entry *r)
> >>>  	i.vec_encoding = vex_encoding_error;
> >>>      }
> >>>
> >>> +  if (r->reg_flags & RegRex2)
> >>> +    {
> >>> +      if (!cpu_arch_flags.bitfield.cpuapx_f
> >>> +	  || flag_code != CODE_64BIT)
> >>> +	return false;
> >>> +    }
> >>
> >> Please fold the two if()s into one (unless of course you know that
> >> the outer one is going to be extended in a subsequent patch).
> >>
> >
> > Yes, other code will be added in the outer if with patch2/8.
> 
> Hmm, you again say patch 2/8, yet that patch in this series clearly doesn't do
> anything like that.
> 

It should be patch 3/8.

> >>> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> >>> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
> >>> @@ -11,11 +11,11 @@ Disassembly of section .text:
> >>>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
> >>>
> >>>  0+1 <aad0>:
> >>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> >>> +[ 	]*[a-f0-9]+:	d5                   	rex2
> >>>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
> >>>
> >>>  0+3 <aad1>:
> >>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> >>> +[ 	]*[a-f0-9]+:	d5                   	rex2
> >>>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
> >>>
> >>>  0+5 <aam0>:
> >>> --- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> >>> +++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
> >>> @@ -11,11 +11,11 @@ Disassembly of section .text:
> >>>  [ 	]*[a-f0-9]+:	37                   	\(bad\)
> >>>
> >>>  0+1 <aad0>:
> >>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> >>> +[ 	]*[a-f0-9]+:	d5                   	rex2
> >>>  [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
> >>>
> >>>  0+3 <aad1>:
> >>> -[ 	]*[a-f0-9]+:	d5                   	\(bad\)
> >>> +[ 	]*[a-f0-9]+:	d5                   	rex2
> >>>  [ 	]*[a-f0-9]+:	02                   	.byte 0x2
> >>>
> >>>  0+5 <aam0>:
> >>
> >> These expectations match the ones of the same test in the parent directory.
> >> Hence instead of adjusting each in both places, please have the ones
> >> here reference the parent directory files.
> >>
> >
> > They are used to test illegal opcodes for x86-64. Since D5 now makes sense,
> > these two test cases were removed.
> >
> > # All the followings are illegal opcodes for x86-64.
> > aad0:
> >         aad
> > aad1:
> >         aad $2
> 
> Right, but how does this relate to my request to simply fold the expectations
> here with that of the same test in the parent directory?
> (You'll find various examples under ilp32/ where I've done such folding already,
> whenever I had to touch both instances anyway.)
> 

Oh, got it.

> >>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
> >>>    return elem_size;
> >>>  }
> >>>
> >>> +static bool
> >>> +if_entry_needs_special_handle (const unsigned long long opcode, unsigned int space,
> >>> +			       const char *cpu_flags)
> >>> +{
> >>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
> >>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> >>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> >>> +      || strcmp (cpu_flags, "Xsave") >= 0
> >>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
> >>
> >> Upon further thought for these (and maybe even ...
> >>
> >>> +      || !strcmp (cpu_flags, "3dnow")
> >>> +      || !strcmp (cpu_flags, "3dnowA"))
> >>
> >> ... for these, but see also below) it might be better to add the
> >> attribute right in the opcode table.
> >>
> >> As to the 3dnow insns - I think I'd like to revise my earlier
> >> suggestion to also tag those. Like e.g. FPU insns they're pretty
> >> normal GPR-wise, so allowing them to be used like that would appear
> >> only consistent. Otherwise, if we were concerned of AMD extensions in
> >> general, SSE4a insns (and maybe further
> >> ones) would also need excluding. (Additionally recall that there's an
> >> overlap between 3dnowa and SSE, which would result in another
> >> [apparent] inconsistency when excluding 3dnow insns here.)
> >>
> >
> > I see, for example I think I need to split this table into two parts, one is for
> > SSE and one is for 3dnowA, then add noegpr to the SSE one, right?
> > pextrw, 0xfc5, SSE|3dnowA,
> > Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX,
> > Reg32|Reg64 }
> 
> I'm afraid I don't understand the question. All I've asked for is that the special
> treatment of 3dnow insns be removed again. Unless you want to special-case
> further insns; it's not really clear to me what's best, as both approaches have
> noticeable downsides (either we allow to encode something which may never
> become valid, or we disallow something which may become valid).
> 
> In any event adding NoEgpr to any SSE insn sounds wrong to me - aiui they can
> all be encoded with REX2.
> 
I need to correct myself: some entries in the table cover both SSE and AMD instructions. I would need to split them first and then add NoEgpr to the AMD instructions.
Another point is that we have not split the instructions common to AMD and Intel, so adding NoEgpr only to 3dnowA and 3dnow does not seem to make much sense.

Do you also want me to remove this part and add NoEgpr in the insn table?

> >>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
> >>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> >>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> >>> +      || strcmp (cpu_flags, "Xsave") >= 0
> >>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0

Thanks,
Lili.
