Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix

public inbox for binutils@sourceware.org

From: Jan Beulich <jbeulich@suse.com>
To: "Cui, Lili" <lili.cui@intel.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"ccoutant@gmail.com" <ccoutant@gmail.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
Date: Tue, 7 Nov 2023 11:43:39 +0100
Message-ID: <bddf799b-fd2b-10a8-3944-a45d4bb992e7@suse.com>
In-Reply-To: <SJ0PR11MB5600787DF341BF8F3BD1CFA19EA9A@SJ0PR11MB5600.namprd11.prod.outlook.com>

On 07.11.2023 09:16, Cui, Lili wrote:
>>>> Subject: Re: [PATCH 1/8] Support APX GPR32 with rex2 prefix
>>>>
>>>> On 02.11.2023 12:29, Cui, Lili wrote:
>>>>> @@ -4158,6 +4182,19 @@ build_evex_prefix (void)
>>>>>      i.vex.bytes[3] |= i.mask.reg->reg_num;
>>>>>  }
>>>>>
>>>>> +/* Build (2 bytes) rex2 prefix.
>>>>> +   | D5h |
>>>>> +   | m | R4 X4 B4 | W R X B |
>>>>> +*/
>>>>> +static void
>>>>> +build_rex2_prefix (void)
>>>>> +{
>>>>> +  i.vex.length = 2;
>>>>> +  i.vex.bytes[0] = 0xd5;
>>>>> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
>>>>> +		    | (i.rex2 << 4) | i.rex);
>>>>> +}
>>>>
>>>> I may have asked on v1 already: For emitting REX we don't resort to
>>>> (ab)using i.vex. Is that really necessary? (If so, a comment next to
>>>> the field declaration may be warranted.)
>>>>
>>> Added comment for it.
>>>
>>>   /* For the W R X B bits, the variables of rex prefix will be reused.  */
>>>   i.vex.bytes[1] = ((i.tm.opcode_space << 7)
>>>                     | (i.rex2 << 4) | i.rex);
>>
>> How does the comment relate to the (ab)use of i.vex?
>>
> Ah, it's i.vex, not i.rex. At first I thought REX2 should have its own variable, but in output_insn () REX2 needs the same special handling of i.tm.opcode_space as VEX, so reusing i.vex avoids some ugly code.

Things like this are very helpful to explain in the patch description.
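For reference, the second REX2 byte is three fields OR-ed together, exactly as in the quoted build_rex2_prefix (). The following is a standalone sketch, not gas code. pack_rex2 and its parameter names are hypothetical: rex2_hi stands in for i.rex2 (R4 X4 B4) and rex_lo stands in for i.rex (W R X B).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone version of the REX2 byte packing done by
   build_rex2_prefix ():
     byte 0: 0xD5
     byte 1: bit 7    = M0 (0 = legacy map 0, 1 = 0F map)
             bits 6-4 = R4 X4 B4   (i.rex2 in the patch)
             bits 3-0 = W R X B    (i.rex in the patch)  */
static void
pack_rex2 (unsigned map, unsigned rex2_hi, unsigned rex_lo, uint8_t out[2])
{
  /* Only maps 0 and 1 are selectable via REX2; each field must fit.  */
  assert (map <= 1 && rex2_hi <= 7 && rex_lo <= 15);
  out[0] = 0xd5;
  out[1] = (uint8_t) ((map << 7) | (rex2_hi << 4) | rex_lo);
}
```

For example, pack_rex2 (1, 1, 8, b) selects the 0F map and sets B4 and W, which yields the bytes d5 98.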

>>>>> @@ -5594,6 +5641,13 @@ md_assemble (char *line)
>>>>>  	  return;
>>>>>  	}
>>>>>
>>>>> +      /* Check for explicit REX2 prefix.  */
>>>>> +      if (i.rex2 || i.rex2_encoding)
>>>>
>>>> This open-codes is_any_apx_rex2_encoding(). But read on.
>>>>
>>>>> +	{
>>>>> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
>>>>
>>>> There's no REX2 prefix; {rex2} only sets i.rex2_encoding. Question is
>>>> what case the i.rex2 check above is intended to cover. Error message,
>>>> comment, and condition want to reflect that.
>>>>
>>>
>>> Removed i.rex2 and kept i.rex2_encoding here. Added one invalid testcase for it.
>>>
>>>         {rex} vmovaps %xmm7,%xmm2
>>>         {rex} vmovaps %xmm17,%xmm2
>>>         {rex} rorx $7,%eax,%ebx
>>> +       {rex2} vmovaps %xmm7,%xmm2
>>
>> Right, but please see my "optional vs required" comment in the pseudo-prefix
>> related patch I did send earlier today. I question the correctness of the {rex}
>> related check here, which would then extend to the {rex2} one as well.
>>
> 
> A REX byte that is immediately followed by a legacy prefix byte (LOCK, REPE, REPNE, OSIZE override, ASIZE override, or segment overrides) or another REX byte is ignored and behaves as if it does not exist (except for contributing to the instruction length).
> But in this case I think it's correct.

I'm afraid I can't relate this to the aspect I raised above. Perhaps better to
discuss in the context of the patch that I sent (and that I mentioned above;
"x86: CPU-qualify {disp16} / {disp32}"). You did reply to the patch, but you
didn't reply to the more detailed description of the issue (which I did refer
to above).
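The decoding rule Lili quotes can be written as a byte-level check. This sketch assumes 64-bit mode, where 0x40-0x4F are REX bytes. The helper names are hypothetical. The sketch covers only the hardware rule and does not address how gas should treat an inapplicable {rex} or {rex2}, which is the question raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy prefixes named in the quoted rule: LOCK, REPNE, REPE,
   operand-size (66), address-size (67) and the six segment overrides.  */
static bool
is_legacy_prefix (uint8_t b)
{
  switch (b)
    {
    case 0xf0: case 0xf2: case 0xf3:
    case 0x66: case 0x67:
    case 0x26: case 0x2e: case 0x36: case 0x3e: case 0x64: case 0x65:
      return true;
    default:
      return false;
    }
}

/* In 64-bit mode, bytes 0x40-0x4f are REX prefixes.  */
static bool
is_rex (uint8_t b)
{
  return (b & 0xf0) == 0x40;
}

/* Hypothetical helper: does the REX byte at POS take effect, or is it
   ignored because a legacy prefix or another REX byte follows it?  */
static bool
rex_takes_effect (const uint8_t *insn, size_t len, size_t pos)
{
  assert (pos + 1 < len && is_rex (insn[pos]));
  return !is_legacy_prefix (insn[pos + 1]) && !is_rex (insn[pos + 1]);
}
```

In 48 89 c3 (mov %rax,%rbx) the REX.W byte applies. In 48 66 89 c3 it is ignored because an operand-size prefix follows it.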

>>>>> +	    {
>>>>> +	      i.error = register_type_mismatch;
>>>>> +	      return 1;
>>>>> +	    }
>>>>> +	}
>>>>> +
>>>>> +      if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
>>>>> +	  || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
>>>>> +	{
>>>>> +	  i.error = register_type_of_address_mismatch;
>>>>> +	  return 1;
>>>>> +	}
>>>>> +
>>>>> +      /* Check pseudo prefix {rex2} are valid.  */
>>>>> +      if (i.rex2_encoding)
>>>>> +	{
>>>>> +	  i.error = invalid_pseudo_prefix;
>>>>> +	  return 1;
>>>>> +	}
>>>>
>>>> Further up in md_assemble() {rex} or {rex2} is simply ignored when
>>>> wrong to apply. Why would an inapplicable {rex2} be treated as an
>>>> error here? This would then also ...
>>>>
>>>>> @@ -7125,7 +7230,7 @@ match_template (char mnem_suffix)
>>>>>        /* Do not verify operands when there are none.  */
>>>>>        if (!t->operands)
>>>>>  	{
>>>>> -	  if (VEX_check_encoding (t))
>>>>> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
>>>>>  	    {
>>>>>  	      specific_error = progress (i.error);
>>>>>  	      continue;
>>>>
>>>> ... eliminate the need for this change, which is kind of bogus anyway:
>>>> There are no operands here, so calling a function of the given name
>>>> is at least suspicious.
>>>>
>>>
>>> We have these tests and I'm confused whether to remove them or not.
>>>
>>> +       #All opcodes in the row 0xf3* prefixed REX2 are illegal.
>>> +       {rex2} wrmsr
>>> +       {rex2} rdtsc
>>> +       {rex2} rdmsr
>>> +       {rex2} sysenter
>>> +       {rex2} sysexitl
>>> +       {rex2} rdpmc
>>
>> They should all stay. But as to my comment: There's no use of any eGPR here. If
>> you want to abuse that function and if there's no better descriptive name for it,
>> then once again at least a comment is needed.
>> (Considering this, the attribute's name NoEgpr is probably also misleading in
>> the cases here, i.e. when there are no operands. Hence, if it is not to be
>> renamed, it requires yet another comment in i386-opc.h.)
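All six tested instructions live in row 3 of the 0F opcode map: WRMSR 0F 30, RDTSC 0F 31, RDMSR 0F 32, RDPMC 0F 33, SYSENTER 0F 34 and SYSEXIT 0F 35. The restriction is therefore a property of the opcode, not of any operand. The following is a sketch of an opcode-keyed check. The function name is hypothetical, and the opcode-space numbering is assumed to mirror gas's SPACE_BASE/SPACE_0F.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror gas's numbering: 0 = legacy map 0, 1 = 0F map.  */
enum opc_space { SPACE_BASE = 0, SPACE_0F = 1 };

/* Hypothetical check: reject REX2 for the whole row 0F 30 .. 0F 3F,
   which the quoted tests sample.  Uses no operand information.  */
static bool
rex2_forbidden_opcode (enum opc_space space, unsigned opcode)
{
  assert (opcode <= 0xff);
  return space == SPACE_0F && (opcode & 0xf0) == 0x30;
}
```

Keying the check on the opcode would avoid routing operand-less templates through a function named after operands. Whether that justifies a separate attribute is the naming question raised here.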
>>
> This question also confused me. Some instructions only support the Acc register, yet we need to add NoEgpr for them, which seems a bit strange. If we used NoRex2 instead, it wouldn't fit the VEX and EVEX instructions either. So I will add comments for now.
> 
> +         /* When there are no operands, we still need to use the
> +            check_EgprOperands function to check whether {rex2} is valid.  */
>           if (VEX_check_encoding (t) || check_EgprOperands (t))
> 
> -  /* egprs (r16-r31) on instruction illegal.  */
> +  /* egprs (r16-r31) on instruction illegal. We also use it to judge
> +     whether the instruction supports pseudo-prefix {rex2}.  */
>    NoEgpr,

This looks okay commentary-wise, but as per above we first need to settle on
whether an inapplicable {rex2} shouldn't simply be ignored.
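The two policies under discussion for an inapplicable {rex2} can be contrasted in a few lines. This is only a sketch. struct tmpl, its no_egpr field and the outcome names are hypothetical, and gas's real match_template () flow is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for an insn template.  */
struct tmpl
{
  const char *name;
  bool no_egpr;   /* Template cannot be REX2-encoded.  */
};

enum rex2_outcome
{
  REX2_APPLY,    /* Emit the insn with a REX2 prefix.  */
  REX2_IGNORE,   /* Drop {rex2} silently, as md_assemble () does for an
                    inapplicable {rex}.  */
  REX2_ERROR     /* Reject the template because of {rex2}.  */
};

/* What to do when {rex2} was given for template T;
   IGNORE_INAPPLICABLE selects between the two policies.  */
static enum rex2_outcome
rex2_pseudo_prefix_outcome (const struct tmpl *t, bool ignore_inapplicable)
{
  assert (t != NULL);
  if (!t->no_egpr)
    return REX2_APPLY;
  return ignore_inapplicable ? REX2_IGNORE : REX2_ERROR;
}
```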

>>>>> @@ -1008,10 +1012,35 @@ get_element_size (char **opnd, int lineno)
>>>>>    return elem_size;
>>>>>  }
>>>>>
>>>>> +static bool
>>>>> +if_entry_needs_special_handle (const unsigned long long opcode,
>>>>> +			       unsigned int space,
>>>>> +			       const char *cpu_flags)
>>>>> +{
>>>>> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
>>>>> +  if (strcmp (cpu_flags, "XSAVES") >= 0
>>>>> +      || strcmp (cpu_flags, "XSAVEC") >= 0
>>>>> +      || strcmp (cpu_flags, "Xsave") >= 0
>>>>> +      || strcmp (cpu_flags, "Xsaveopt") >= 0
>>>>
>>>> Upon further thought for these (and maybe even ...
>>>>
>>>>> +      || !strcmp (cpu_flags, "3dnow")
>>>>> +      || !strcmp (cpu_flags, "3dnowA"))
>>>>
>>>> ... for these, but see also below) it might be better to add the
>>>> attribute right in the opcode table.
>>>>
>>>> As to the 3dnow insns - I think I'd like to revise my earlier
>>>> suggestion to also tag those. Like e.g. FPU insns they're pretty
>>>> normal GPR-wise, so allowing them to be used like that would appear
>>>> only consistent. Otherwise, if we were concerned of AMD extensions in
>>>> general, SSE4a insns (and maybe further
>>>> ones) would also need excluding. (Additionally recall that there's an
>>>> overlap between 3dnowa and SSE, which would result in another
>>>> [apparent] inconsistency when excluding 3dnow insns here.)
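Carrying the attribute in the opcode table, rather than deriving it in i386-gen from the cpu_flags string, might look like this in miniature. The struct, field names and entries are hypothetical illustrations, not the real i386-opc.tbl format. Following the 3DNow! remark above, the 3DNow! entry is left without the flag.

A per-entry flag also sidesteps a pitfall in the quoted helper. strcmp (a, b) >= 0 tests lexicographic order, not equality or containment. As a result, any flags string that merely sorts after "XSAVES" would match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified opcode-table entry.  */
struct opc_entry
{
  const char *mnemonic;
  const char *cpu_flags;
  bool no_egpr;          /* REX2 / eGPR use not permitted.  */
};

static const struct opc_entry opc_table[] =
{
  { "xsave",    "Xsave",    true  },
  { "xsaveopt", "Xsaveopt", true  },
  { "pfadd",    "3dnow",    false },  /* GPR-wise like FPU insns.  */
  { "addps",    "SSE",      false },
};

/* Exact-match lookup by mnemonic; NULL if not found.  */
static const struct opc_entry *
opc_lookup (const char *mnemonic)
{
  assert (mnemonic != NULL);
  for (size_t n = 0; n < sizeof opc_table / sizeof opc_table[0]; n++)
    if (strcmp (opc_table[n].mnemonic, mnemonic) == 0)
      return &opc_table[n];
  return NULL;
}
```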
>>>>
>>>
>>> I see. For example, I think I need to split this table entry into two parts, one
>>> for SSE and one for 3dnowA, then add NoEgpr to the SSE one, right?
>>> pextrw, 0xfc5, SSE|3dnowA, Modrm|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|NoRex64, { Imm8, RegMMX, Reg32|Reg64 }
>>
>> I'm afraid I don't understand the question. All I've asked for is that the special
>> treatment of 3dnow insns be removed again. Unless you want to special-case
>> further insns; it's not really clear to me what's best, as both approaches have
>> noticeable downsides (either we allow to encode something which may never
>> become valid, or we disallow something which may become valid).
>>
>> In any event adding NoEgpr to any SSE insn sounds wrong to me - aiui they can
>> all be encoded with REX2.
>>
> Let me correct that: some instruction table entries cover both SSE and AMD instructions. I would need to split them first and then add NoEgpr to the AMD ones.
> Another point is that we have not split out the instructions common to AMD and Intel, so just adding NoEgpr to the 3dnowA and 3dnow entries does not seem to make much sense.
> 
> Do you also want me to remove this part and add NoEgpr in the insn table?

First we need to settle on what to do with 3DNow!, SSE4a, and maybe further
AMD-only insns (beyond e.g. XOP and TBM ones, which aiui are covered by
virtue of being VEX[-like], and hence never eligible for eGPR use). Then we
can sort out how to best express what we have decided to enforce.

I'm not convinced at all that templates like that for MASKMOVQ would need
splitting: The difference would be noticeable only if someone disabled SSE,
but kept 3DNow! and APX_F enabled. We could easily document the resulting
pitfall instead.
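The pitfall described for shared templates such as MASKMOVQ's can be modelled with feature bitmasks. The model assumes, as the SSE|3dnowA notation suggests, that a template is usable once any one of its listed features is enabled. The feature bits and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits.  */
enum
{
  FEAT_SSE    = 1u << 0,
  FEAT_3DNOWA = 1u << 1,
  FEAT_APX_F  = 1u << 2
};

/* A template listing several features ("SSE|3dnowA") is usable when
   any one of them is enabled.  */
static bool
template_usable (unsigned tmpl_feats, unsigned enabled)
{
  assert (tmpl_feats != 0);
  return (tmpl_feats & enabled) != 0;
}

/* REX2 encoding additionally needs APX_F and no NoEgpr on the template.  */
static bool
rex2_permitted (unsigned tmpl_feats, bool no_egpr, unsigned enabled)
{
  return template_usable (tmpl_feats, enabled)
	 && !no_egpr
	 && (enabled & FEAT_APX_F) != 0;
}
```

With the template left unsplit, REX2 remains permitted when SSE is off and 3DNow!A and APX_F are on. That is the combination suggested for documenting rather than splitting.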

Jan


Thread overview: 120+ messages
2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
2023-11-02 17:05   ` Jan Beulich
2023-11-03  6:20     ` Cui, Lili
2023-11-03 13:05     ` Jan Beulich
2023-11-03 14:19   ` Jan Beulich
2023-11-06 15:20     ` Cui, Lili
2023-11-06 16:08       ` Jan Beulich
2023-11-07  8:16         ` Cui, Lili
2023-11-07 10:43           ` Jan Beulich [this message]
2023-11-07 15:31             ` Cui, Lili
2023-11-07 15:43               ` Jan Beulich
2023-11-07 15:53                 ` Cui, Lili
2023-11-06 15:02   ` Jan Beulich
2023-11-07  8:06     ` Cui, Lili
2023-11-07 10:20       ` Jan Beulich
2023-11-07 14:32         ` Cui, Lili
2023-11-07 15:08           ` Jan Beulich
2023-11-06 15:39   ` Jan Beulich
2023-11-09  8:02     ` Cui, Lili
2023-11-09 10:52       ` Jan Beulich
2023-11-09 13:27         ` Cui, Lili
2023-11-09 15:22           ` Jan Beulich
2023-11-10  7:11             ` Cui, Lili
2023-11-10  9:14               ` Jan Beulich
2023-11-10  9:21                 ` Jan Beulich
2023-11-10 12:38                   ` Cui, Lili
2023-12-14 10:13                   ` Cui, Lili
2023-12-18 15:24                     ` Jan Beulich
2023-12-18 16:23                       ` H.J. Lu
2023-11-10  9:47                 ` Cui, Lili
2023-11-10  9:57                   ` Jan Beulich
2023-11-10 12:05                     ` Cui, Lili
2023-11-10 12:35                       ` Jan Beulich
2023-11-13  0:18                         ` Cui, Lili
2023-11-02 11:29 ` [PATCH 2/8] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-02 11:29 ` [PATCH 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
2023-11-02 11:29 ` [PATCH 4/8] Add tests for " Cui, Lili
2023-11-08  9:11   ` Jan Beulich
2023-11-15 14:56     ` Cui, Lili
2023-11-16  9:17       ` Jan Beulich
2023-11-16 15:34     ` Cui, Lili
2023-11-16 16:50       ` Jan Beulich
2023-11-17 12:42         ` Cui, Lili
2023-11-17 14:38           ` Jan Beulich
2023-11-22 13:40             ` Cui, Lili
2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
2023-11-08 10:39   ` Jan Beulich
2023-11-20  1:19     ` Cui, Lili
2023-11-08 11:13   ` Jan Beulich
2023-11-20 12:36     ` Cui, Lili
2023-11-20 16:33       ` Jan Beulich
2023-11-22  7:46         ` Cui, Lili
2023-11-22  8:47           ` Jan Beulich
2023-11-22 10:45             ` Cui, Lili
2023-11-23 10:57               ` Jan Beulich
2023-11-23 12:14                 ` Cui, Lili
2023-11-24  6:56                 ` [PATCH v3 0/9] Support Intel APX EGPR Cui, Lili
2023-12-07  8:17                   ` Cui, Lili
2023-12-07  8:33                     ` Cui, Lili
2023-11-09  9:37   ` [PATCH 5/8] Support APX NDD Jan Beulich
2023-11-20  1:33     ` Cui, Lili
2023-11-20  8:19       ` Jan Beulich
2023-11-20 12:54         ` Cui, Lili
2023-11-20 16:43           ` Jan Beulich
2023-11-02 11:29 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
2023-11-08 11:44   ` Jan Beulich
2023-11-08 12:52     ` Jan Beulich
2023-11-22  5:48     ` Cui, Lili
2023-11-22  8:53       ` Jan Beulich
2023-11-22 12:26         ` Cui, Lili
2023-11-09  9:57   ` Jan Beulich
2023-11-02 11:29 ` [PATCH 7/8] Support APX NDD optimized encoding Cui, Lili
2023-11-09 10:36   ` Jan Beulich
2023-11-10  5:43     ` Hu, Lin1
2023-11-10  9:54       ` Jan Beulich
2023-11-14  2:28         ` Hu, Lin1
2023-11-14 10:50           ` Jan Beulich
2023-11-15  2:52             ` Hu, Lin1
2023-11-15  8:57               ` Jan Beulich
2023-11-15  2:59             ` [PATCH][v3] " Hu, Lin1
2023-11-15  9:34               ` Jan Beulich
2023-11-17  7:24                 ` Hu, Lin1
2023-11-17  9:47                   ` Jan Beulich
2023-11-20  3:28                     ` Hu, Lin1
2023-11-20  8:34                       ` Jan Beulich
2023-11-14  2:58         ` [PATCH 1/2] Reorder APX insns in i386.tbl Hu, Lin1
2023-11-14 11:20           ` Jan Beulich
2023-11-15  1:49             ` Hu, Lin1
2023-11-15  8:52               ` Jan Beulich
2023-11-17  3:27                 ` Hu, Lin1
2023-11-02 11:29 ` [PATCH 8/8] Support APX JMPABS Cui, Lili
2023-11-09 12:59   ` Jan Beulich
2023-11-14  3:26     ` Hu, Lin1
2023-11-14 11:15       ` Jan Beulich
2023-11-24  5:40         ` Hu, Lin1
2023-11-24  7:21           ` Jan Beulich
2023-11-27  2:16             ` Hu, Lin1
2023-11-27  8:03               ` Jan Beulich
2023-11-27  8:46                 ` Hu, Lin1
2023-11-27  8:54                   ` Jan Beulich
2023-11-27  9:03                     ` Hu, Lin1
2023-11-27 10:32                       ` Jan Beulich
2023-12-04  7:33                         ` Hu, Lin1
2023-11-02 13:22 ` [PATCH v2 0/8] Support Intel APX EGPR Jan Beulich
2023-11-03 16:42   ` Cui, Lili
2023-11-06  7:30     ` Jan Beulich
2023-11-06 14:20       ` Cui, Lili
2023-11-06 14:44         ` Jan Beulich
2023-11-06 16:03           ` Cui, Lili
2023-11-06 16:10             ` Jan Beulich
2023-11-07  1:53               ` Cui, Lili
2023-11-07 10:11                 ` Jan Beulich
  -- strict thread matches above, loose matches on Subject: below --
2023-09-19 15:25 [PATCH 0/8] [RFC] " Cui, Lili
2023-09-19 15:25 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
2023-09-21 15:27   ` Jan Beulich
2023-09-27 15:57     ` Cui, Lili
2023-09-21 15:51   ` Jan Beulich
2023-09-27 15:59     ` Cui, Lili
2023-09-28  8:02       ` Jan Beulich
2023-10-07  3:27         ` Cui, Lili
