RE: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: "Cui, Lili" <lili.cui@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH v3 2/9] Support APX GPR32 with rex2 prefix
Date: Wed, 6 Dec 2023 12:43:50 +0000	[thread overview]
Message-ID: <SJ0PR11MB5600FDF8200EEDEF70714AE79E84A@SJ0PR11MB5600.namprd11.prod.outlook.com> (raw)
In-Reply-To: <9578ce1e-233e-4c9a-8003-a619a685c236@suse.com>

> On 05.12.2023 14:31, Cui, Lili wrote:
> >> On 24.11.2023 08:02, Cui, Lili wrote:
> >>> @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
> >>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >>>
> >>> +/* Build (2 bytes) rex2 prefix.
> >>> +   | D5h |
> >>> +   | m | R4 X4 B4 | W R X B |
> >>> +*/
> >>> +static void
> >>> +build_rex2_prefix (void)
> >>> +{
> >>> +  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the
> >>> +same.  */
> >>
> >> How do they handle it the same? (Also I don't think this is useful as
> >> a code comment; it instead belongs in the description imo.)
> >>
> >
> > Moved the comment to the functions description.
> >
> > /* Build (2 bytes) rex2 prefix.
> >    | D5h |
> >    | m | R4 X4 B4 | W R X B |
> >
> >    Rex2 reuses i.vex as they handle i.tm.opcode_space the same way.
> > */ static void build_rex2_prefix (void)
> >
> >
> > In function "output_insn",  some handle like this.
> >
> >       if (!i.vex.length)
> >         switch (i.tm.opcode_space)
> >           {
> >           case SPACE_BASE:
> >             break;
> >           case SPACE_0F:
> >             ++j;
> >             break;
> >           case SPACE_0F38:
> >           case SPACE_0F3A:
> >             j += 2;
> >             break;
> >           default:
> >             abort ();
> >           }
> > .....
> >          if (!i.vex.length
> >               && i.tm.opcode_space != SPACE_BASE)
> >             {
> >               *p++ = 0x0f;
> >               if (i.tm.opcode_space != SPACE_0F)
> >                 *p++ = i.tm.opcode_space == SPACE_0F38
> >                        ? 0x38 : 0x3a;
> >             }
> 
> Oh, I see. That's pretty remote. How about replacing "the same way"?
> Perhaps
> "Rex2 reuses i.vex as they both encode i.tm.opcode_space in their prefixes"?
> 

Done.

> While in that form it's fine to remain in a code comment, just a general
> clarification: When I say something wants saying in the "description", it's
> (almost) always that I mean the patch description, not anything else.
> 

I see.

> >> I did comment on, in particular, the 8-bit register counts before.
> >> Afaict the comments above are nevertheless unchanged and hence still
> >> not really correct.
> >>
> >
> > Changed to :
> >
> >       if (flag_code == CODE_64BIT || base_regnum < 4)
> >         {
> >           i.types[1].bitfield.byte = 1;
> >           /* Ignore the suffix.  */
> >           i.suffix = 0;
> >           /* Convert to byte registers. 8-bit registers are special,
> >              RegRex64 and non-RegRex64 each have 8 registers.  */
> >           if (i.types[1].bitfield.word)
> >             /* 32 (or 40) 8-bit registers.  */
> >             j = 32;
> >           else if (i.types[1].bitfield.dword)
> >             /* 32 (or 40)8-bit registers + 32 16-bit registers.  */
> 
> Nit: Missing blank.
> 

Done.

> >             j = 64;
> >           else
> >             /* 32 (or 40) 8-bit registers + 32 16-bit registers
> >                + 32 32-bit registers.  */
> >             j = 96;
> >
> >           if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum <
> 4)
> >             j += 8;
> >           i.op[1].regs -= j;
> >         }
> 
> I won't insist on further changes, but imo as you're adding comments, also
> adding a comment to this last if() (which finally takes care of the 8-bit reg
> special case) would be advisable.
> 

Added.

          /* In 64-bit mode, the following byte registers cannot be accessed
             if using the Rex and Rex2 prefix: AH, BH, CH, DH */
          if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
            j += 8;

> >>> +/* Check if Egprs operands are valid for the instruction.  */
> >>> +
> >>> +static int
> >>> +check_EgprOperands (const insn_template *t) {
> >>> +  if (!t->opcode_modifier.noegpr)
> >>> +    return 0;
> >>> +
> >>> +  for (unsigned int op = 0; op < i.operands; op++)
> >>> +    {
> >>> +      if (i.types[op].bitfield.class != Reg
> >>> +	  /* Special case for (%dx) while doing input/output op */
> >>> +	  || i.input_output_operand)
> >>
> >> Didn't we agree that this extra condition isn't necessary, once the
> >> producer site correctly updates all state (which was supposed to be
> >> done in a small prereq patch)?
> >>
> >
> > I tried adding "Unspecified | BaseIndex" to the InOutPortReg, then some
> related instructions had two memory operands, so it raised a lot of invalid
> test case fail, and more ugly code needed to be added. In the end, I felt that
> this simple modification might be better.
> 
> Changing InOutPortReg of course isn't going to be easy. But that also wasn't
> what we had discussed. Instead (I thought) we agreed on ...
> 
> > @@ -13137,6 +13137,7 @@ i386_att_operand (char *operand_string)
> >           && !operand_type_check (i.types[this_operand], disp))
> >         {
> >           i.types[this_operand] = i.base_reg->reg_type;
> > +         i.types[this_operand].bitfield.class = 0;
> >           i.input_output_operand = true;
> >           return 1;
> 
> amending this code to also correctly set i.op[].regs. Perhaps it would also be
> best to actually clear i.base_reg (for there not being any memory operand).
> (FTAOD: All of this in a separate prereq patch, not here. The code creating
> inconsistent state has been a [latent] bug for a long time.)
> 

Added i.base_reg = NULL. Just discussing it here, I'll create a new patch for it.

@@ -13016,6 +13016,8 @@ i386_att_operand (char *operand_string)
          && !operand_type_check (i.types[this_operand], disp))
        {
          i.types[this_operand] = i.base_reg->reg_type;
+         i.types[this_operand].bitfield.class = 0;
+         i.base_reg = NULL;
          i.input_output_operand = true;
          return 1;
        }

> >>> --- a/gas/doc/c-i386.texi
> >>> +++ b/gas/doc/c-i386.texi
> >>> @@ -217,6 +217,7 @@ accept various extension mnemonics.  For
> >>> example, @code{avx10.1/256},  @code{avx10.1/128},  @code{user_msr},
> >>> +@code{apx_f},
> >>>  @code{amx_int8},
> >>>  @code{amx_bf16},
> >>>  @code{amx_fp16},
> >>> @@ -983,6 +984,9 @@ Different encoding options can be specified via
> >> pseudo prefixes:
> >>>  instructions (x86-64 only).  Note that this differs from the
> >>> @samp{rex}  prefix which generates REX prefix unconditionally.
> >>>
> >>> +@item
> >>> +@samp{@{rex2@}} -- encode with REX2 prefix
> >>
> >> This isn't in line with what's said for {rex}. Iirc we were in
> >> agreement that we want both to behave consistently. In which case
> >> documentation also needs to describe them consistently.
> >>
> >
> > Changed to
> >
> > @item
> > @samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
> > instructions (APX_F only).  Note that this differs from the
> > @samp{rex2} prefix which generates REX2 prefix unconditionally.
> 
> Except there's no "rex2" prefix according to the present implementation.
>
 
Remove them for current implementation.

@item
@samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
instructions (APX_F only).

> >>> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> >>> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> >>> @@ -5,3 +5,61 @@ pseudos:
> >>>  	{rex} vmovaps %xmm7,%xmm2
> >>>  	{rex} vmovaps %xmm17,%xmm2
> >>>  	{rex} rorx $7,%eax,%ebx
> >>> +	{rex2} vmovaps %xmm7,%xmm2
> >>> +	{rex2} xsave (%rax)
> >>> +	{rex2} xsaves (%ecx)
> >>> +	{rex2} xsaves64 (%ecx)
> >>> +	{rex2} xsavec (%ecx)
> >>> +	{rex2} xrstors (%ecx)
> >>> +	{rex2} xrstors64 (%ecx)
> >>> +
> >>> +	#All opcodes in the row 0xa* prefixed REX2 are illegal.
> >>> +	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
> >>> +	{rex2} mov    0x90909090,%al
> >>> +	{rex2} movabs 0x1,%al
> >>> +	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
> >>> +	{rex2} lodsb
> >>> +	{rex2} lods   %ds:(%esi),%al
> >>> +	{rex2} lodsb   (%esi)
> >>> +	{rex2} movs
> >>> +	{rex2} movs   (%esi), (%edi)
> >>> +	{rex2} scasl
> >>> +	{rex2} scas   %es:(%edi),%eax
> >>> +	{rex2} scasb   (%edi)
> >>> +	{rex2} stosb
> >>> +	{rex2} stosb   (%edi)
> >>> +	{rex2} stos   %eax,%es:(%edi)
> >>> +
> >>> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> >>
> >> This also covers map 1 row 8, doesn't it?
> >>
> >
> > No, I didn't find 0xf8* in opcode table.
> 
> Assuming (again) you mean 0x0f 0x8*, how did you not find it? Or wait,
> depends on what "opcode table" here means: The manual's or opcodes/i386-
> opc.tbl? The latter of course doesn't have them, as they're ...
> 
> >>> +	{rex2} jo     .+2-0x70
> >>> +	{rex2} jno    .+2-0x70
> >>> +	{rex2} jb     .+2-0x70
> >>> +	{rex2} jae    .+2-0x70
> >>> +	{rex2} je     .+2-0x70
> >>> +	{rex2} jne    .+2-0x70
> >>> +	{rex2} jbe    .+2-0x70
> >>> +	{rex2} ja     .+2-0x70
> >>> +	{rex2} js     .+2-0x70
> >>> +	{rex2} jns    .+2-0x70
> >>> +	{rex2} jp     .+2-0x70
> >>> +	{rex2} jnp    .+2-0x70
> >>> +	{rex2} jl     .+2-0x70
> >>> +	{rex2} jge    .+2-0x70
> >>> +	{rex2} jle    .+2-0x70
> >>> +	{rex2} jg     .+2-0x70
> 
> ... the disp32/disp16 forms of these branches, which are created only during
> relaxation.
>

Oh,  I see,  I found them in sdm and added testcase for them.

        #All opcodes in the row 0x8* (map1) prefixed REX2 are illegal.
        {rex2} jo     .+6+0x90909090
        {rex2} jno    .+6+0x90909090
        {rex2} jb     .+6+0x90909090
        {rex2} jae    .+6+0x90909090
        {rex2} je     .+6+0x90909090
        {rex2} jne    .+6+0x90909090
        {rex2} jbe    .+6+0x90909090
        {rex2} ja     .+6+0x90909090
        {rex2} js     .+6+0x90909090
        {rex2} jns    .+6+0x90909090
        {rex2} jp     .+6+0x90909090
        {rex2} jnp    .+6+0x90909090
        {rex2} jl     .+6+0x90909090
        {rex2} jge    .+6+0x90909090
        {rex2} jle    .+6+0x90909090
        {rex2} jg     .+6+0x90909090
 
> >>> +  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
> >>> +     are reserved under REX2 and triggers #UD when prefixed with
> >>> + REX2 */  if (space == 0)
> >>> +    switch (opcode >> 4)
> >>
> >> Both here and ...
> >>
> >>> +      {
> >>> +      case 0x4:
> >>> +      case 0x7:
> >>> +      case 0xA:
> >>> +      case 0xE:
> >>> +	return true;
> >>> +      default:
> >>> +	return false;
> >>> +    }
> >>> +
> >>> +  if (space == SPACE_0F)
> >>> +    switch (opcode >> 4)
> >>
> >> ... here, don't you also need to mask off further bits? There are
> >> quite a few opcodes which have a kind-of ModR/M byte encoded directly
> >> in the opcode, for example.
> >>
> >
> > Thanks for reminding. Added the code like this.
> >
> > /* Some opcodes encode a ModR/M byte directly in the opcode.  */
> >   unsigned long long
> >   base_opcode = (length > 1) ? opcode >> (8 * length - 8) : opcode;
> 
> Can length be 0? I didn't think so, and then
> 
>    base_opcode = opcode >> (8 * length - 8);
> 
> would be all you need.
>

yes good way.

> Also in the comment, I think it would be slightly better to say "ModR/M-like
> byte".
> 

Done.

Thanks,
Lili.

next prev parent reply	other threads:[~2023-12-06 12:44 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-24  7:02 [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Cui, Lili
2023-11-24  7:02 ` [PATCH v3 2/9] Support APX GPR32 with rex2 prefix Cui, Lili
2023-12-04 16:30   ` Jan Beulich
2023-12-05 13:31     ` Cui, Lili
2023-12-06  7:52       ` Jan Beulich
2023-12-06 12:43         ` Cui, Lili [this message]
2023-12-07  9:01           ` Jan Beulich
2023-12-08  3:10             ` Cui, Lili
2023-11-24  7:02 ` [PATCH v3 3/9] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-24  7:02 ` [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Cui, Lili
2023-12-07 12:38   ` Jan Beulich
2023-12-08 15:21     ` Cui, Lili
2023-12-11  8:34       ` Jan Beulich
2023-12-12 10:44         ` Cui, Lili
2023-12-12 11:16           ` Jan Beulich
2023-12-12 12:32             ` Cui, Lili
2023-12-12 12:39               ` Jan Beulich
2023-12-12 13:15                 ` Cui, Lili
2023-12-12 14:13                   ` Jan Beulich
2023-12-13  7:36                     ` Cui, Lili
2023-12-13  7:48                       ` Jan Beulich
2023-12-12 12:58         ` Cui, Lili
2023-12-12 14:04           ` Jan Beulich
2023-12-13  8:35             ` Cui, Lili
2023-12-13  9:13               ` Jan Beulich
2023-12-07 13:34   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:43       ` Jan Beulich
2023-12-11 11:50   ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 5/9] Add tests for " Cui, Lili
2023-12-07 14:05   ` Jan Beulich
2023-12-11  6:16     ` Cui, Lili
2023-12-11  8:55       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 6/9] Support APX NDD Cui, Lili
2023-12-08 14:12   ` Jan Beulich
2023-12-11 13:36     ` Cui, Lili
2023-12-11 16:50       ` Jan Beulich
2023-12-13 10:42         ` Cui, Lili
2024-03-22 10:02     ` Jan Beulich
2024-03-22 10:31       ` Jan Beulich
2024-03-26  2:04         ` Cui, Lili
2024-03-26  7:06           ` Jan Beulich
2024-03-26  7:18             ` Cui, Lili
2024-03-22 10:59       ` Jan Beulich
2024-03-26  8:22         ` Cui, Lili
2024-03-26  9:30           ` Jan Beulich
2024-03-27  2:41             ` Cui, Lili
2023-12-08 14:27   ` Jan Beulich
2023-12-12  5:53     ` Cui, Lili
2023-12-12  8:28       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 7/9] Support APX Push2/Pop2 Cui, Lili
2023-12-11 11:17   ` Jan Beulich
2023-12-15  8:38     ` Cui, Lili
2023-12-15  8:44       ` Jan Beulich
2023-11-24  7:02 ` [PATCH v3 8/9] Support APX NDD optimized encoding Cui, Lili
2023-12-11 12:27   ` Jan Beulich
2023-12-12  3:18     ` Hu, Lin1
2023-12-12  8:41       ` Jan Beulich
2023-12-13  5:31         ` Hu, Lin1
2023-12-12  8:45       ` Jan Beulich
2023-12-13  6:06         ` Hu, Lin1
2023-12-13  8:19           ` Jan Beulich
2023-12-13  8:34             ` Hu, Lin1
2023-11-24  7:02 ` [PATCH v3 9/9] Support APX JMPABS for disassembler Cui, Lili
2023-11-24  7:09 ` [PATCH 1/9] Make const_1_mode print $1 in AT&T syntax Jan Beulich
2023-11-24 11:22   ` Cui, Lili
2023-11-24 12:14     ` Jan Beulich
2023-12-12  2:57 ` Lu, Hongjiu
2023-12-12  8:16 ` Cui, Lili

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SJ0PR11MB5600FDF8200EEDEF70714AE79E84A@SJ0PR11MB5600.namprd11.prod.outlook.com \
    --to=lili.cui@intel.com \
    --cc=JBeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=hongjiu.lu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).