RE: [PATCH 4/8] Support APX NDD


From: "Cui, Lili" <lili.cui@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"Kong, Lingling" <lingling.kong@intel.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH 4/8] Support APX NDD
Date: Sun, 22 Oct 2023 14:05:34 +0000
Message-ID: <SJ0PR11MB5600EAA8556B44606A5908AB9ED9A@SJ0PR11MB5600.namprd11.prod.outlook.com>
In-Reply-To: <9d317289-6d83-3f9b-ef34-af574e798a3f@suse.com>



> >  gas/config/tc-i386.c                    |  80 ++++++++----
> >  gas/testsuite/gas/i386/x86-64-apx-ndd.d | 165 ++++++++++++++++++++++++
> >  gas/testsuite/gas/i386/x86-64-apx-ndd.s | 156 ++++++++++++++++++++++
> >  gas/testsuite/gas/i386/x86-64-pseudos.d |  42 ++++++
> >  gas/testsuite/gas/i386/x86-64-pseudos.s |  43 ++++++
> >  gas/testsuite/gas/i386/x86-64.exp       |   1 +
> >  opcodes/i386-dis-evex-prefix.h          |   4 +-
> >  opcodes/i386-dis-evex-reg.h             | 123 ++++++++++++++++++
> >  opcodes/i386-dis-evex.h                 | 124 +++++++++---------
> >  opcodes/i386-dis.c                      |  47 ++++++-
> >  opcodes/i386-opc.h                      |   1 +
> >  opcodes/i386-opc.tbl                    |  67 ++++++++++
> >  12 files changed, 762 insertions(+), 91 deletions(-)
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.d
> >  create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd.s
> >
> > diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
> > index 48916bc3846..381e389bb04 100644
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -2261,8 +2261,9 @@ operand_size_match (const insn_template *t)
> >        unsigned int given = i.operands - j - 1;
> >
> >        /* For FMA4 and XOP insns VEX.W controls just the first two
> > -	 register operands.  */
> > -      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP))
> > +	 register operands. And APX insns just swap the first operands.  */
> > +      if (is_cpu (t, CpuFMA4) || is_cpu (t, CpuXOP)
> > +	  || (is_cpu (t,CpuAPX_F) && i.operands == 3))
> >  	given = j < 2 ? 1 - j : j;
> 
> In the comment, how about "And APX_F insns just swap the two source
> operands, with the 3rd one being the destination"?
>
 
Done.

> Is the "i.operands == 3" part of the condition really needed? I.e. are there any
> APX_F insns which can make it here but must not take this path? Afaict 2-
> operand insns are fine to go here, and more-than-3-operand insns don't come
> with the D attribute.
> 
You're right; I deleted the "i.operands == 3" check.
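
For illustration only, the index mapping used on this path can be looked at in isolation. The helper below is hypothetical (it is not a function in tc-i386.c) and models nothing more than the "given = j < 2 ? 1 - j : j" expression from the hunk above:

```c
/* Hypothetical helper, not part of gas: models the index mapping
   "given = j < 2 ? 1 - j : j" from operand_size_match ().  */
static unsigned int
given_operand_index (unsigned int j)
{
  /* Template operands 0 and 1 are matched against the given operands
     in swapped order; any later operand keeps its position.  */
  return j < 2 ? 1 - j : j;
}
```

With three operands this maps 0 -> 1, 1 -> 0 and 2 -> 2, i.e. only the two sources trade places while the third operand (the destination) stays put.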

> Also (nit) there's a missing blank after the comma.
> 

Done.

> > @@ -3876,6 +3877,7 @@ is_any_apx_encoding (void)
> >  {
> >    return i.rex2
> >      || i.rex2_encoding
> > +    || i.tm.opcode_space == SPACE_EVEXMAP4
> >      || (i.vex.register_specifier
> >  	&& i.vex.register_specifier->reg_flags & RegRex2);
> >  }
> > @@ -4204,6 +4206,10 @@ build_legacy_insns_with_apx_encoding (void)
> >      }
> >
> >    build_evex_insns_with_extend_evex_prefix ();
> > +
> > +  /* Encode the NDD bit.  */
> > +  if (i.vex.register_specifier)
> > +    i.vex.bytes[3] |= 0x10;
> >  }
> >
> >  static void
> > @@ -7383,26 +7389,31 @@ match_template (char mnem_suffix)
> >  	  overlap1 = operand_type_and (operand_types[0], operand_types[1]);
> >  	  if (t->opcode_modifier.d && i.reg_operands == i.operands
> >  	      && !operand_type_all_zero (&overlap1))
> > -	    switch (i.dir_encoding)
> > -	      {
> > -	      case dir_encoding_load:
> > -		if (operand_type_check (operand_types[i.operands - 1], anymem)
> > -		    || t->opcode_modifier.regmem)
> > -		  goto check_reverse;
> > -		break;
> > +	    {
> > +	      int isMemOperand = (t->opcode_modifier.vexvvvv
> > +				  && t->opcode_space == SPACE_EVEXMAP4)
> > +				  ? i.operands - 2 : i.operands - 1;
> 
> "is" in the variable name is properly misleading. What you're determining
> here is which operand you want to _check_ for being the memory operand.
> 

Changed to MemOperand.
 
> > +				
> As to the condition, the two sides of && may want swapping: In such a
> condition it is generally desirable to have the more restricting part first. Plus
> this may be more neat to express without ?: anyway:
> 
> i.operands - 1 - (t->opcode_space == SPACE_EVEXMAP4 && t->opcode_modifier.vexvvvv)
> 
> (suitably line wrapped of course).
> 

Done.
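
As a quick sanity check that the two spellings agree, here is a minimal stand-alone sketch. The struct and both function names are hypothetical stand-ins; only the two expressions themselves come from the discussion above:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two template properties involved.  */
struct tmpl_sketch
{
  bool evex_map4;	/* models t->opcode_space == SPACE_EVEXMAP4 */
  bool vexvvvv;		/* models t->opcode_modifier.vexvvvv != 0 */
};

/* Form from the original patch, using ?:.  */
static unsigned int
mem_operand_ternary (const struct tmpl_sketch *t, unsigned int operands)
{
  return (t->vexvvvv && t->evex_map4) ? operands - 2 : operands - 1;
}

/* Suggested form: && yields 0 or 1 in C, so it can be subtracted
   directly.  The more restrictive test comes first.  */
static unsigned int
mem_operand_arith (const struct tmpl_sketch *t, unsigned int operands)
{
  return operands - 1 - (t->evex_map4 && t->vexvvvv);
}
```

Both return the index of the last operand, except for an EVEX map 4 template with VexVVVV set, where they return the index of the second-to-last operand.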

> > +	      switch (i.dir_encoding)
> > +		{
> > +		case dir_encoding_load:
> > +		  if (operand_type_check (operand_types[isMemOperand], anymem)
> > +		      || t->opcode_modifier.regmem)
> > +		    goto check_reverse;
> > +		  break;
> >
> > -	      case dir_encoding_store:
> > -		if (!operand_type_check (operand_types[i.operands - 1], anymem)
> > -		    && !t->opcode_modifier.regmem)
> > -		  goto check_reverse;
> > -		break;
> > +		case dir_encoding_store:
> > +		  if (!operand_type_check (operand_types[isMemOperand], anymem)
> > +		      && !t->opcode_modifier.regmem)
> > +		    goto check_reverse;
> > +		  break;
> >
> > -	      case dir_encoding_swap:
> > -		goto check_reverse;
> > +		case dir_encoding_swap:
> > +		  goto check_reverse;
> >
> > -	      case dir_encoding_default:
> > -		break;
> > -	      }
> > +		case dir_encoding_default:
> > +		  break;
> > +		}
> > +	    }
> >  	  /* If we want store form, we skip the current load.  */
> >  	  if ((i.dir_encoding == dir_encoding_store
> >  	       || i.dir_encoding == dir_encoding_swap)
> > @@ -7432,11 +7443,13 @@ match_template (char mnem_suffix)
> >  		continue;
> >  	      /* Try reversing direction of operands.  */
> >  	      j = is_cpu (t, CpuFMA4)
> > -		  || is_cpu (t, CpuXOP) ? 1 : i.operands - 1;
> > +		  || is_cpu (t, CpuXOP)
> > +		  || is_cpu (t, CpuAPX_F) ? 1 : i.operands - 1;
> >  	      overlap0 = operand_type_and (i.types[0], operand_types[j]);
> >  	      overlap1 = operand_type_and (i.types[j], operand_types[0]);
> >  	      overlap2 = operand_type_and (i.types[1], operand_types[1]);
> > -	      gas_assert (t->operands != 3 || !check_register);
> > +	      gas_assert (t->operands != 3 || !check_register
> > +			  || is_cpu (t,CpuAPX_F));
> 
> Nit: Missing blank again.

Done.

> 
> > @@ -7471,6 +7484,12 @@ match_template (char mnem_suffix)
> >  		  found_reverse_match = Opcode_VexW;
> >  		  goto check_operands_345;
> >  		}
> > +	      else if (is_cpu (t,CpuAPX_F)
> > +		       && i.operands == 3)
> > +		{
> > +		  found_reverse_match = Opcode_APX_NDDD;
> > +		  goto check_operands_345;
> > +		}
> >  	      else if (t->opcode_space != SPACE_BASE
> >  		       && (t->opcode_space != SPACE_0F
> >  			   /* MOV to/from CR/DR/TR, as an exception, follow
> > @@ -7636,6 +7655,15 @@ match_template (char mnem_suffix)
> >  	 flipping VEX.W.  */
> >        i.tm.opcode_modifier.vexw ^= VEXW0 ^ VEXW1;
> >
> > +      j = i.tm.operand_types[0].bitfield.imm8;
> > +      i.tm.operand_types[j] = operand_types[j + 1];
> > +      i.tm.operand_types[j + 1] = operand_types[j];
> > +      break;
> 
> I'm not overly happy to see this code getting duplicated. Are there any
> encodings at all which have D and an immediate operand? I don't think so, in
> which case this at least wants simplifying. But read on.
> 
> > +    case Opcode_APX_NDDD:
> > +      /* Only the first two register operands need reversing.  */
> > +      i.tm.base_opcode ^= 0x2;
> 
> I think you mean Opcode_D here?
> 
> >        j = i.tm.operand_types[0].bitfield.imm8;
> >        i.tm.operand_types[j] = operand_types[j + 1];
> >        i.tm.operand_types[j + 1] = operand_types[j];
> 
> Taking both remarks together, do we need Opcode_APX_NDDD at all? Can't
> you use the ordinary Opcode_D, with
> 
>     default:
>       /* If we found a reverse match we must alter the opcode direction
> 	 bit and clear/flip the regmem modifier one.  found_reverse_match
> 	 holds bits to change (different for int & float insns).  */
> 
>       i.tm.base_opcode ^= found_reverse_match;
> 
>       if (i.tm.opcode_space == SPACE_EVEXMAP4 && i.operands == 3)
>         goto swap_first_2;
>     ...
>     swap_first_2:
>       j = i.tm.operand_types[0].bitfield.imm8;
>       i.tm.operand_types[j] = operand_types[j + 1];
>       i.tm.operand_types[j + 1] = operand_types[j];
>       break;
> 
> ? (I'm not convinced the i.operands == 3 part of the condition is needed; if at
> all possible it wants omitting.)
> 

Your suggestion is indeed better than what I had. It also works without "i.operands == 3".
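
To make the effect of the shared swap_first_2 path concrete, here is a small self-contained sketch. It is an illustration under simplifying assumptions, not the gas implementation: the struct, the int-valued operand types and the function name are all hypothetical, and only the XOR with the direction bit (0x2) plus the two assignments mirror the quoted code:

```c
#include <stdbool.h>

#define SKETCH_OPCODE_D 0x2	/* direction bit, value as in the hunk above */

/* Hypothetical, heavily reduced model of i.tm.  */
struct insn_sketch
{
  unsigned int base_opcode;
  bool first_is_imm8;	/* models operand_types[0].bitfield.imm8 */
  int operand_types[4];	/* stand-in for i386_operand_type */
};

/* Flip the direction bit and swap the first two non-immediate operand
   types.  OPERAND_TYPES is the unmodified template copy, so reading
   from it after the first store is safe.  */
static void
reverse_first_two (struct insn_sketch *tm, const int *operand_types)
{
  unsigned int j = tm->first_is_imm8;

  tm->base_opcode ^= SKETCH_OPCODE_D;
  tm->operand_types[j] = operand_types[j + 1];
  tm->operand_types[j + 1] = operand_types[j];
}
```

For a three-operand NDD template the third entry (the destination) is left alone, which is why no separate Opcode_APX_NDDD value is needed for this.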

> > @@ -8462,8 +8490,8 @@ process_operands (void)
> >    const reg_entry *default_seg = NULL;
> >
> >    /* We only need to check those implicit registers for instructions
> > -     with 3 operands or less.  */
> > -  if (i.operands <= 3)
> > +     with 4 operands or less.  */
> > +  if (i.operands <= 4)
> >      for (unsigned int j = 0; j < i.operands; j++)
> >        if (i.types[j].bitfield.instance != InstanceNone)
> >  	i.reg_operands--;
> 
> How useful is it to keep the outer if() when 4-operand insns now also need
> checking? There are extremely few 5-operand ones ...
> 

Deleted it.

> > @@ -8825,6 +8853,9 @@ build_modrm_byte (void)
> >        break;
> >    if (v >= dest)
> >      v = ~0;
> > +  if (i.tm.opcode_space == SPACE_EVEXMAP4
> > +      && i.tm.opcode_modifier.vexvvvv)
> > +    v = dest;
> >    if (i.tm.extension_opcode != None)
> >      {
> >        if (dest != source)
> > @@ -9088,6 +9119,9 @@ build_modrm_byte (void)
> >        set_rex_vrex (i.op[op].regs, REX_B, false);
> >  	}
> >
> > +      if (i.tm.opcode_space == SPACE_EVEXMAP4
> > +	  && i.tm.opcode_modifier.vexvvvv)
> > +	dest--;
> >        if (op == dest)
> >  	dest = ~0;
> >        if (op == source)
> 
> These two changes are at the very least problematic with .insn, whose
> behavior may not change. I'd also prefer if we could get away with just one
> change to the function. Did you consider alternatives? We could re- widen
> VexVVVV, such that the value 2 indicates that the destination is encoded there.
> That then also has no chance of conflicting with .insn.
> 
I added value 2 for NDD. If that's OK, I will create another patch that moves the (i.tm.extension_opcode != None) handling into the VexVVVVDEST branch and uses value 3 instead of SWAP_SOURCES. It could be named VexVVVVSRC1, or should we just use VexVVVVOP1, VexVVVVOP2 and VexVVVVOP3?

  /* How to encode VEX.vvvv:
     0: VEX.vvvv must be 1111b.
     1: VEX.vvvv encodes one of the register operands.
     2: VEX.vvvv encodes the destination register operand.
   */
#define VexVVVVSRC   1
#define VexVVVVDEST  2
  VexVVVV,


  if (i.tm.opcode_modifier.vexvvvv == VexVVVVDEST)
    {
      v = dest;
      dest--;
    }
  else if (i.tm.opcode_modifier.vexvvvv == VexVVVVSRC)
    {
      for (v = source + 1; v < dest; ++v)
        if (v != reg_slot)
          break;
      if (i.tm.extension_opcode != None)
        {
          if (dest != source)
            v = dest;
          dest = ~0;
        }
      gas_assert (source < dest);
      if (i.tm.opcode_modifier.operandconstraint == SWAP_SOURCES
          && source != op)
        {
          unsigned int tmp = source;

          source = v;
          v = tmp;
        }
    }
  else
    v = ~0; 
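
A reduced, stand-alone model of that selection may make the intent easier to follow. It is a sketch only: the names are hypothetical, and it deliberately leaves out the extension_opcode and SWAP_SOURCES handling shown above:

```c
/* Hypothetical names; the values mirror the proposal above.  */
#define SKETCH_VVVV_NONE 0
#define SKETCH_VVVV_SRC  1
#define SKETCH_VVVV_DEST 2
#define SKETCH_NO_OPERAND (~0u)

/* Return the index of the operand to be encoded in VEX.vvvv, or
   SKETCH_NO_OPERAND if vvvv is unused.  *DEST is decremented when the
   destination itself goes into vvvv, so that ModR/M encoding then
   works on the remaining operands.  */
static unsigned int
pick_vvvv_operand (int mode, unsigned int source, unsigned int *dest,
		   unsigned int reg_slot)
{
  unsigned int v;

  switch (mode)
    {
    case SKETCH_VVVV_DEST:
      v = *dest;
      --*dest;
      return v;

    case SKETCH_VVVV_SRC:
      /* First operand after SOURCE which isn't the register slot.  */
      for (v = source + 1; v < *dest; ++v)
	if (v != reg_slot)
	  break;
      return v;

    default:
      return SKETCH_NO_OPERAND;
    }
}
```

For a three-operand NDD insn (source = 0, dest = 2) the DEST mode returns 2 and leaves dest at 1, so operands 0 and 1 are what ModR/M encodes.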

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd.s
> > @@ -0,0 +1,156 @@
> > +# Check 64bit APX NDD instructions with evex prefix encoding
> > +
> > +	.allow_index_reg
> > +	.text
> > +_start:
> > +cmovge 0x90909090(%eax),%edx,%r8d
> > +cmovle 0x90909090(%eax),%edx,%r8d
> > +cmovg  0x90909090(%eax),%edx,%r8d
> > +imul   0x90909(%eax),%edx,%r8d
> > +imul   0x909(%rax,%r31,8),%rdx,%r25
> 
> What about imul by immediate? The present spec is quite unclear there:
> The insn page says {ND=ZU} and the table says 0/1 in the ND column.
>

We don't support it yet; it is listed in the RFC:
...
2. Support APX ZU   -- In progress
3. Support APX CCMP and CTEST -- In progress
...

As for 0/1 in the ND column, it means ZU can be either 0 or 1.

IMUL with opcodes 0x69 and 0x6B in map 0 and SETcc instructions
Although these instructions do not support NDD, the EVEX.ND bit is used to control whether its
destination register has its upper bits (namely, bits [63:OSIZE]) zeroed when OSIZE is 8b or 16b.
That is, if EVEX.ND = 1, the upper bits are always zeroed; otherwise, they keep the old values
when OSIZE is 8b or 16b. For these instructions, EVEX.[V4,V3,V2,V1,V0] must be all zero.
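
The rule in that paragraph can be written down as a small model. This is an illustrative sketch of the destination write-back only, with a hypothetical function name; the 32-bit case is not taken from the text above but follows the usual x86-64 rule that 32-bit register writes zero-extend:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the destination write for IMUL 0x69/0x6B and
   SETcc under EVEX: OLD is the previous 64-bit register value, RESULT
   the computed value, OSIZE the operand size in bits, ND the EVEX.ND
   (ZU) bit.  */
static uint64_t
write_dest_zu (uint64_t old, uint64_t result, unsigned int osize, bool nd)
{
  uint64_t mask;

  if (osize >= 64)
    return result;

  mask = (UINT64_C (1) << osize) - 1;

  /* ND = 1 zeroes bits [63:OSIZE]; 32-bit writes zero-extend anyway.  */
  if (nd || osize == 32)
    return result & mask;

  /* ND = 0 with OSIZE 8 or 16: upper bits keep their old values.  */
  return (old & ~mask) | (result & mask);
}
```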

> > +.byte 0x62,0xf4,0xfc,0x08,0xff,0xc0  #inc %rax
> > +.byte 0x62,0xf4,0xec,0x08,0xff,0xc0  #bad
> 
> As before, please avoid .byte whenever possible. And please have a more
> detailed comment as to what is being encoded, when .byte cannot be avoided.
> Plus, if at all possible, have "bad" tests live in separate testcases from "good"
> ones.
> 

This case is meant to test that inc supports the EVEX format without GPR32; patch part II 1/6 will cover it. The first case has been removed, and the second case has been added to x86-64-apx-evex-promoted-bad.s.
 
> > --- a/opcodes/i386-dis-evex-prefix.h
> > +++ b/opcodes/i386-dis-evex-prefix.h
> 
> Once again I'll reply to disassembler changes separately.
> 

Ok.

> > --- a/opcodes/i386-opc.h
> > +++ b/opcodes/i386-opc.h
> > @@ -960,6 +960,7 @@ typedef struct insn_template
> >  /* The next value is arbitrary, as long as it's non-zero and distinct
> >     from all other values above.  */
> >  #define Opcode_VexW	0xf /* Operand order controlled by VEX.W. */
> > +#define Opcode_APX_NDDD	0x11 /* Direction bit for APX NDD insns. */
> 
> The comment talks of a single bit, but the value has two bits set.
> Plus in the code you also don't use this constant as described by the
> comment. Aiui like for Opcode_VexW the value is really arbitrary, just as long
> as it's different from others. In which case I'd rather suggest using e.g. 0xe (if,
> unlike suggested above, Opcode_D cannot be re-used).
> 
> Also I don't think there's a need for three D-s in the name.
> 

Deleted Opcode_APX_NDDD.

> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> 
> Comments given on the earlier patch apply here (and elsewhere) as well.
> 
> Jan
