From: "Hu, Lin1" <lin1.hu@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH][v3] Support APX NDD optimized encoding.
Date: Fri, 17 Nov 2023 07:24:56 +0000
Message-ID: <SJ0PR11MB5940EB00D422C5827E9F10FDA6B7A@SJ0PR11MB5940.namprd11.prod.outlook.com>
In-Reply-To: <4c7a8e8c-de67-4d40-9cba-8fc04de1e309@suse.com>

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, November 15, 2023 5:35 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH][v3] Support APX NDD optimized encoding.
> 
> On 15.11.2023 03:59, Hu, Lin1 wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -7208,6 +7208,43 @@ check_EgprOperands (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Optimize APX NDD insns to legacy insns.  */
> > +static bool
> > +convert_NDD_to_REX2 (const insn_template *t)
> > +{
> > +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> > +      && t->opcode_space == SPACE_EVEXMAP4
> > +      && !i.has_nf
> > +      && i.reg_operands >= 2)
> > +    {
> > +      unsigned int readonly_var = ~0;
> > +      unsigned int dest = i.operands - 1;
> > +      unsigned int src1 = i.operands - 2;
> > +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> > +
> > +      if (i.types[src1].bitfield.class == Reg
> > +	  && i.op[src1].regs == i.op[dest].regs)
> > +	readonly_var = src2;
> > +      /* adcx, adox and imul can't support to swap the source operands.  */
> > +      else if (i.types[src2].bitfield.class == Reg
> > +	       && i.op[src2].regs == i.op[dest].regs
> > +	       && optimize > 1
> > +	       && t->opcode_modifier.commutative)
> 
> Comment and code still aren't in line: "support to swap the source operands"
> really is the D attribute in the opcode table, whereas
> t->opcode_modifier.commutative is related to the C attribute (and all three
> insns named really are commutative). It looks to me that the code is correct,
> so it would then be the comment that may need updating. But it may also
> be better to additionally check .d here (making the code robust against C
> being added to the truly commutative yet not eligible to be optimized insns).
> In which case the comment might say "adcx, adox, and imul, while
> commutative, don't support to swap the source operands".
>

I think we don't need to worry about it for now, because the function is already constrained by vexvvvv == VexVVVV_DST, so these instructions must be NDD instructions. Also, adcx, adox and imul don't have the D attribute. If I add a check of .d here, I will need to exclude them. The code would then go back to the earlier form, which we had initially hoped to avoid by using C.
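
To make the role of the C attribute concrete, here is a minimal stand-alone sketch of the operand selection. It is not the gas code: `struct ndd_insn`, `pick_readonly_operand` and `NO_OPERAND` are hypothetical names, registers are modelled as plain integers, and only the three-register case is covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_OPERAND (~0u)

/* Hypothetical stand-in for the assembler's per-insn state: three register
   operands in AT&T order, i.e. reg[0] = src2, reg[1] = src1, reg[2] = dest.
   Registers are modelled as plain numbers.  */
struct ndd_insn
{
  int reg[3];
  bool commutative;   /* Models the C attribute of the template.  */
};

/* Return the index of the source operand that survives when the NDD form
   is folded into a two-operand legacy form, or NO_OPERAND if no folding
   is possible.  */
static unsigned int
pick_readonly_operand (const struct ndd_insn *insn, int optimize)
{
  const unsigned int src2 = 0, src1 = 1, dest = 2;

  assert (insn != NULL);

  /* src1 already equals dest: "add %r16, %r8, %r8" -> "add %r16, %r8".  */
  if (insn->reg[src1] == insn->reg[dest])
    return src2;

  /* src2 equals dest: only legal if the sources may trade places.  */
  if (insn->reg[src2] == insn->reg[dest]
      && optimize > 1
      && insn->commutative)
    return src1;

  return NO_OPERAND;
}
```

In this model the second branch is the only one that depends on commutativity: an insn such as add may take it, while a non-commutative insn such as sub must not, because exchanging its sources would change the result.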

>
> > +	readonly_var = src1;
> > +      if (readonly_var != (unsigned int) ~0)
> > +	{
> > +	  if (readonly_var != src2)
> > +	    swap_2_operands (readonly_var, src2);
> > +
> > +	  --i.operands;
> > +	  --i.reg_operands;
> > +
> > +	  return true;
> > +	}
> > +    }
> > +  return false;
> > +}
> > +
> >  /* Helper function for the progress() macro in match_template().  */
> >  static INLINE enum i386_error progress (enum i386_error new,
> >  					enum i386_error last,
> > @@ -7728,6 +7765,21 @@ match_template (char mnem_suffix)
> >  	  i.memshift = memshift;
> >  	}
> >
> > +      /* If we can optimize a NDD insn to non-NDD insn, like
> 
> The terminology here wants to match the function name below, i.e. (as
> indicated elsewhere for the name, in reply to your question) "legacy"
> instead of "non-NDD" (assuming the function name is changed as well, in
> line with that).
>

OK.

> 
> > +	 add %r16, %r8, %r8 -> add %r16, %r8,
> > +	 add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> > +	 Note that the semantics have not been changed.  */
> > +      if (optimize
> > +	  && !i.no_optimize
> > +	  && i.vec_encoding != vex_encoding_evex
> > +	  && t + 1 < current_templates->end
> > +	  && !t[1].opcode_modifier.evex
> 
> This is more fragile than it needs to be; it would imo be better to indeed go
> from opcode space of the supposed alternative encoding. Perhaps that's
> going to mean checking both.
>

Based on our previous discussion, I modified tc-i386.c as follows:

+/* Check if the instruction uses the REX registers.  */
+static bool
+check_RexOperands (const insn_template *t)
+{
+  for (unsigned int op = 0; op < i.operands; op++)
+    {
+      if (i.types[op].bitfield.class != Reg
+         /* Special case for (%dx) while doing input/output op */
+         || i.input_output_operand)
+       continue;
+
+      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
+       return true;
+    }
+
+  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
+      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
+    return true;
+
+  /* Check whether the {rex} pseudo prefix is in use.  */
+  if (i.rex_encoding)
+    return true;
+  return false;
+}
+
+/* Optimize APX NDD insns to legacy insns.  */
+static unsigned int
+convert_NDD_to_legacy (const insn_template *t)
+{
+  unsigned int readonly_var = ~0;
+
+  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
+      && t->opcode_space == SPACE_EVEXMAP4
+      && !i.has_nf
+      && i.reg_operands >= 2)
+    {
+      unsigned int dest = i.operands - 1;
+      unsigned int src1 = i.operands - 2;
+      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+      if (i.types[src1].bitfield.class == Reg
+         && i.op[src1].regs == i.op[dest].regs)
+       readonly_var = src2;
+      /* adcx, adox, and imul, while commutative, don't support to swap
+        the source operands.  */
+      else if (i.types[src2].bitfield.class == Reg
+              && i.op[src2].regs == i.op[dest].regs
+              && optimize > 1
+              && t->opcode_modifier.commutative)
+       readonly_var = src1;
+    }
+  return readonly_var;
+}
+

@@ -7728,6 +7782,55 @@ match_template (char mnem_suffix)
          i.memshift = memshift;
        }

+      /* If we can optimize an NDD insn to a legacy insn, like
+        add %r16, %r8, %r8 -> add %r16, %r8,
+        add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
+        Note that the semantics have not been changed.  */
+      if (optimize
+         && !i.no_optimize
+         && i.vec_encoding != vex_encoding_evex
+         && t + 1 < current_templates->end
+         && !t[1].opcode_modifier.evex
+         && t[1].opcode_space <= SPACE_0F38)
+       {
+         unsigned int readonly_var = convert_NDD_to_legacy (t);
+         size_match = true;
+
+         if (readonly_var != (unsigned int) ~0)
+           {
+             for (j = 0; j < i.operands - 2; j++)
+               {
+                 check_register = j;
+                 if (t->opcode_modifier.d)
+                   check_register ^= 1;
+                 overlap0 = operand_type_and (i.types[check_register],
+                                              t[1].operand_types[check_register]);
+                 if (!operand_type_match (overlap0, i.types[check_register]))
+                   size_match = false;
+               }
+
+             /* The EGPR/REX/qword conditions exclude adcx/adox with
+                inappropriate registers.  */
+             if (size_match
+                 && (t[1].opcode_space <= SPACE_0F
+                     || (!check_EgprOperands (t + 1)
+                         && !check_RexOperands (t + 1)
+                         && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))
+               {
+                 unsigned int src1 = i.operands - 2;
+                 unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+                 if (readonly_var != src2)
+                   swap_2_operands (readonly_var, src2);
+
+                 --i.operands;
+                 --i.reg_operands;
+
+                 specific_error = progress (internal_error);
+                 continue;
+               }
+
+           }
+       }
+
       /* We've found a match; break out of loop.  */
       break;

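The rewrite step in the hunk above (swap if needed, then drop the last operand) can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the assembler's implementation: `struct operands` and `drop_ndd_destination` are hypothetical, registers are plain integers, and the real swap_2_operands () also exchanges operand types and flags.

```c
#include <assert.h>

/* Hypothetical operand list in AT&T order: reg[0] = src2, reg[1] = src1,
   reg[2] = dest.  */
struct operands
{
  unsigned int count;
  int reg[3];
};

/* Fold a three-operand NDD form into a two-operand legacy form.  READONLY
   is the index of the source that differs from the destination (0 for
   src2, 1 for src1).  Afterwards reg[0] holds the surviving source and
   reg[1] the combined source/destination register.  */
static void
drop_ndd_destination (struct operands *ops, unsigned int readonly)
{
  const unsigned int src2 = 0;

  assert (ops->count == 3 && readonly < 2);

  /* Move the read-only source into the first slot if it is not there.  */
  if (readonly != src2)
    {
      int tmp = ops->reg[readonly];
      ops->reg[readonly] = ops->reg[src2];
      ops->reg[src2] = tmp;
    }

  /* The old destination slot is now redundant.  */
  --ops->count;
}
```

With this model, "add %r8, %r16, %r8" (readonly = 1) and "add %r16, %r8, %r8" (readonly = 0) both end up as the two-operand "add %r16, %r8", which is what the comment in the hunk describes before the templates are matched again.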
What's your opinion?

BRs,
Lin
