RE: [PATCH][v3] Support APX NDD optimized encoding.

From: "Hu, Lin1" <lin1.hu@intel.com>
To: "Beulich, Jan" <JBeulich@suse.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>
Subject: RE: [PATCH][v3] Support APX NDD optimized encoding.
Date: Mon, 20 Nov 2023 03:28:55 +0000
Message-ID: <SJ0PR11MB5940BA0B79C7A0D86552CB1CA6B4A@SJ0PR11MB5940.namprd11.prod.outlook.com>
In-Reply-To: <018bb6c6-9f01-4723-bf32-1944758d695c@suse.com>

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, November 17, 2023 5:48 PM
> To: Hu, Lin1 <lin1.hu@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH][v3] Support APX NDD optimized encoding.
> 
> On 17.11.2023 08:24, Hu, Lin1 wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Wednesday, November 15, 2023 5:35 PM
> >>
> >> On 15.11.2023 03:59, Hu, Lin1 wrote:
> >>> --- a/gas/config/tc-i386.c
> >>> +++ b/gas/config/tc-i386.c
> >>> @@ -7208,6 +7208,43 @@ check_EgprOperands (const insn_template *t)
> >>>    return 0;
> >>>  }
> >>>
> >>> +/* Optimize APX NDD insns to legacy insns.  */
> >>> +static bool
> >>> +convert_NDD_to_REX2 (const insn_template *t)
> >>> +{
> >>> +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> >>> +      && t->opcode_space == SPACE_EVEXMAP4
> >>> +      && !i.has_nf
> >>> +      && i.reg_operands >= 2)
> >>> +    {
> >>> +      unsigned int readonly_var = ~0;
> >>> +      unsigned int dest = i.operands - 1;
> >>> +      unsigned int src1 = i.operands - 2;
> >>> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> >>> +
> >>> +      if (i.types[src1].bitfield.class == Reg
> >>> +	  && i.op[src1].regs == i.op[dest].regs)
> >>> +	readonly_var = src2;
> >>> +      /* adcx, adox and imul can't support to swap the source operands.  */
> >>> +      else if (i.types[src2].bitfield.class == Reg
> >>> +	       && i.op[src2].regs == i.op[dest].regs
> >>> +	       && optimize > 1
> >>> +	       && t->opcode_modifier.commutative)
> >>
> >> Comment and code still aren't in line: "support to swap the source
> >> operands" really is the D attribute in the opcode table, whereas
> >> t->opcode_modifier.commutative is related to the C attribute (and all
> >> three insns named really are commutative). It looks to me that the code
> >> is correct, so it would then be the comment that may need updating. But
> >> it may also be better to additionally check .d here (making the code
> >> robust against C being added to the truly commutative yet not eligible
> >> to be optimized insns).
> >> In which case the comment might say "adcx, adox, and imul, while
> >> commutative, don't support to swap the source operands".
> >>
> >
> > I think we don't need to worry about it for now, because we've constrained
> > the function with VexVVVV_DST, and these instructions must be NDD
> > instructions. And adcx, adox and imul don't have D attribute.
> 
> Right, and I thought to leverage this. IOW ...
> 
> > If I add check .d here, I will need to exclude them.
> 
> ... I don't think I understand this.
>

I mean that this place checks whether we can optimize something like "adcx %eax, %ebx, %eax -> adcx %ebx, %eax". Adcx doesn't have the D attribute, so if we want to add a t->opcode_modifier.d constraint, the condition would have to be && (t->opcode_modifier.d || t->mnem_off == MN_adcx) && t->opcode_modifier.commutative. If we don't exempt adcx this way, it will be rejected by the t->opcode_modifier.d check.
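
To make the condition concrete, here is a minimal sketch. The struct, the enum and the attribute values are simplified, hypothetical stand-ins for illustration only; they are not the real insn_template or the real i386.tbl entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for gas's insn_template; only the
   fields discussed here are modelled.  */
enum mnem { MN_ADD, MN_SUB, MN_ADCX };

struct tmpl
{
  enum mnem mnem;
  bool d;            /* D attribute: a reversed-operand form exists.  */
  bool commutative;  /* C attribute: the sources may be exchanged.  */
};

/* May the two sources be swapped so that the destination coincides with
   the first source?  A plain .d check would reject adcx (C but no D), so
   it is exempted explicitly.  */
static bool
can_swap_sources (const struct tmpl *t)
{
  assert (t != NULL);
  return (t->d || t->mnem == MN_ADCX) && t->commutative;
}
```

With these assumed attributes, an add-like template (D and C) and adcx (C only) pass, while a sub-like template (D only) does not.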
 
>
> > Based on our previous discussion, I modified tc-i386.c as follows
> >
> > +/* Check if the instruction use the REX registers.  */
> > +static bool
> > +check_RexOperands (const insn_template *t)
> 
> I don't think I can spot a use of the parameter in the function.
>

I merely mimicked check_EgprOperands and didn't pay attention to your comments about it. I will remove the parameter.

> 
> > +{
> > +  for (unsigned int op = 0; op < i.operands; op++)
> > +    {
> > +      if (i.types[op].bitfield.class != Reg
> > +         /* Special case for (%dx) while doing input/output op */
> > +         || i.input_output_operand)
> 
> Once again: Is this needed? Respective insns shouldn't even make it here.
> Plus if they did, ...
>

I changed the condition to

	if (i.types[op].bitfield.class != Reg)
	  continue;
 
>
> > +       continue;
> > +
> > +      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
> > +       return true;
> 
> ... the loop would continue for (%dx) kind operands anyway.
>
>
> > +    }
> > +
> > +  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
> > +      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
> > +    return true;
> > +
> > +  /* Check pseudo prefix {rex} are valid.  */
> > +  if (i.rex_encoding)
> > +    return true;
> > +  return false;
> 
> Just "return i.rex_encoding;"?
>

Indeed, that's simpler.
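
Putting the review comments so far together (no unused parameter, no input/output special case, a direct return of the flag), the check could be sketched roughly as below. All types and names here are hypothetical, simplified stand-ins; the real function works on gas's global `i` and its reg_entry flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for RegRex / RegRex64.  */
#define REG_REX   0x1
#define REG_REX64 0x2

enum op_class { CLASS_NONE, CLASS_REG };

struct reg { unsigned int flags; };

struct insn
{
  unsigned int operands;
  enum op_class classes[3];
  const struct reg *regs[3];
  const struct reg *base_reg, *index_reg;
  bool rex_encoding;   /* Set when the {rex} pseudo prefix was used.  */
};

static bool
needs_rex (const struct reg *r)
{
  return r != NULL && (r->flags & (REG_REX | REG_REX64)) != 0;
}

/* True if any register operand, the base/index register, or an explicit
   {rex} request requires a REX prefix.  */
static bool
check_rex_operands (const struct insn *insn)
{
  assert (insn->operands <= 3);
  for (unsigned int op = 0; op < insn->operands; op++)
    {
      if (insn->classes[op] != CLASS_REG)
        continue;
      if (needs_rex (insn->regs[op]))
        return true;
    }

  if (needs_rex (insn->index_reg) || needs_rex (insn->base_reg))
    return true;

  return insn->rex_encoding;
}
```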
 
>
> > +}
> > +
> > +/* Optimize APX NDD insns to legacy insns.  */
> > +static unsigned int
> > +convert_NDD_to_legacy (const insn_template *t)
> > +{
> > +  unsigned int readonly_var = ~0;
> 
> One issue I continue to have is the name of this variable. Good names help
> understanding what code is doing. And in 3-operand NDD insns there are
> uniformly 2 operands which are only read.
>

Maybe I should rename the variable to readonly_reg_pos.

> 
> > +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> > +      && t->opcode_space == SPACE_EVEXMAP4
> > +      && !i.has_nf
> > +      && i.reg_operands >= 2)
> > +    {
> > +      unsigned int dest = i.operands - 1;
> > +      unsigned int src1 = i.operands - 2;
> > +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> > +
> > +      if (i.types[src1].bitfield.class == Reg
> > +         && i.op[src1].regs == i.op[dest].regs)
> > +       readonly_var = src2;
> > +      /* adcx, adox, and imul, while commutative, don't support to swap
> > +        the source operands.  */
> > +      else if (i.types[src2].bitfield.class == Reg
> > +              && i.op[src2].regs == i.op[dest].regs
> > +              && optimize > 1
> > +              && t->opcode_modifier.commutative)
> > +       readonly_var = src1;
> > +    }
> > +  return readonly_var;
> > +}
> 
> You're no longer converting anything in this function, which - I'm sorry to say
> that - once again makes its name unsuitable.
>

True, but I think the function's name can be changed later. Removing the operations on i from it saves me from unnecessary backtracking.

I think the function could be named can_convert_NDD_to_legacy. A return value of 0 would mean it can't; any other value would mean it can.
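
For reference, the selection logic of the quoted function can be sketched in isolation as follows. The struct is a hypothetical flattened view of the operands (register numbers in AT&T order), not the real `i` state, and the sketch keeps the ~0 sentinel from the quoted code, since operand index 0 is itself a valid result there.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_POS (~0u)   /* Sentinel as in the quoted code: nothing to do.  */

/* Hypothetical flattened view of an NDD insn: register numbers in AT&T
   operand order (sources first, destination last).  */
struct ndd_insn
{
  unsigned int operands;     /* 2 to 4 in this sketch.  */
  int regno[4];              /* -1 for a non-register operand.  */
  bool commutative;          /* C attribute of the matched template.  */
  int optimize;              /* Optimization level, as with -O.  */
};

/* Return the position of the source that stays read-only once the
   destination is folded into the other source, or NO_POS.  */
static unsigned int
readonly_reg_pos (const struct ndd_insn *insn)
{
  assert (insn->operands >= 2 && insn->operands <= 4);

  unsigned int dest = insn->operands - 1;
  unsigned int src1 = insn->operands - 2;
  unsigned int src2 = insn->operands > 3 ? insn->operands - 3 : 0;

  if (insn->regno[src1] >= 0 && insn->regno[src1] == insn->regno[dest])
    return src2;

  /* Swapping the sources is only tried at higher optimization levels
     and for commutative templates.  */
  if (insn->regno[src2] >= 0 && insn->regno[src2] == insn->regno[dest]
      && insn->optimize > 1 && insn->commutative)
    return src1;

  return NO_POS;
}
```

For "add %r16, %r8, %r8" this yields position 0; for "add %r8, %r16, %r8" it yields position 1, but only with optimize > 1.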

> 
> > @@ -7728,6 +7782,55 @@ match_template (char mnem_suffix)
> >           i.memshift = memshift;
> >         }
> >
> > +      /* If we can optimize a NDD insn to legacy insn, like
> > +        add %r16, %r8, %r8 -> add %r16, %r8,
> > +        add  %r8, %r16, %r8 -> add %r16, %r8, then rematch template.
> > +        Note that the semantics have not been changed.  */
> > +      if (optimize
> > +         && !i.no_optimize
> > +         && i.vec_encoding != vex_encoding_evex
> > +         && t + 1 < current_templates->end
> > +         && !t[1].opcode_modifier.evex
> > +         && t[1].opcode_space <= SPACE_0F38)
> 
> In all of these checks what I'm missing is a check that we're actually dealing
> with an NDD template.
>

Of course I can add it here; I'll then remove it from convert_NDD_to_legacy.
 
>
> > +       {
> > +         unsigned int readonly_var = convert_NDD_to_legacy (t);
> > +         size_match = true;
> > +
> > +         if (readonly_var != (unsigned int) ~0)
> > +           {
> > +             for (j = 0; j < i.operands - 2; j++)
> > +               {
> > +                 check_register = j;
> > +                 if (t->opcode_modifier.d)
> > +                   check_register ^= 1;
> > +                 overlap0 = operand_type_and (i.types[check_register],
> > +                                              t[1].operand_types[check_register]);
> > +                 if (!operand_type_match (overlap0, i.types[check_register]))
> > +                   size_match = false;
> > +               }
> 
> I'm afraid that without a comment I don't understand what this is about.
> 

I want to make sure that the two neighboring templates accept the same input. I tried base_code, but it misses some special cases, like shld Imm8 and shld shiftCount, so I want to compare starting with the first operand (in AT&T order). But some insns have .d, so I also need to check whether the first operand can match the second operand's type. It seems NDD insns can simply be matched that way now.

I had some problems with that version, so I've modified it. Deleting this part of the code now wouldn't have any effect, because the legacy versions of the NDD instructions were sorted by us; I think this part would only come into play if someone sorted the .tbl incorrectly.

The current version is

+         if (readonly_var != (unsigned int) ~0)
+           {
+             overlap0 = operand_type_and (i.types[0],
+                                          t[1].operand_types[0]);
+             if (t->opcode_modifier.d)
+               overlap1 = operand_type_and (i.types[0],
+                                            t[1].operand_types[1]);
+             if (!operand_type_match (overlap0, i.types[0])
+                 && (!t->opcode_modifier.d
+                     || !operand_type_match (overlap1, i.types[0])))
+               size_match = false;
+
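
The intent of that check can be sketched with plain bitmasks. The type bits and the struct are hypothetical simplifications (gas's i386_operand_type and operand_type_match are richer than a subset test), so this only illustrates the idea of comparing the first operand against the neighboring template, including the reversed position when .d is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand-type bits for illustration only.  */
enum { T_REG32 = 1, T_REG64 = 2, T_IMM8 = 4, T_SHIFTCOUNT = 8, T_MEM = 16 };

struct tmpl
{
  unsigned int operand_types[2];  /* Types accepted per operand position.  */
  bool d;                         /* Operands may appear in reversed order.  */
};

/* Does the first given operand fit the legacy template that follows the
   NDD template, either in position 0 or, when the NDD template has .d,
   in position 1?  */
static bool
first_operand_fits (unsigned int given, const struct tmpl *ndd,
                    const struct tmpl *legacy)
{
  assert (given != 0);
  if ((given & legacy->operand_types[0]) == given)
    return true;
  return ndd->d && (given & legacy->operand_types[1]) == given;
}
```

In this model a shiftCount operand does not fit a neighboring template that only takes Imm8 first, which is the kind of mismatch the check is meant to catch.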

>
> > +             if (size_match
> > +                 && (t[1].opcode_space <= SPACE_0F
> > +                     || (!check_EgprOperands (t + 1)	 // These conditions are exclude adcx/adox with inappropriate registers.
> > +                         && !check_RexOperands (t + 1)
> > +                         && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))
> 
> Saying "inappropriate" in such a comment doesn't really help, as it's then
> still unclear what is "appropriate". But the comment will need re-formatting
> anyway.
>

I modified the comment to "Optimizing some non-legacy-map0/1 without REX/REX2 prefix will be valuable."
 
>
> > +               {
> > +                 unsigned int src1 = i.operands - 2;
> 
> Looks like this variable is no longer used?
>

Yes, I removed it.

> 
> > +                 unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> > +
> > +                 if (readonly_var != src2)
> > +                   swap_2_operands (readonly_var, src2);
> > +
> > +                 --i.operands;
> > +                 --i.reg_operands;
> > +
> > +                 specific_error = progress (internal_error);
> > +                 continue;
> > +               }
> > +
> > +           }
> > +       }
> > +
> >        /* We've found a match; break out of loop.  */
> >        break;
> >
> > What's your opinion?
> 
> I need some further clarification first, as per above. I also don't think I can
> properly identify (yet) which parts of the code are solely related to the
> ADCX/ADOX special case. The more code that's special for these, the more I'd
> be inclined to ask that dealing with them be a separate patch, for us to
> judge whether effort and effect are in reasonable balance.
> 

Currently only this part of the code 

+                 /* Optimizing some non-legacy-map0/1 without REX/REX2 prefix will be valuable.  */
+                 && (t[1].opcode_space <= SPACE_0F
+                     || (!check_EgprOperands (t + 1)
+                         && !check_RexOperands ()
+                         && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))

is related to adcx/adox (including check_RexOperands); all the other constraints are just there to make the code more robust. If you still think this part is too complicated, I would consider holding off on optimizing adcx/adox; after all, there aren't many instructions of this kind at the moment.
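
As a rough illustration of what that condition decides, here is a sketch with the three checks reduced to booleans. The struct and its field names are hypothetical; in the patch the inputs come from check_EgprOperands, check_RexOperands and the destination register's qword bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ordering of opcode spaces, mirroring the comparison
   "t[1].opcode_space <= SPACE_0F" in the patch.  */
enum opcode_space { SPACE_BASE, SPACE_0F, SPACE_0F38 };

struct candidate
{
  enum opcode_space legacy_space;  /* Space of the neighboring template.  */
  bool uses_egpr;                  /* Some operand is an extended GPR.  */
  bool uses_rex;                   /* Some operand needs a REX prefix.  */
  bool dest_is_qword;              /* Destination is a 64-bit register.  */
};

/* Legacy map 0/1 templates are always candidates; templates in other
   maps (the adcx/adox case) only when no REX/REX2 prefix is needed.  */
static bool
legacy_form_allowed (const struct candidate *c)
{
  assert (c != NULL);
  return c->legacy_space <= SPACE_0F
         || (!c->uses_egpr && !c->uses_rex && !c->dest_is_qword);
}
```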

BRs,
Lin
