Re: [PATCH 7/8] Support APX NDD optimized encoding.

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: Jan Beulich <jbeulich@suse.com>
To: "Hu, Lin1" <lin1.hu@intel.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>,
	"ccoutant@gmail.com" <ccoutant@gmail.com>,
	"binutils@sourceware.org" <binutils@sourceware.org>,
	"Cui, Lili" <lili.cui@intel.com>
Subject: Re: [PATCH 7/8] Support APX NDD optimized encoding.
Date: Fri, 10 Nov 2023 10:54:11 +0100	[thread overview]
Message-ID: <ae061ffc-807c-aa05-59eb-dbb02f024956@suse.com> (raw)
In-Reply-To: <SJ0PR11MB5940D3738C2DFF63C9082754A6AEA@SJ0PR11MB5940.namprd11.prod.outlook.com>

On 10.11.2023 06:43, Hu, Lin1 wrote:
>> On 02.11.2023 12:29, Cui, Lili wrote:

Btw, your shrinking of reply context above from here is problematic. Someone
reading just this mail can't tell who ...

>> Similarly I'm concerned of the ND form of CFCMOVcc, which isn't there yet in
>> the patches, but which will also need excluding from this optimization. Obviously
>> this concern then extends to any future ND- encoded insns, which (likely) won't
>> have legacy-encoded (and hence
>> REX2-encodable) counterparts. Are there any plans how to deal with such?
>> (There's a possible approach mentioned further down.)

... originally said this.

> Looking at other current NDD instructions, it should be possible to use evex encoding even if it doesn't have rex2 encoding.

Should be possible - yes. But why would you do such a transformation? That's
not an optimization at all, afaict. And we shouldn't alter what the
programmer wrote if the result isn't in at least some respect deemed better
than the original. Considering this, the helper function may want further
naming differently than already suggested, to e.g. convert_NDD_to_REX2().

>>> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
>>> +
>>> +      if (i.types[src1].bitfield.class == Reg
>>> +	  && i.op[src1].regs == i.op[dest].regs)
>>> +	readonly_var = src2;
>>
>> As can be seen in the testcase, this also results in ADCX/ADOX to be converted to
>> non-ND EVEX forms, i.e. even when that's not a win at all.
>> We shouldn't change what the user has written when the encoding doesn't
>> actually improve. (Or else, but I'd be hesitant to accept that, at the very least the
>> effect would need pointing out in the description or even a code comment, so
>> that later on it is possible to figure out whether that was intentional or an
>> oversight.)
>>
>> This is where my template ordering remark in reply to patch 5 comes into play:
>> Whether invoking re-parse is okay would further need to depend on whether an
>> alternative (earlier) template actually allows
>> REX2 encoding (same base-opcode could be one of the criteria for how far to
>> look back through earlier templates; an option might also be to put the 3-
>> operand templates first, so that looking backwards wouldn't be necessary in the
>> first place). This would then likely also address one of the forward looking
>> concerns I've raised above.
>>
> 
> Indeed, adcx's legacy insn can't support rex2.
> 
> For my problem, I prefer to re-order templates order, because, I hadn't thought of a way to simply move t to the farthest same base_opcode template for the moment. The following is a tentative scenario: the order will be ndd evex - rex2 - evex.

Yes, this matches my understanding / expectation.

> And I will need a tmp_variable to avoid the insn doesn't match the rex2, let me backtrack the match's result and the value of i.

This, however, I'm not convinced of. I'd rather see this vaguely in line
with 58bceb182740 ("x86: prefer VEX encodings over EVEX ones when
possible"): Do another full matching round with the removed operand,
arranging for "internal error" to be raised in case that fails. Your
approach would, I think, result in silent bad code generation in case
something went wrong. Thing is - you don't even need to advance (or
backtrack) t in that case

>>> +	readonly_var = src1;
>>> +      if (readonly_var != (unsigned int) ~0)
>>> +	{
>>> +	  --i.operands;
>>> +	  --i.reg_operands;
>>> +	  --i.tm.operands;
>>> +
>>> +	  if (readonly_var != src2)
>>> +	    swap_2_operands (readonly_var, src2);
>>
>> May I suggest that just out of precaution the swapping be done before operand
>> counts are decremented? In principle swap_2_operands() could do with having
>> assertions added as to it actually dealing with valid operands. (You'll note that
>> elsewhere, when we add a new operand, we increment first and then swap.)
>>
> 
> Indeed, it's safer, I've exchanged the order of execution, do you have any other comments on the assertions (If I understand correctly, there is a desire for some gcc_assert?), for the time being I can guarantee that the two indexes are definitely in range, is there anything else that needs to be judged?

That was a remark towards possible future changes, independent of your
work here. I merely want to make sure that possibly introducing such an
assertion wouldn't require code changes elsewhere when that can be
easily avoided right away.

>>> @@ -7728,6 +7766,14 @@ match_template (char mnem_suffix)
>>>  	  i.memshift = memshift;
>>>  	}
>>>
>>> +      /* If we can optimize a NDD insn to non-NDD insn, like
>>> +	 add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
>>> +      if (optimize == 1 && optimize_NDD_to_nonNDD (t))
>>
>> So you do this optimization at -O1, but not at -O2? Imo the "== 1"
>> simply needs dropping. Furthermore the {nooptimize} and {evex} pseudo
>> prefixes need respecting. Quite likely respecting {evex} would eliminate the need
>> for the explicit .has_nf check in the helper function, as I expect .vec_encoding to
>> be set alongside that bit anyway. Further quite likely respecting {evex} here will
>> mean that in patch 3 you need to introduce a new enumerator (e.g.
>> vex_encoding_egpr, vaguely similar to vex_encoding_evex512), to avoid
>> setting .vec_encoding to vex_encoding_evex when an eGPR is parsed.
>>
>> As to optimization level: In build_vex_prefix() we leverage C only at -O2 or
>> higher (including -Os). We may want to be consistent in this regard here (i.e. by
>> an extra check in the helper function).
>>
> 
> It's a mistake, I have fixed it. The conditions will be. I will try later, after the NF patch is done, to see if the constraint i.has_nf can be removed or not.
> 
>        /* If we can optimize a NDD insn to non-NDD insn, like
>          add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
> -      if (optimize == 1 && optimize_NDD_to_nonNDD (t))
> +      if (!i.no_optimize && i.vec_encoding != vex_encoding_evex
> +         && optimize && optimize_NDD_to_nonNDD (t))
>         {

Regardless of what the final expression is going to be, please keep the
check of "optimize" first, such that the common case of optimization
being disabled will be impacted as little as possible.

>>> +	{
>>> +	  t = current_templates->start - 1;
>>
>> As per a remark further up, this adjustment could be avoided if the ND templates
>> came ahead of the legacy ones. They can't be wrongly used in place of the
>> legacy ones, due to the extra operand they require. Then a comment here would
>> merely point out this ordering aspect. But of course care will then need to be
>> taken to not go past i386_optab[]'s bounds (by having suitably ordered
>> conditionals when looking for whether there is an alternative template in the
>> first place; again see the respective remark further up).
>>
> 
> Yes, if we reorder the template's order, I will remove the line. Only one example of a possible implementation is given here:
> 
>         }
> 
> +      bool have_converted_NDD_to_nonNDD = false;
> +      i386_insn tmp_i;
> +
> +      if (!i.no_optimize && i.vec_encoding != vex_encoding_evex
> +         && optimize && !have_converted_NDD_to_nonNDD
> +         && convert_NDD_to_nonNDD (t))
> +       {
> +         have_converted_NDD_to_nonNDD = true;
> +         tmp_i = i;
> +       }
> +
>        /* We've found a match; break out of loop.  */
>        break;
>      }
> @@ -7787,6 +7802,9 @@ match_template (char mnem_suffix)
>        return NULL;
>      }
> 
> +  if (have_converted_to_nonNDD)
> +    i = tmp_i;
> +
>    if (!quiet_warnings)

I have to admit that I don't understand what the goal is of this playing
with i and tmp_i.

>> For all of the changes below (which are a little hard to review in email), aiui they
>> only add C as needed. I once again would prefer if that attribute could be added
>> right as the templates are introduced, with the description stating the intention
>> and that the actual use of the attribute will be added later (i.e. as expressed
>> earlier already for NF).
> 
> After the changes are finalized, I'll break out this part of the modification that adds the C to lili so she can put it where it belongs.

Hmm, that will need doing early, as the NDD patch is hopefully going to
land soon-ish. Same for the template re-ordering (which will need
explaining when pulled ahead, but it will want pulling ahead to reduce
churn).

Jan

next prev parent reply	other threads:[~2023-11-10  9:54 UTC|newest]

Thread overview: 113+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
2023-11-02 17:05   ` Jan Beulich
2023-11-03  6:20     ` Cui, Lili
2023-11-03 13:05     ` Jan Beulich
2023-11-03 14:19   ` Jan Beulich
2023-11-06 15:20     ` Cui, Lili
2023-11-06 16:08       ` Jan Beulich
2023-11-07  8:16         ` Cui, Lili
2023-11-07 10:43           ` Jan Beulich
2023-11-07 15:31             ` Cui, Lili
2023-11-07 15:43               ` Jan Beulich
2023-11-07 15:53                 ` Cui, Lili
2023-11-06 15:02   ` Jan Beulich
2023-11-07  8:06     ` Cui, Lili
2023-11-07 10:20       ` Jan Beulich
2023-11-07 14:32         ` Cui, Lili
2023-11-07 15:08           ` Jan Beulich
2023-11-06 15:39   ` Jan Beulich
2023-11-09  8:02     ` Cui, Lili
2023-11-09 10:52       ` Jan Beulich
2023-11-09 13:27         ` Cui, Lili
2023-11-09 15:22           ` Jan Beulich
2023-11-10  7:11             ` Cui, Lili
2023-11-10  9:14               ` Jan Beulich
2023-11-10  9:21                 ` Jan Beulich
2023-11-10 12:38                   ` Cui, Lili
2023-12-14 10:13                   ` Cui, Lili
2023-12-18 15:24                     ` Jan Beulich
2023-12-18 16:23                       ` H.J. Lu
2023-11-10  9:47                 ` Cui, Lili
2023-11-10  9:57                   ` Jan Beulich
2023-11-10 12:05                     ` Cui, Lili
2023-11-10 12:35                       ` Jan Beulich
2023-11-13  0:18                         ` Cui, Lili
2023-11-02 11:29 ` [PATCH 2/8] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-02 11:29 ` [PATCH 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
2023-11-02 11:29 ` [PATCH 4/8] Add tests for " Cui, Lili
2023-11-08  9:11   ` Jan Beulich
2023-11-15 14:56     ` Cui, Lili
2023-11-16  9:17       ` Jan Beulich
2023-11-16 15:34     ` Cui, Lili
2023-11-16 16:50       ` Jan Beulich
2023-11-17 12:42         ` Cui, Lili
2023-11-17 14:38           ` Jan Beulich
2023-11-22 13:40             ` Cui, Lili
2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
2023-11-08 10:39   ` Jan Beulich
2023-11-20  1:19     ` Cui, Lili
2023-11-08 11:13   ` Jan Beulich
2023-11-20 12:36     ` Cui, Lili
2023-11-20 16:33       ` Jan Beulich
2023-11-22  7:46         ` Cui, Lili
2023-11-22  8:47           ` Jan Beulich
2023-11-22 10:45             ` Cui, Lili
2023-11-23 10:57               ` Jan Beulich
2023-11-23 12:14                 ` Cui, Lili
2023-11-24  6:56                 ` [PATCH v3 0/9] Support Intel APX EGPR Cui, Lili
2023-12-07  8:17                   ` Cui, Lili
2023-12-07  8:33                     ` Cui, Lili
2023-11-09  9:37   ` [PATCH 5/8] Support APX NDD Jan Beulich
2023-11-20  1:33     ` Cui, Lili
2023-11-20  8:19       ` Jan Beulich
2023-11-20 12:54         ` Cui, Lili
2023-11-20 16:43           ` Jan Beulich
2023-11-02 11:29 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
2023-11-08 11:44   ` Jan Beulich
2023-11-08 12:52     ` Jan Beulich
2023-11-22  5:48     ` Cui, Lili
2023-11-22  8:53       ` Jan Beulich
2023-11-22 12:26         ` Cui, Lili
2023-11-09  9:57   ` Jan Beulich
2023-11-02 11:29 ` [PATCH 7/8] Support APX NDD optimized encoding Cui, Lili
2023-11-09 10:36   ` Jan Beulich
2023-11-10  5:43     ` Hu, Lin1
2023-11-10  9:54       ` Jan Beulich [this message]
2023-11-14  2:28         ` Hu, Lin1
2023-11-14 10:50           ` Jan Beulich
2023-11-15  2:52             ` Hu, Lin1
2023-11-15  8:57               ` Jan Beulich
2023-11-15  2:59             ` [PATCH][v3] " Hu, Lin1
2023-11-15  9:34               ` Jan Beulich
2023-11-17  7:24                 ` Hu, Lin1
2023-11-17  9:47                   ` Jan Beulich
2023-11-20  3:28                     ` Hu, Lin1
2023-11-20  8:34                       ` Jan Beulich
2023-11-14  2:58         ` [PATCH 1/2] Reorder APX insns in i386.tbl Hu, Lin1
2023-11-14 11:20           ` Jan Beulich
2023-11-15  1:49             ` Hu, Lin1
2023-11-15  8:52               ` Jan Beulich
2023-11-17  3:27                 ` Hu, Lin1
2023-11-02 11:29 ` [PATCH 8/8] Support APX JMPABS Cui, Lili
2023-11-09 12:59   ` Jan Beulich
2023-11-14  3:26     ` Hu, Lin1
2023-11-14 11:15       ` Jan Beulich
2023-11-24  5:40         ` Hu, Lin1
2023-11-24  7:21           ` Jan Beulich
2023-11-27  2:16             ` Hu, Lin1
2023-11-27  8:03               ` Jan Beulich
2023-11-27  8:46                 ` Hu, Lin1
2023-11-27  8:54                   ` Jan Beulich
2023-11-27  9:03                     ` Hu, Lin1
2023-11-27 10:32                       ` Jan Beulich
2023-12-04  7:33                         ` Hu, Lin1
2023-11-02 13:22 ` [PATCH v2 0/8] Support Intel APX EGPR Jan Beulich
2023-11-03 16:42   ` Cui, Lili
2023-11-06  7:30     ` Jan Beulich
2023-11-06 14:20       ` Cui, Lili
2023-11-06 14:44         ` Jan Beulich
2023-11-06 16:03           ` Cui, Lili
2023-11-06 16:10             ` Jan Beulich
2023-11-07  1:53               ` Cui, Lili
2023-11-07 10:11                 ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae061ffc-807c-aa05-59eb-dbb02f024956@suse.com \
    --to=jbeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=ccoutant@gmail.com \
    --cc=hongjiu.lu@intel.com \
    --cc=lili.cui@intel.com \
    --cc=lin1.hu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).