Re: [PATCH 7/8] Support APX NDD optimized encoding.

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: Jan Beulich <jbeulich@suse.com>
To: "Cui, Lili" <lili.cui@intel.com>, "Hu, Lin1" <lin1.hu@intel.com>
Cc: hongjiu.lu@intel.com, ccoutant@gmail.com, binutils@sourceware.org
Subject: Re: [PATCH 7/8] Support APX NDD optimized encoding.
Date: Thu, 9 Nov 2023 11:36:09 +0100	[thread overview]
Message-ID: <f003a4d4-8e4f-620e-fd44-388fcf60dff2@suse.com> (raw)
In-Reply-To: <20231102112911.2372810-8-lili.cui@intel.com>

On 02.11.2023 12:29, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -7208,6 +7208,44 @@ check_EgprOperands (const insn_template *t)
>    return 0;
>  }
>  
> +/* Optimize APX NDD insns to non-NDD insns.  */
> +
> +static bool
> +optimize_NDD_to_nonNDD (const insn_template *t)
> +{
> +  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
> +      && t->opcode_space == SPACE_EVEXMAP4
> +      && !i.has_nf

As mentioned before, I'd still prefer the optimization to only be
added after NF handling was put in place. But I'm not meaning to make
this a strict requirement: Introducing the has_nf field here (with
it only being read, never [explicitly] written) is certainly okay-ish.
(But see also below.)

Similarly I'm concerned of the ND form of CFCMOVcc, which isn't there
yet in the patches, but which will also need excluding from this
optimization. Obviously this concern then extends to any future ND-
encoded insns, which (likely) won't have legacy-encoded (and hence
REX2-encodable) counterparts. Are there any plans how to deal with
such? (There's a possible approach mentioned further down.)

> +      && i.reg_operands >= 2
> +      && i.types[i.operands - 1].bitfield.class == Reg)

Isn't this implicit from the VexVVVV check further up?

> +    {
> +      unsigned int readonly_var = ~0;
> +      unsigned int dest = i.operands - 1;
> +      unsigned int src1 = (i.operands > 2) ? i.operands - 2 : 0;

Since we already know i.operands >= 2 from the earlier check of
i.reg_operands, can't this simply be

      unsigned int src1 = i.operands - 2;

?

> +      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> +
> +      if (i.types[src1].bitfield.class == Reg
> +	  && i.op[src1].regs == i.op[dest].regs)
> +	readonly_var = src2;

As can be seen in the testcase, this also results in ADCX/ADOX to be
converted to non-ND EVEX forms, i.e. even when that's not a win at all.
We shouldn't change what the user has written when the encoding doesn't
actually improve. (Or else, but I'd be hesitant to accept that, at the
very least the effect would need pointing out in the description or
even a code comment, so that later on it is possible to figure out
whether that was intentional or an oversight.)

This is where my template ordering remark in reply to patch 5 comes
into play: Whether invoking re-parse is okay would further need to
depend on whether an alternative (earlier) template actually allows
REX2 encoding (same base-opcode could be one of the criteria for how
far to look back through earlier templates; an option might also be to
put the 3-operand templates first, so that looking backwards wouldn't
be necessary in the first place). This would then likely also address
one of the forward looking concerns I've raised above.

> +      /* adcx, adox and imul don't have D bit.  */
> +      else if (i.types[src2].bitfield.class == Reg
> +	       && i.op[src2].regs == i.op[dest].regs
> +	       && t->opcode_modifier.commutative)

There's a disconnect between comment and code here: You don't use the
D attribute, so why is it being mentioned?

> +	readonly_var = src1;
> +      if (readonly_var != (unsigned int) ~0)
> +	{
> +	  --i.operands;
> +	  --i.reg_operands;
> +	  --i.tm.operands;
> +
> +	  if (readonly_var != src2)
> +	    swap_2_operands (readonly_var, src2);

May I suggest that just out of precaution the swapping be done before
operand counts are decremented? In principle swap_2_operands() could
do with having assertions added as to it actually dealing with valid
operands. (You'll note that elsewhere, when we add a new operand, we
increment first and then swap.)

> +	  return 1;
> +	}
> +    }
> +  return 0;
> +}

Nit: A function returning bool would preferably return true/false, not
0/1.

> @@ -7728,6 +7766,14 @@ match_template (char mnem_suffix)
>  	  i.memshift = memshift;
>  	}
>  
> +      /* If we can optimize a NDD insn to non-NDD insn, like
> +	 add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
> +      if (optimize == 1 && optimize_NDD_to_nonNDD (t))

So you do this optimization at -O1, but not at -O2? Imo the "== 1"
simply needs dropping. Furthermore the {nooptimize} and {evex} pseudo
prefixes need respecting. Quite likely respecting {evex} would
eliminate the need for the explicit .has_nf check in the helper
function, as I expect .vec_encoding to be set alongside that bit
anyway. Further quite likely respecting {evex} here will mean that in
patch 3 you need to introduce a new enumerator (e.g.
vex_encoding_egpr, vaguely similar to vex_encoding_evex512), to avoid
setting .vec_encoding to vex_encoding_evex when an eGPR is parsed.

As to optimization level: In build_vex_prefix() we leverage C only
at -O2 or higher (including -Os). We may want to be consistent in this
regard here (i.e. by an extra check in the helper function).

> +	{
> +	  t = current_templates->start - 1;

As per a remark further up, this adjustment could be avoided if the ND
templates came ahead of the legacy ones. They can't be wrongly used in
place of the legacy ones, due to the extra operand they require. Then
a comment here would merely point out this ordering aspect. But of
course care will then need to be taken to not go past i386_optab[]'s
bounds (by having suitably ordered conditionals when looking for
whether there is an alternative template in the first place; again see
the respective remark further up).

> +	  continue;
> +	}

Btw, considering this re-matching, I wonder whether "convert" wouldn't
be better in the function name compared to "optimize".

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
> @@ -0,0 +1,117 @@
> +# Check 64bit APX NDD instructions with optimized encoding
> +
> +	.text
> +_start:
> +inc    %r31,%r31
> +incb   %r31b,%r31b
> +add    %r31,%r8,%r8
> +addb   %r31b,%r8b,%r8b
> +{store} add    %r31,%r8,%r8
> +{load}  add    %r31,%r8,%r8
> +add    %r31,(%r8),%r31
> +add    (%r31),%r8,%r8
> +add    $0x12344433,%r15,%r15
> +add    $0xfffffffff4332211,%r8,%r8
> +dec    %r17,%r17
> +decb   %r17b,%r17b
> +not    %r17,%r17
> +notb   %r17b,%r17b
> +neg    %r17,%r17
> +negb   %r17b,%r17b
> +sub    %r15,%r17,%r17
> +subb   %r15b,%r17b,%r17b
> +sub    %r15,(%r8),%r15
> +sub    (%r15,%rax,1),%r16,%r16
> +sub    $0x1234,%r30,%r30
> +sbb    %r15,%r17,%r17
> +sbbb   %r15b,%r17b,%r17b
> +sbb    %r15,(%r8),%r15
> +sbb    (%r15,%rax,1),%r16,%r16
> +sbb    $0x1234,%r30,%r30
> +adc    %r15,%r17,%r17
> +adcb   %r15b,%r17b,%r17b
> +adc    %r15,(%r8),%r15
> +adc    (%r15,%rax,1),%r16,%r16
> +adc    $0x1234,%r30,%r30
> +or     %r15,%r17,%r17
> +orb    %r15b,%r17b,%r17b
> +or     %r15,(%r8),%r15
> +or     (%r15,%rax,1),%r16,%r16
> +or     $0x1234,%r30,%r30
> +xor    %r15,%r17,%r17
> +xorb   %r15b,%r17b,%r17b
> +xor    %r15,(%r8),%r15
> +xor    (%r15,%rax,1),%r16,%r16
> +xor    $0x1234,%r30,%r30
> +and    %r15,%r17,%r17
> +andb   %r15b,%r17b,%r17b
> +and    %r15,(%r8),%r15
> +and    (%r15,%rax,1),%r16,%r16
> +and    $0x1234,%r30,%r30
> +ror    %r31,%r31
> +rorb   %r31b,%r31b
> +ror    $0x2,%r12,%r12
> +rorb   $0x2,%r12b,%r12b
> +rol    %r31,%r31
> +rolb   %r31b,%r31b
> +rol    $0x2,%r12,%r12
> +rolb   $0x2,%r12b,%r12b
> +rcr    %r31,%r31
> +rcrb   %r31b,%r31b
> +rcr    $0x2,%r12,%r12
> +rcrb   $0x2,%r12b,%r12b
> +rcl    %r31,%r31
> +rclb   %r31b,%r31b
> +rcl    $0x2,%r12,%r12
> +rclb   $0x2,%r12b,%r12b
> +shl    %r31,%r31
> +shlb   %r31b,%r31b
> +shl    $0x2,%r12,%r12
> +shlb   $0x2,%r12b,%r12b
> +sar    %r31,%r31
> +sarb   %r31b,%r31b
> +sar    $0x2,%r12,%r12
> +sarb   $0x2,%r12b,%r12b
> +shl    %r31,%r31
> +shlb   %r31b,%r31b
> +shl    $0x2,%r12,%r12
> +shlb   $0x2,%r12b,%r12b
> +shr    %r31,%r31
> +shrb   %r31b,%r31b
> +shr    $0x2,%r12,%r12
> +shrb   $0x2,%r12b,%r12b
> +shld   $0x1,%r12,(%rax),%r12
> +shld   $0x2,%r8,%r12,%r12
> +shld   %cl,%r9,(%rax),%r9
> +shld   %cl,%r12,%r16,%r16
> +shld   %cl,%r13,(%r19,%rax,4),%r13

What's the difference (in what is being tested) between this and the
first of the %cl tests? Shouldn't one of them rather be of the form
"reg1,reg2,reg1"? And then shouldn't there be a similar test with an
immediate operand? (Same for SHRD then, obviously.)

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -145,6 +145,8 @@
>  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
>  // the bit to mark commutative VEX encodings where swapping the source
>  // operands may allow to switch from 3-byte to 2-byte VEX encoding.
> +// And re-use the bit to mark some NDD insns that swapping the source operands
> +// may allow to switch from 3 operands to 2 operands.
>  #define C StaticRounding

The 3-to-2 conversion isn't what we're primarily after (see comments above).
It's the EVEX->REX2 encoding conversion which we'd like to do.

> @@ -166,6 +168,10 @@
>  
>  ### MARKER ###
>  
> +// Please don't add a NDD insn which may be optimized to a REX2 insn before the
> +// mov. It may result that a good UB checker object the behavior
> +// "template->start - 1" at the end of match_template.
> +
>  // Move instructions.

While I mentioned adding a comment here as a minimal solution, did you try
to think of better approaches, or some enforcement of this restriction
(like gas_assert() before the expression in question)? You could even go
as far as simply not trying the optimization when t == i386_optab, with no
need to have a comment here (the comment would then be next to that
part of the condition, thus right where it's really relevant). Then anyone
misplacing a new template in the opcode table would simply observe that
an expected optimization doesn't happen, and they surely would find the
conditional with its comment.

For all of the changes below (which are a little hard to review in email),
aiui they only add C as needed. I once again would prefer if that attribute
could be added right as the templates are introduced, with the description
stating the intention and that the actual use of the attribute will be
added later (i.e. as expressed earlier already for NF).

Jan

next prev parent reply	other threads:[~2023-11-09 10:36 UTC|newest]

Thread overview: 113+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-02 11:29 [PATCH v2 0/8] Support Intel APX EGPR Cui, Lili
2023-11-02 11:29 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
2023-11-02 17:05   ` Jan Beulich
2023-11-03  6:20     ` Cui, Lili
2023-11-03 13:05     ` Jan Beulich
2023-11-03 14:19   ` Jan Beulich
2023-11-06 15:20     ` Cui, Lili
2023-11-06 16:08       ` Jan Beulich
2023-11-07  8:16         ` Cui, Lili
2023-11-07 10:43           ` Jan Beulich
2023-11-07 15:31             ` Cui, Lili
2023-11-07 15:43               ` Jan Beulich
2023-11-07 15:53                 ` Cui, Lili
2023-11-06 15:02   ` Jan Beulich
2023-11-07  8:06     ` Cui, Lili
2023-11-07 10:20       ` Jan Beulich
2023-11-07 14:32         ` Cui, Lili
2023-11-07 15:08           ` Jan Beulich
2023-11-06 15:39   ` Jan Beulich
2023-11-09  8:02     ` Cui, Lili
2023-11-09 10:52       ` Jan Beulich
2023-11-09 13:27         ` Cui, Lili
2023-11-09 15:22           ` Jan Beulich
2023-11-10  7:11             ` Cui, Lili
2023-11-10  9:14               ` Jan Beulich
2023-11-10  9:21                 ` Jan Beulich
2023-11-10 12:38                   ` Cui, Lili
2023-12-14 10:13                   ` Cui, Lili
2023-12-18 15:24                     ` Jan Beulich
2023-12-18 16:23                       ` H.J. Lu
2023-11-10  9:47                 ` Cui, Lili
2023-11-10  9:57                   ` Jan Beulich
2023-11-10 12:05                     ` Cui, Lili
2023-11-10 12:35                       ` Jan Beulich
2023-11-13  0:18                         ` Cui, Lili
2023-11-02 11:29 ` [PATCH 2/8] Created an empty EVEX_MAP4_ sub-table for EVEX instructions Cui, Lili
2023-11-02 11:29 ` [PATCH 3/8] Support APX GPR32 with extend evex prefix Cui, Lili
2023-11-02 11:29 ` [PATCH 4/8] Add tests for " Cui, Lili
2023-11-08  9:11   ` Jan Beulich
2023-11-15 14:56     ` Cui, Lili
2023-11-16  9:17       ` Jan Beulich
2023-11-16 15:34     ` Cui, Lili
2023-11-16 16:50       ` Jan Beulich
2023-11-17 12:42         ` Cui, Lili
2023-11-17 14:38           ` Jan Beulich
2023-11-22 13:40             ` Cui, Lili
2023-11-02 11:29 ` [PATCH 5/8] Support APX NDD Cui, Lili
2023-11-08 10:39   ` Jan Beulich
2023-11-20  1:19     ` Cui, Lili
2023-11-08 11:13   ` Jan Beulich
2023-11-20 12:36     ` Cui, Lili
2023-11-20 16:33       ` Jan Beulich
2023-11-22  7:46         ` Cui, Lili
2023-11-22  8:47           ` Jan Beulich
2023-11-22 10:45             ` Cui, Lili
2023-11-23 10:57               ` Jan Beulich
2023-11-23 12:14                 ` Cui, Lili
2023-11-24  6:56                 ` [PATCH v3 0/9] Support Intel APX EGPR Cui, Lili
2023-12-07  8:17                   ` Cui, Lili
2023-12-07  8:33                     ` Cui, Lili
2023-11-09  9:37   ` [PATCH 5/8] Support APX NDD Jan Beulich
2023-11-20  1:33     ` Cui, Lili
2023-11-20  8:19       ` Jan Beulich
2023-11-20 12:54         ` Cui, Lili
2023-11-20 16:43           ` Jan Beulich
2023-11-02 11:29 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
2023-11-08 11:44   ` Jan Beulich
2023-11-08 12:52     ` Jan Beulich
2023-11-22  5:48     ` Cui, Lili
2023-11-22  8:53       ` Jan Beulich
2023-11-22 12:26         ` Cui, Lili
2023-11-09  9:57   ` Jan Beulich
2023-11-02 11:29 ` [PATCH 7/8] Support APX NDD optimized encoding Cui, Lili
2023-11-09 10:36   ` Jan Beulich [this message]
2023-11-10  5:43     ` Hu, Lin1
2023-11-10  9:54       ` Jan Beulich
2023-11-14  2:28         ` Hu, Lin1
2023-11-14 10:50           ` Jan Beulich
2023-11-15  2:52             ` Hu, Lin1
2023-11-15  8:57               ` Jan Beulich
2023-11-15  2:59             ` [PATCH][v3] " Hu, Lin1
2023-11-15  9:34               ` Jan Beulich
2023-11-17  7:24                 ` Hu, Lin1
2023-11-17  9:47                   ` Jan Beulich
2023-11-20  3:28                     ` Hu, Lin1
2023-11-20  8:34                       ` Jan Beulich
2023-11-14  2:58         ` [PATCH 1/2] Reorder APX insns in i386.tbl Hu, Lin1
2023-11-14 11:20           ` Jan Beulich
2023-11-15  1:49             ` Hu, Lin1
2023-11-15  8:52               ` Jan Beulich
2023-11-17  3:27                 ` Hu, Lin1
2023-11-02 11:29 ` [PATCH 8/8] Support APX JMPABS Cui, Lili
2023-11-09 12:59   ` Jan Beulich
2023-11-14  3:26     ` Hu, Lin1
2023-11-14 11:15       ` Jan Beulich
2023-11-24  5:40         ` Hu, Lin1
2023-11-24  7:21           ` Jan Beulich
2023-11-27  2:16             ` Hu, Lin1
2023-11-27  8:03               ` Jan Beulich
2023-11-27  8:46                 ` Hu, Lin1
2023-11-27  8:54                   ` Jan Beulich
2023-11-27  9:03                     ` Hu, Lin1
2023-11-27 10:32                       ` Jan Beulich
2023-12-04  7:33                         ` Hu, Lin1
2023-11-02 13:22 ` [PATCH v2 0/8] Support Intel APX EGPR Jan Beulich
2023-11-03 16:42   ` Cui, Lili
2023-11-06  7:30     ` Jan Beulich
2023-11-06 14:20       ` Cui, Lili
2023-11-06 14:44         ` Jan Beulich
2023-11-06 16:03           ` Cui, Lili
2023-11-06 16:10             ` Jan Beulich
2023-11-07  1:53               ` Cui, Lili
2023-11-07 10:11                 ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f003a4d4-8e4f-620e-fd44-388fcf60dff2@suse.com \
    --to=jbeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=ccoutant@gmail.com \
    --cc=hongjiu.lu@intel.com \
    --cc=lili.cui@intel.com \
    --cc=lin1.hu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).