Re: [PATCH 5/8] Support APX NDD optimized encoding.

public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed

From: Jan Beulich <jbeulich@suse.com>
To: "Cui, Lili" <lili.cui@intel.com>
Cc: hongjiu.lu@intel.com, "Hu, Lin1" <lin1.hu@intel.com>,
	binutils@sourceware.org
Subject: Re: [PATCH 5/8] Support APX NDD optimized encoding.
Date: Thu, 28 Sep 2023 11:29:48 +0200	[thread overview]
Message-ID: <a2e3a361-28c3-010a-54fa-4b5edd2bf3b6@suse.com> (raw)
In-Reply-To: <20230919152527.497773-6-lili.cui@intel.com>

On 19.09.2023 17:25, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -7091,6 +7091,46 @@ check_EgprOperands (const insn_template *t)
>    return 0;
>  }
>  
> +/* Optimize APX NDD insns to non-NDD insns.  */
> +
> +static int

"bool" please when the function merely returns a yes/no indicator.

> +optimize_NDD_to_nonNDD (const insn_template *t)
> +{
> +  if (t->opcode_modifier.vexvvvv
> +      && t->opcode_space == SPACE_EVEXMAP4
> +      && i.reg_operands >= 2

See the remark near the bottom of the changes to this file: This
condition is likely insufficient, as
- further insns allowing ND may not be treated this way (CCMPscc,
  CTESTscc, and one of the CFCMOVcc forms at the very least),
- {nf} uses will want excluding, as it would be merely a waste of
  time to try to re-match with fewer operands.

> +      && (i.types[i.operands - 1].bitfield.dword
> +	  || i.types[i.operands - 1].bitfield.qword))

Why do you exclude byte and word operations? Imo what you want to
check is class being Reg.

> +    {
> +      int tmp_flag = -1;

Either type or name need to change: A variable of this name wants to be
"bool". I would have suggested "dupl" as the name, but that how doesn't
fit how the variable is used below.

(Of course I'm also not happy with the use of plain int here and below,
but that's a wider issue throughout the source file. Still it would be
nice if new code properly used unsigned int whenever only non-negative
values are to be held.)

> +      int dest = i.operands - 1;
> +      int src1 = (i.operands > 2) ? i.operands - 2 : 0;
> +      int src2 = (i.operands > 3) ? i.operands - 3 : 0;
> +
> +      if (i.op[src1].regs == i.op[dest].regs)
> +	tmp_flag = src2;
> +      /* adcx and adox don't have D bit.  */

IMUL doesn't either, yet ought to also be eligible? We have a "commutative"
flag already - can't you extend its use to the purposes you have here? I
expect this would simplify ...

> +      else if (i.op[src2].regs == i.op[dest].regs

Just to mention it: Both this and the earlier similar equality check aren't
entirely legal, as you haven't previously checked that the respective
operands actually are register ones. Analysis tools (like the UB
sanitizer) may choke on this, even if otherwise this shouldn't be a problem
from what I can tell.

> +	       && (t->opcode_modifier.d
> +		   || t->mnem_off == MN_adcx
> +		   || t->mnem_off == MN_adox)
> +	       && (t->mnem_off != MN_sub)
> +	       && (t->mnem_off != MN_sbb))

... this condition.

> +	tmp_flag = src1;
> +      if (tmp_flag != -1)
> +	{
> +	  --i.operands;
> +	  --i.reg_operands;
> +	  --i.tm.operands;
> +
> +	  if (tmp_flag != src2)
> +	      swap_2_operands (tmp_flag, src2);

Nit: There's once again something wrong with indentation here.

> +	  return 1;
> +	}
> +    }
> +  return 0;
> +}
> +
>  /* Helper function for the progress() macro in match_template().  */
>  static INLINE enum i386_error progress (enum i386_error new,
>  					enum i386_error last,
> @@ -7562,6 +7602,15 @@ match_template (char mnem_suffix)
>  	     slip through to break.  */
>  	}
>  
> +      /* If we can optimize a NDD insn to non-NDD insn, like
> +	 add %r16, %r8, %r8 -> add %r16, %r8, then rematch template.  */
> +      if (optimize_NDD_to_nonNDD (t))

I don't think such an optimization should be done without any form of
-O.

As to the function name, maybe better optimize_NDD_to_REX2()?

> +	{
> +	  t = current_templates->start;
> +	  --t;
> +	  continue;

So the decrement is to compensate the loop continuation. Nevertheless imo
this wants spelling "t = current_templates->start - 1". Yet like above
note that a good UB checker may object to this subtraction of 1, unless
you build upon templates making it here not being part of the first group
in i386_optab[]. (If any such dependency exists, it needs spelling out in
a suitable place, i.e. in particular one that's hard to overlook when
doing re-arrangements.)

> +	}
> +
>        /* Check if VEX/EVEX encoding requirements can be satisfied.  */
>        if (VEX_check_encoding (t))
>  	{

I also wonder whether this doesn't need moving further down. I expect
at least VEX_check_encoding() may need running first.

But I'm worried anyway that this patch comes too early in the series.
{evex} prefixes or use of {nf} would need skipping the optimization
attempt, and getting all the conditions right is likely easier to see
when all the rest of the infrastructure is in place.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
> @@ -0,0 +1,115 @@
> +# Check 64bit APX NDD instructions with optimized encoding
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +inc    %ax,%ax
> +inc    %eax,%eax
> +inc    %rax,%rax
> +inc    %r16,%r16
> +add    %r31b,%r8b,%r8b
> +add    %r31b,%r8b,%r31b
> +addb    %r31b,%r8b,%r8b
> +add    %r31,%r8,%r8
> +addq    %r31,%r8,%r8
> +add    %r31d,%r8d,%r8d
> +addl    %r31d,%r8d,%r8d
> +add    %r31w,%r8w,%r8w
> +addw    %r31w,%r8w,%r8w
> +{store} add    %r31,%r8,%r8
> +{load}  add    %r31,%r8,%r8
> +add    %r31,(%r8),%r31
> +add    (%r31),%r8,%r8
> +add    $0x12344433,%r15,%r15
> +add    $0xfffffffff4332211,%r8,%r8
> +dec    %r17,%r17
> +not    %r17,%r17
> +neg    %r17,%r17
> +sub    %r15,%r17,%r17
> +sub    %r15d,(%r8),%r15d
> +sub    (%r15,%rax,1),%r16,%r16
> +sub    $0x1234,%r30,%r30
> +sbb    %r15,%r17,%r17
> +sbb    %r15,(%r8),%r15
> +sbb    (%r15,%rax,1),%r16,%r16
> +sbb    $0x1234,%r30,%r30
> +adc    %r15,%r17,%r17
> +adc    %r15d,(%r8),%r15d
> +adc    (%r15,%rax,1),%r16,%r16
> +adc    $0x1234,%r30,%r30
> +or    %r15,%r17,%r17
> +or    %r15d,(%r8),%r15d
> +or    (%r15,%rax,1),%r16,%r16
> +or    $0x1234,%r30,%r30
> +xor    %r15,%r17,%r17
> +xor    %r15d,(%r8),%r15d
> +xor    (%r15,%rax,1),%r16,%r16
> +xor    $0x1234,%r30,%r30
> +and    %r15,%r17,%r17
> +and    %r15d,(%r8),%r15d
> +and    (%r15,%rax,1),%r16,%r16
> +and    $0x1234,%r30,%r30
> +and    $0x1234,%r30
> +ror    %r31,%r31
> +rorb   %r31b,%r31b

Please be consistent with omitting (or having) suffixes (further up you
simply test both variants, but that's not the case throughout ...

> +ror    $0x2,%r12,%r12
> +rol    %r31,%r31
> +rolb   %r31b,%r31b
> +rol    $0x2,%r12,%r12
> +rcr    %r31,%r31
> +rcrb   %r31b,%r31b
> +rcr    $0x2,%r12b,%r12b
> +rcr    $0x2,%r12,%r12
> +rcl    %r31,%r31
> +rclb   %r31b,%r31b
> +rcl    $0x2,%r12b,%r12b
> +rcl    $0x2,%r12,%r12
> +shl    %r31,%r31
> +shlb   %r31b,%r31b
> +shl    $0x2,%r12b,%r12b
> +shl    $0x2,%r12,%r12
> +sar    %r31,%r31
> +sarb   %r31b,%r31b
> +sar    $0x2,%r12b,%r12b
> +sar    $0x2,%r12,%r12
> +shl    %r31,%r31
> +shlb   %r31b,%r31b
> +shl    $0x2,%r12b,%r12b
> +shl    $0x2,%r12,%r12
> +shr    %r31,%r31
> +shrb   %r31b,%r31b

... here.

Jan

next prev parent reply	other threads:[~2023-09-28  9:29 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-19 15:25 [PATCH 0/8] [RFC] Support Intel APX EGPR Cui, Lili
2023-09-19 15:25 ` [PATCH 1/8] Support APX GPR32 with rex2 prefix Cui, Lili
2023-09-21 15:27   ` Jan Beulich
2023-09-27 15:57     ` Cui, Lili
2023-09-21 15:51   ` Jan Beulich
2023-09-27 15:59     ` Cui, Lili
2023-09-28  8:02       ` Jan Beulich
2023-10-07  3:27         ` Cui, Lili
2023-09-19 15:25 ` [PATCH 2/8] Support APX GPR32 with extend evex prefix Cui, Lili
2023-09-22 10:12   ` Jan Beulich
2023-10-17 15:48     ` Cui, Lili
2023-10-18  6:40       ` Jan Beulich
2023-10-18 10:44         ` Cui, Lili
2023-10-18 10:50           ` Jan Beulich
2023-09-22 10:50   ` Jan Beulich
2023-10-17 15:50     ` Cui, Lili
2023-10-17 16:11       ` Jan Beulich
2023-10-18  2:02         ` Cui, Lili
2023-10-18  6:10           ` Jan Beulich
2023-09-25  6:03   ` Jan Beulich
2023-10-17 15:52     ` Cui, Lili
2023-10-17 16:12       ` Jan Beulich
2023-10-18  6:31         ` Cui, Lili
2023-10-18  6:47           ` Jan Beulich
2023-10-18  7:52             ` Cui, Lili
2023-10-18  8:21               ` Jan Beulich
2023-10-18 11:30                 ` Cui, Lili
2023-10-19 11:58                   ` Cui, Lili
2023-10-19 15:24                     ` Jan Beulich
2023-10-19 16:38                       ` Cui, Lili
2023-10-20  6:25                         ` Jan Beulich
2023-10-22 14:33                           ` Cui, Lili
2023-09-19 15:25 ` [PATCH 3/8] Add tests for " Cui, Lili
2023-09-27 13:11   ` Jan Beulich
2023-10-17 15:53     ` FW: " Cui, Lili
2023-10-17 16:19       ` Jan Beulich
2023-10-18  2:32         ` Cui, Lili
2023-10-18  6:05           ` Jan Beulich
2023-10-18  7:16             ` Cui, Lili
2023-10-18  8:05               ` Jan Beulich
2023-10-18 11:26                 ` Cui, Lili
2023-10-18 12:06                   ` Jan Beulich
2023-10-25 16:03                     ` Cui, Lili
2023-09-27 13:19   ` Jan Beulich
2023-09-19 15:25 ` [PATCH 4/8] Support APX NDD Cui, Lili
2023-09-27 14:44   ` Jan Beulich
2023-10-22 14:05     ` Cui, Lili
2023-10-23  7:12       ` Jan Beulich
2023-10-25  8:10         ` Cui, Lili
2023-10-25  8:47           ` Jan Beulich
2023-10-25 15:49             ` Cui, Lili
2023-10-25 15:59               ` Jan Beulich
2023-09-28  7:57   ` Jan Beulich
2023-10-22 14:57     ` Cui, Lili
2023-10-24 11:39     ` Cui, Lili
2023-10-24 11:58       ` Jan Beulich
2023-10-25 15:29         ` Cui, Lili
2023-09-19 15:25 ` [PATCH 5/8] Support APX NDD optimized encoding Cui, Lili
2023-09-28  9:29   ` Jan Beulich [this message]
2023-10-23  2:57     ` Hu, Lin1
2023-10-23  7:23       ` Jan Beulich
2023-10-23  7:50         ` Hu, Lin1
2023-10-23  8:15           ` Jan Beulich
2023-10-24  1:40             ` Hu, Lin1
2023-10-24  6:03               ` Jan Beulich
2023-10-24  6:08                 ` Hu, Lin1
2023-10-23  3:07     ` [PATCH-V2] " Hu, Lin1
2023-10-23  3:30     ` [PATCH 5/8] [v2] " Hu, Lin1
2023-10-23  7:26       ` Jan Beulich
2023-09-19 15:25 ` [PATCH 6/8] Support APX Push2/Pop2 Cui, Lili
2023-09-28 11:37   ` Jan Beulich
2023-10-30 15:21     ` Cui, Lili
2023-10-30 15:31       ` Jan Beulich
2023-11-20 13:05         ` Cui, Lili
2023-09-19 15:25 ` [PATCH 7/8] Support APX NF Cui, Lili
2023-09-25  6:07   ` Jan Beulich
2023-09-28 12:42   ` Jan Beulich
2023-11-02 10:15     ` Cui, Lili
2023-11-02 10:23       ` Jan Beulich
2023-11-02 10:46         ` Cui, Lili
2023-12-12  2:59           ` H.J. Lu
2023-09-19 15:25 ` [PATCH 8/8] Support APX JMPABS Cui, Lili
2023-09-28 13:11   ` Jan Beulich
2023-11-02  2:32     ` Hu, Lin1

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a2e3a361-28c3-010a-54fa-4b5edd2bf3b6@suse.com \
    --to=jbeulich@suse.com \
    --cc=binutils@sourceware.org \
    --cc=hongjiu.lu@intel.com \
    --cc=lili.cui@intel.com \
    --cc=lin1.hu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).