public inbox for binutils@sourceware.org
From: Jan Beulich <jbeulich@suse.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Binutils <binutils@sourceware.org>
Subject: x86: further optimization opportunities
Date: Fri, 26 Aug 2022 14:12:24 +0200
Message-ID: <7e2041c5-57c0-21a0-7246-54b8196f3c9c@suse.com>

H.J.,

over time I've accumulated a list of possible transformations we could
do in addition to what we do already. Some are a little exotic, so may
not be worth it. Hence I'd like to ask for your view on things, if you
don't mind.

1) {,X}OR r<N>,0 and AND/TEST r<N>,~0  -->  TEST r<N>,r<N>

Except for 32-bit forms in 64-bit mode, where writing the destination
also zeroes the upper half of the register. Note that ADD/CMP/SUB can't
be replaced this way, because TEST leaves AF undefined. But perhaps
IMUL r<N>,1 can be, unless we feared people depending on a particular
implementation's setting of PF, SF, and ZF.

2) AND r<N>,0 and perhaps IMUL r<N>,r<M>,0  -->  XOR r<N>,r<N>

3) {,V}PCMPEQQ  -->  e.g. {,V}PCMPEQD
   {,V}PCMPGTQ  -->  {,V}PXOR

when both source operands match, as the encoding is 1 byte shorter.
Some of the respective AVX512 forms can be transformed into KX{,N}OR*.

4) P{AND{,N},{,X}OR} and {AND{,N},{,X}OR}PD  -->  {AND{,N},{,X}OR}PS
   MOVDQ{A,U} and MOV{A,U}PD  -->  MOV{A,U}PS

for saving the prefix byte. Perhaps only when -Os.

5) PSHUFD  -->  SHUFPS

with identical register operands, and again perhaps only when -Os.

6) VPCMP{,U}{B,W,D,Q} and VPCOM{,U}{B,W,D,Q}  -->  VPCMP{EQ,GT}{B,W,D,Q}

where suitable, saving the immediate byte and in the latter case
also possibly allowing for the shorter VEX2 encoding.

7) VPSUB{,U}S{B,W,D,Q}  -->  VPXOR
   VPCMPGT{B,W,D,Q} (pre-AVX512)  -->  VPXOR

when both source operands are identical.

8) VFMADD{P,S}{S,D} et al  -->  VFMADD{132,231,213}{P,S}{S,D}

when one operand is suitably repeated. (This requires CpuFMA to be
explicitly enabled, as that's not a prereq to CpuFMA4.)

9) MOVZX

with 64-bit destination to drop the REX64 prefix.

10) RET/RETF/LRET

with immediate of zero to immediate-less form.

11) 32-bit TEST

with {8..15}-bit immediate in 16-bit mode.

12) MOVABS

displacement optimization with -Os, using 32-bit addressing mode as
applicable.

13) BT{,C,R,S}

with in-range immediate to operand-size-prefix-less forms. For register
operands going from 16- to 32-bit operand size is okay. For memory
operands this is possible only by reducing the nominal operand size,
with an adjustment to the displacement as necessary (perhaps leaving
alone ones with a LOCK prefix).
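
The displacement adjustment for the memory-operand case of 13) can be sketched as follows. This is an illustration under assumptions, with hypothetical names `narrow_bt` and `read_bit`, and a little-endian byte string standing in for memory. As an example, in 64-bit mode `btq $40, (%rax)` would become `btl $8, 4(%rax)`.

```python
def narrow_bt(imm, disp, old_bits, new_bits):
    """Re-express bit `imm` of an old_bits-wide memory operand at `disp`
    as a bit of a narrower, new_bits-wide operand."""
    if not 0 <= imm < old_bits:
        raise ValueError("immediate must be in range for the original size")
    if new_bits >= old_bits:
        raise ValueError("only a reduction of the operand size is modelled")
    # Each step of new_bits in the bit offset moves the operand by
    # new_bits // 8 bytes; the remainder stays in the immediate.
    return imm % new_bits, disp + (imm // new_bits) * (new_bits // 8)


def read_bit(memory, disp, bits, imm):
    """Little-endian model of the bit BT selects for an immediate offset."""
    operand = int.from_bytes(memory[disp:disp + bits // 8], "little")
    return (operand >> (imm % bits)) & 1
```

Because the narrower access always lies within the bytes the original operand covered, the same bit is selected and no extra memory is touched.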

14) BT{,C,R,S}

with memory operand and out-of-range immediate, transforming the upper
immediate bits into an adjustment to the displacement. Accompanied by
a warning, as the upper bits would no longer end up being ignored. The
SDM in fact suggests this as a model assemblers might follow.
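
A possible shape for the folding in 14), again only as a sketch: `fold_bt_immediate` is a hypothetical name, the immediate is assumed to be an unsigned 8-bit value, and moving in whole-operand steps is my assumption rather than something stated above.

```python
def fold_bt_immediate(imm, disp, bits):
    """Fold the out-of-range part of a BT immediate into the displacement.

    Returns (new_imm, new_disp, warning); warning is None when nothing moved.
    """
    if not 0 <= imm <= 0xFF:
        raise ValueError("BT takes an 8-bit immediate")
    new_imm = imm % bits  # the part the immediate field keeps
    new_disp = disp + (imm // bits) * (bits // 8)
    warning = None
    if new_disp != disp:
        # Unfolded, the hardware would ignore the upper bits and test
        # bit new_imm at the original displacement, so behaviour changes.
        warning = (f"bit offset {imm} exceeds operand size; "
                   f"displacement adjusted by {new_disp - disp}")
    return new_imm, new_disp, warning
```

For a 32-bit operand, `fold_bt_immediate(40, 0, 32)` yields immediate 8 at displacement 4 together with a warning, whereas an in-range immediate is returned unchanged with no warning.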

Note that examples of 4 and 5 can actually be found in Linux's crypto
code.
