Re: [RFC] x86: proposal for a new .insn directive

From: Jan Beulich <jbeulich@suse.com>
To: Binutils <binutils@sourceware.org>
Cc: "H.J. Lu" <hjl.tools@gmail.com>,
	"Jiang, Haochen" <haochen.jiang@intel.com>
Subject: Re: [RFC] x86: proposal for a new .insn directive
Date: Fri, 3 Feb 2023 12:39:40 +0100
Message-ID: <05989195-cca9-f334-4287-a1bdbadf0352@suse.com>
In-Reply-To: <7166b647-c3a3-6103-c4d2-7c59a1520518@suse.com>

On 13.01.2023 12:58, Jan Beulich via Binutils wrote:
> certain other architectures (Arm, RISC-V) have such, and x86 would imo
> benefit from such even more: It is notoriously difficult to encode new
> insns with operands which a certain version of gas doesn't support yet.
> This is in particular related to the building of the ModR/M and SIB
> bytes as well as the VEX/XOP/EVEX prefixes.
> 
> I would appreciate feedback on the proposal (in form of an assembly
> source file, providing examples at the same time). Besides pointing
> out issues / oversights, thoughts on the various TBDs would be helpful.

Some updates below, resulting from first steps taken. (There are other
more mechanical ones, which will be covered by the doc addition yet to
be written.)

> #	.insn [<prefix>] [<encoding>] <major-opcode>[+r|/<extension>] [,<operand>[,...]]
> 
> # Legacy encoding prefixes altering encoding space (0x0f, 0x0f38, 0x0f3a)
> # have to be specified as high byte(s) of <major-opcode>. This also extends
> # to certain FPU opcodes or sub-spaces like that of major opcode 0x0f01.
> 
> # Legacy encoding prefixes altering meaning (0x66, 0xF2, 0xF3) may be
> # specified as high byte of <major-opcode> (perhaps already including an
> # encoding space prefix). Other prefixes should be spelled out as usual
> # ahead of <major-opcode> or, for segment overrides, with the memory
> # operand.
> 
> # Operand order may not match that of the instruction actually being
> # expressed: While for a memory operand (of which there can be only one) it
> # is clear how to encode it in the resulting ModR/M byte, register operands
> # are encoded strictly in the order

# - {E,}VEX.vvvv for 1-register-operand VEX/XOP/EVEX insns,

> # - ModR/M.rm, ModR/M.reg for 2-operand insns,
> # - ModR/M.rm, {E,}VEX.vvvv, ModR/M.reg for 3-operand insns, and
> # - Imm{4,5}, ModR/M.rm, {E,}VEX.vvvv, ModR/M.reg for 4-operand insns,
> # obviously with the ModR/M.rm slot skipped when there is a memory operand,
> # and obviously with the ModR/M.reg slot skipped when there is an extension
> # opcode. (For Intel syntax of course all in the opposite order.)
> 
> # Immediate operands (including immediate-like displacements, i.e. when not
> # part of ModR/M addressing) should be specified by separate .byte / .word /
> # .long / .quad (or alike) directives.
> # TBD: How to deal with this for RIP-relative addressing?
> # TBD: How to deal with this for 4-operand insns?

The two earlier proposals for addressing these two issues were

# Proposal 1: $({u,s}<bits>)<number>
# Proposal 2: $<number>:{u,s}<bits>

Neither will easily fit within the way operands are currently parsed.
To avoid further fragility, I'm therefore considering extending what
we currently use for vector operations: prefix or suffix the number
with the size specifier enclosed in curly braces (with [] used below
to represent alternatives instead):

# Proposal 3: ${[u,s]<bits>}<number>
# Proposal 4: $<number>{[u,s]<bits>}

The former would be easiest to deal with from what I can tell right
now.
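
To illustrate why the prefix form (Proposal 3) is straightforward to
handle, here is a minimal Python sketch, not gas code. The helper name
encode_sized_immediate is hypothetical, it only accepts plain numeric
literals (gas would evaluate an expression), and it is restricted to
whole-byte sizes, so the Imm4/Imm5 case of 4-operand insns is not
covered. The point is merely that signedness and width are known before
the number is looked at, so the range check and the emitted byte count
follow directly.

```python
import re

# Hypothetical helper, not part of gas: recognizes the Proposal 3 form
# ${[u,s]<bits>}<number>, e.g. ${s8}-1 or ${u16}0x1234.
_SIZED_IMM = re.compile(r"^\$\{([us])(\d+)\}(.+)$")


def encode_sized_immediate(operand: str) -> bytes:
    """Return the little-endian bytes for a sized immediate operand."""
    m = _SIZED_IMM.match(operand)
    if m is None:
        raise ValueError(f"not a sized immediate: {operand!r}")
    signed = m.group(1) == "s"
    bits = int(m.group(2))
    if bits == 0 or bits % 8:
        # Assumption of this sketch: only whole-byte sizes are handled.
        raise ValueError("only whole-byte sizes are handled here")
    # Plain literals only; base 0 accepts decimal and 0x... forms.
    value = int(m.group(3), 0)
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {m.group(1)}{bits}")
    return value.to_bytes(bits // 8, "little", signed=signed)
```

With this, encode_sized_immediate("${s8}-1") yields the single byte
0xff, while "${u8}256" is rejected as out of range.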

> # When register operand size varies for an actual insn (like e.g. for MOVZX or
> # VPMOVZX*), registers nevertheless need spelling out in a uniform manner, such
> # that any of them could be used to derive operand size attributes (e.g.
> # operand size prefix, REX.W, VEX.W, or VEX.L) as well as the EVEX Disp8
> # scaling factor.
> # TBD: Could also go from largest operand size, albeit that may end up confusing
> #      in AT&T mode, where memory operands don't have size, yet the memory
> #      operand may have larger size than the register one(s) (and would hence be
> #      the one which the <len> attribute - see below - needs deriving from).

Using the largest operand size has turned out to be preferable. The
AT&T syntax concern is easy to address: The respective attributes can
simply be specified explicitly in the {VEX,XOP,EVEX}... construct when
one or more of them cannot be derived correctly from the operands.

> # For VEX / XOP / EVEX <encoding> is arranged like this:
> # {VEX,XOP,EVEX}[.<len>][.<prefix>][.<space>][.<w>]

I've changed this for XOP, since it is more natural this way:

# {,E}VEX[.<len>][.<prefix>][.<space>][.<w>]
# XOP<space>[.<len>][.<prefix>][.<w>]

> # where
> # - <len> can be LIG, 128, 256, or (EVEX only) 512 as well as L0/L1 for
> #   VEX / XOP and L0-L3 for EVEX,
> # - <prefix> can be NP, 66, F3, or F2,
> # - <space> can be
> #   - 0f, 0f38, 0f3a, or M0...M31 for VEX,
> #   - 08...3f (hex) for XOP,

This ranges only from 08 through 1f.

> #   - 0f, 0f38, 0f3a, or M0...M15 for EVEX,
> # - <w> can be WIG, W0, or W1.
> # Omitted <len> means "infer from operand size" if there is at least one
> # sized operand, or LIG otherwise.
> # Omitted <prefix> means NP.
> # Omitted <space> implies encoding is taken from <major-opcode>.
> # Omitted <w> means "infer from GPR operand size" if there is at least
> # one GPR operand, or WIG otherwise.
>[...]
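
The {,E}VEX form and its defaults as described above can be sketched in
Python as follows. This is an illustration under assumptions, not the
gas parser: the function name parse_vex_spec is hypothetical, tokens
are matched with exactly the letter case used in this mail, and the
XOP<space> form is left out. Omitted <len>, <space>, and <w> are
returned as None, standing for "infer from operands / major opcode" as
per the rules above; only an omitted <prefix> has a fixed default (NP).

```python
import re

# Hypothetical sketch of {,E}VEX[.<len>][.<prefix>][.<space>][.<w>].
# The fields are optional but must appear in this fixed order.
_VEX_SPEC = re.compile(
    r"^(?P<kind>E?VEX)"
    r"(?:\.(?P<len>LIG|128|256|512|L[0-3]))?"
    r"(?:\.(?P<prefix>NP|66|F3|F2))?"
    r"(?:\.(?P<space>0f38|0f3a|0f|M\d+))?"
    r"(?:\.(?P<w>WIG|W0|W1))?$"
)


def parse_vex_spec(spec: str) -> dict:
    """Split an encoding specifier into its fields, applying defaults."""
    m = _VEX_SPEC.match(spec)
    if m is None:
        raise ValueError(f"malformed encoding specifier: {spec!r}")
    fields = m.groupdict()
    evex = fields["kind"] == "EVEX"
    # 512 and L2/L3 are valid for EVEX only.
    if not evex and fields["len"] in ("512", "L2", "L3"):
        raise ValueError(f"<len> {fields['len']} requires EVEX")
    # M0...M31 for VEX, M0...M15 for EVEX.
    space = fields["space"]
    if space is not None and space.startswith("M"):
        if int(space[1:]) > (15 if evex else 31):
            raise ValueError(f"<space> {space} out of range")
    # Omitted <prefix> means NP; the other omitted fields stay None,
    # meaning they are to be inferred later.
    if fields["prefix"] is None:
        fields["prefix"] = "NP"
    return fields
```

For example, parse_vex_spec("EVEX.M5.W0") leaves <len> unset, fills in
NP for <prefix>, and reports M5 and W0 for <space> and <w>.
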
> # TBD: How to specify the Disp8 scaling factor here? (In Intel syntax we can simply
> #      use memory operand size.) Proposal: 4(%eax):4 or 4(%eax):d4.

As with immediates, the proposals present parsing challenges (and here
there's also a [mild] forward compatibility concern, as we don't know
what may further be added to the architecture). Hence, as there, I'm
now considering instead putting the size specifiers inside (potentially
already present) curly braces, e.g.

	.insn EVEX.M5.W0 0x5a, 16(%eax){:d16}, %zmm0	# vcvtph2pd 16(%eax), %zmm0
	.insn EVEX.M5.W0 0x5a, 2(%eax){1to8,:d2}, %zmm0	# vcvtph2pd 2(%eax){1to8}, %zmm0

I'd like to keep the colons to reduce the risk of issues which, as
said, might result from future additions to the spec. Whether the
comma as a separator is also wanted is secondary at this point. In
particular, if it turned out to cause problems for the parsing code, I
wouldn't mind dropping it. We could also follow the masking syntax and
use

	.insn EVEX.M5.W0 0x5a, 2(%eax){1to8}{:d2}, %zmm0 # vcvtph2pd 2(%eax){1to8}, %zmm0
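
For reference, this is what the scaling factor is used for. The sketch
below is Python, not gas code; the function name compress_disp8 is
hypothetical, and the scale argument stands for the N given in a
{:d<N>} specifier as proposed above. A displacement can use the
compressed 8-bit form only if it is a multiple of N and the quotient
fits in a signed byte; otherwise a 32-bit displacement is needed.

```python
def compress_disp8(disp: int, scale: int):
    """Return the byte value to encode for EVEX Disp8*N, or None.

    None means the displacement cannot be expressed as Disp8*N and a
    full-size displacement has to be emitted instead.
    """
    if scale < 1:
        raise ValueError("scale must be positive")
    if disp % scale:
        return None  # not a multiple of the scaling factor
    quotient = disp // scale
    # The encoded value must fit in a signed 8-bit field.
    return quotient if -128 <= quotient <= 127 else None
```

For the two examples above, 16 with {:d16} and 2 with {:d2} both end up
as an encoded displacement byte of 1, whereas e.g. a displacement of 4
with a scaling factor of 16 could not use the short form.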

Once again - input appreciated especially on all still open aspects.

Jan
