public inbox for binutils@sourceware.org
From: "H.J. Lu" <hjl.tools@gmail.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: Binutils <binutils@sourceware.org>
Subject: Re: [RFC] x86: proposal for a new .insn directive
Date: Tue, 17 Jan 2023 07:56:23 -0800
Message-ID: <CAMe9rOoNHy4sBv0aifC1ZZ-nReWtc1irdu2sf-qknx7LNs7Ojw@mail.gmail.com>
In-Reply-To: <7166b647-c3a3-6103-c4d2-7c59a1520518@suse.com>

On Fri, Jan 13, 2023 at 3:58 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> All,
>
> certain other architectures (Arm, RISC-V) have such a directive, and x86
> would imo benefit from one even more: it is notoriously difficult to encode
> new insns with operands which a given version of gas doesn't support yet.
> This in particular concerns building the ModR/M and SIB bytes as well as
> the VEX/XOP/EVEX prefixes.
>
> I would appreciate feedback on the proposal (in the form of an assembly
> source file, providing examples at the same time). Besides pointing
> out issues / oversights, thoughts on the various TBDs would be helpful.
>
> Thanks, Jan
>
>         .text
> insn:
>
> #       .insn [<prefix>] [<encoding>] <major-opcode>[+r|/<extension>] [,<operand>[,...]]
>
> # Legacy encoding prefixes altering encoding space (0x0f, 0x0f38, 0x0f3a)
> # have to be specified as high byte(s) of <major-opcode>. This also extends
> # to certain FPU opcodes or sub-spaces like that of major opcode 0x0f01.
>
> # Legacy encoding prefixes altering meaning (0x66, 0xF2, 0xF3) may be
> # specified as high byte of <major-opcode> (perhaps already including an
> # encoding space prefix). Other prefixes should be spelled out as usual
> # ahead of <major-opcode> or, for segment overrides, with the memory
> # operand.
>
> # Operand order may not match that of the instruction actually being
> # expressed: While for a memory operand (of which there can be only one) it
> # is clear how to encode it in the resulting ModR/M byte, register operands
> # are encoded strictly in the order
> # - ModR/M.rm, ModR/M.reg for 2-operand insns,
> # - ModR/M.rm, {E,}VEX.vvvv, ModR/M.reg for 3-operand insns, and
> # - Imm{4,5}, ModR/M.rm, {E,}VEX.vvvv, ModR/M.reg for 4-operand insns,
> # obviously with the ModR/M.rm slot skipped when there is a memory operand,
> # and obviously with the ModR/M.reg slot skipped when there is an extension
> # opcode. (For Intel syntax of course all in the opposite order.)
>
> # Immediate operands (including immediate-like displacements, i.e. when not
> # part of ModR/M addressing) should be specified by separate .byte / .word /
> # .long / .quad (or alike) directives.
> # TBD: How to deal with this for RIP-relative addressing?
> # TBD: How to deal with this for 4-operand insns?
>
> # When register operand size varies for an actual insn (e.g. for MOVZX or
> # VPMOVZX*), registers nevertheless need spelling out in a uniform manner, such
> # that any of them could be used to derive operand size attributes (e.g.
> # operand size prefix, REX.W, VEX.W, or VEX.L) as well as the EVEX Disp8
> # scaling factor.
> # TBD: Could also go from largest operand size, albeit that may end up confusing
> #      in AT&T mode, where memory operands don't have size, yet the memory
> #      operand may have larger size than the register one(s) (and would hence be
> #      the one which the <len> attribute - see below - needs deriving from).
>
> # For VEX / XOP / EVEX <encoding> is arranged like this:
> # {VEX,XOP,EVEX}[.<len>][.<prefix>][.<space>][.<w>]
> # where
> # - <len> can be LIG, 128, 256, or (EVEX only) 512 as well as L0/L1 for
> #   VEX / XOP and L0-L3 for EVEX,
> # - <prefix> can be NP, 66, F3, or F2,
> # - <space> can be
> #   - 0f, 0f38, 0f3a, or M0...M31 for VEX,
> #   - 08...3f (hex) for XOP,
> #   - 0f, 0f38, 0f3a, or M0...M15 for EVEX,
> # - <w> can be WIG, W0, or W1.
> # Omitted <len> means "infer from operand size" if there is at least one
> # sized operand, or LIG otherwise.
> # Omitted <prefix> means NP.
> # Omitted <space> implies encoding is taken from <major-opcode>.
> # Omitted <w> means "infer from GPR operand size" if there is at least
> # one GPR operand, or WIG otherwise.
>
> # TBD: Is operand order being dependent on AT&T vs Intel syntax okay?
>
>         .insn 0x90                                      # nop
>         .insn 0xf390                                    # pause
>         .insn rep 0x90                                  # pause
>         .insn 0xd9c9                                    # fxch
>         .insn 0xf30f01d9                                # vmgexit
>
>         .insn 0x8b, %ecx, %eax                          # mov %ecx, %eax
>         .insn 0x8b, %ax, %cx                            # mov %ax, %cx
>
>         .insn 0x8b, (%eax), %ecx                        # mov (%eax), %ecx
>
>         .insn 0x0fc8+r, %edx                            # bswap %edx
>
>         .insn lock 0x80/0, %fs:(%eax); .byte 1          # lock addb $1, %fs:(%eax)
>
> 1:
>         .insn 0xe2; .byte 1b-.-1                        # loop 1b
>         .insn 0xc7f8; .long 1b-.-4                      # xbegin 1b
>
>         .insn 0x0fb6, %ax, %cx                          # movzx %al, %cx
>         .insn 0x0fb7, %eax, %ecx                        # movzx %ax, %ecx
>
>         .insn VEX.66.0F 0x58, %xmm0, %xmm1, %xmm2       # vaddpd %xmm0, %xmm1, %xmm2
>         .insn VEX.66 0x0f58, %ymm0, %ymm1, %ymm2        # vaddpd %ymm0, %ymm1, %ymm2
>         .insn VEX.LIG.F3.0F 0x58, %xmm0, %xmm1, %xmm2   # vaddss %xmm0, %xmm1, %xmm2
>
>         .insn VEX.66.0F3A.W0 0x68, %xmm0, %xmm1, (%edx), %xmm3          # vfmaddps %xmm0, %xmm1, (%edx), %xmm3
>         .insn VEX.66.0F3A.W1 0x68, %xmm0, %xmm1, (%edx), %xmm3          # vfmaddps %xmm0, %xmm1, %xmm3, (%edx)
>         .insn VEX.66.0F3A.W1 0x68, %xmm0, %xmm1, %xmm2, (%ebx)          # vfmaddps %xmm0, %xmm1, %xmm2, (%ebx)
>
>         .insn VEX.66.0F3A.W0 0x48, $0, %xmm0, %xmm1, (%edx), %xmm3      # vpermil2ps $0, %xmm0, %xmm1, (%edx), %xmm3
>         .insn VEX.66.0F3A.W1 0x48, $1, %xmm0, %xmm1, (%edx), %xmm3      # vpermil2ps $1, %xmm0, %xmm1, %xmm3, (%edx)
>         .insn VEX.66.0F3A.W1 0x48, $2, %xmm0, %xmm1, %xmm2, (%ebx)      # vpermil2ps $2, %xmm0, %xmm1, %xmm2, (%ebx)
>
>         .insn VEX.L0.0F.W0 0x92, %eax, %k0              # kmovw %eax, %k0
>
>         .insn VEX.256.0F.WIG 0x77                       # vzeroall
>
>         .insn EVEX.NP.0F.W0 0x58, {rn-sae}, %zmm0, %zmm1, %zmm2         # vaddps {rn-sae}, %zmm0, %zmm1, %zmm2
>         .insn EVEX.66.0F.W1 0x58, 8(%eax){1to8}, %zmm1, %zmm2{%k2}{z}   # vaddpd 8(%eax){1to8}, %zmm1, %zmm2{%k2}{z}
>
> # TBD: How to specify the Disp8 scaling factor here? (In Intel syntax we can simply
> #      use memory operand size.)
>         .insn EVEX.66.0F38.W0 0x88, 4(%eax), %ymm1      # vexpandps 4(%eax), %ymm1
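The proposal places encoding-space escapes (0x0f, 0x0f38, 0x0f3a) and optionally the mandatory 0x66/0xF2/0xF3 prefix in the high bytes of <major-opcode>. The following is a minimal Python sketch of how such a value could be split back into its parts. `split_major_opcode` is a hypothetical helper, not gas code. It assumes that a leading 66/F2/F3 is always a mandatory prefix whenever more bytes follow it.

```python
# Hypothetical helper (not part of gas): split a <major-opcode> value, as
# written in the proposed .insn syntax, into its component bytes.
MANDATORY_PREFIXES = {0x66, 0xF2, 0xF3}
ESCAPE_SECOND_BYTES = {0x38, 0x3A}


def split_major_opcode(value):
    """Return (mandatory_prefix_or_None, escape_bytes, opcode_bytes)."""
    # The highest byte of the value is the first byte emitted.
    nbytes = max(1, (value.bit_length() + 7) // 8)
    raw = list(value.to_bytes(nbytes, "big"))
    prefix = None
    # Assumption: 66/F2/F3 counts as a prefix only when something follows it.
    if len(raw) > 1 and raw[0] in MANDATORY_PREFIXES:
        prefix = raw.pop(0)
    escape = []
    # 0x0f selects the two-byte map; 0x0f38 / 0x0f3a select the three-byte maps.
    if len(raw) > 1 and raw[0] == 0x0F:
        escape.append(raw.pop(0))
        if len(raw) > 1 and raw[0] in ESCAPE_SECOND_BYTES:
            escape.append(raw.pop(0))
    return prefix, escape, raw
```

For `0xf30f01d9` (vmgexit), this yields the prefix 0xF3, the escape byte 0x0F and the remaining bytes 0x01, 0xD9. Those remaining bytes are the 0x0f01 sub-space the proposal mentions. A plain operand-size 0x66 cannot be told apart from a mandatory one by this heuristic.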
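The operand-order rule in the proposal (ModR/M.rm, then {E,}VEX.vvvv, then ModR/M.reg, with Imm4/5 first for 4-operand insns, dropping rm for a memory operand and reg for an /<extension>) can be modelled as a slot assignment. The following Python sketch assumes AT&T order and treats any operand containing parentheses as the memory operand. `assign_slots` and the slot names are illustrative only. Immediates, `{rn-sae}` and masking decorations are assumed to have been stripped beforehand.

```python
# Slot templates as described in the proposal, for AT&T operand order.
SLOT_ORDER = {
    2: ["rm", "reg"],
    3: ["rm", "vvvv", "reg"],
    4: ["imm", "rm", "vvvv", "reg"],
}


def is_memory(operand):
    # Simplifying assumption: any operand with parentheses is a memory operand.
    return "(" in operand


def assign_slots(operands, has_extension=False):
    """Map register operands (AT&T order) to their encoding slots."""
    # An /<extension> occupies ModR/M.reg, so it counts towards the template.
    count = len(operands) + (1 if has_extension else 0)
    if count not in SLOT_ORDER:
        raise ValueError(f"no slot template for {count} operands")
    slots = list(SLOT_ORDER[count])
    memory = [op for op in operands if is_memory(op)]
    if len(memory) > 1:
        raise ValueError("at most one memory operand is allowed")
    if memory:
        slots.remove("rm")    # the memory operand itself is encoded via ModR/M.rm
    if has_extension:
        slots.remove("reg")   # ModR/M.reg holds the opcode extension
    registers = [op for op in operands if not is_memory(op)]
    if len(registers) != len(slots):
        raise ValueError("operand count does not fit the slot template")
    return dict(zip(slots, registers))
```

For instance, `.insn lock 0x80/0, %fs:(%eax)` leaves no register slots at all. In `.insn 0x8b, (%eax), %ecx`, only ModR/M.reg is left for %ecx.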
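Immediates and immediate-like displacements are left to separate .byte / .word / .long / .quad directives. In the loop / xbegin examples, the `1b-.-N` idiom works because `.` is the address of the displacement field itself. Subtracting the field size therefore makes the value relative to the end of the insn, assuming the displacement is the insn's last field. The helper names below are hypothetical; this is a Python sketch of that arithmetic.

```python
# Byte widths of the data directives the proposal suggests for immediates.
DIRECTIVE_SIZES = {".byte": 1, ".word": 2, ".long": 4, ".quad": 8}


def emit_immediate(directive, value):
    """Little-endian bytes for an immediate, accepting signed or unsigned values."""
    size = DIRECTIVE_SIZES[directive]
    if not -(1 << (8 * size - 1)) <= value < (1 << (8 * size)):
        raise ValueError(f"{value} does not fit in {directive}")
    return (value % (1 << (8 * size))).to_bytes(size, "little")


def emit_relative(directive, target, field_addr):
    """Bytes for an expression like `target - . - size` (e.g. `1b-.-1`)."""
    size = DIRECTIVE_SIZES[directive]
    # field_addr plays the role of `.`; field_addr + size is the end of the insn.
    disp = target - field_addr - size
    if not -(1 << (8 * size - 1)) <= disp < (1 << (8 * size - 1)):
        raise ValueError(f"displacement {disp} out of range for {directive}")
    return disp.to_bytes(size, "little", signed=True)
```

Take a hypothetical address of 0x100 for label `1:`. The `loop` opcode sits at 0x100 and its displacement byte at 0x101, giving 0xfe. The following `xbegin` (0xc7 0xf8 at 0x102, displacement at 0x104) gets 0xfffffff8.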
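The requirement to spell registers uniformly (so that any of them can supply the operand-size prefix, REX.W / VEX.W, or VEX.L) amounts to a lookup from register name to size attributes. The following Python sketch makes that lookup concrete. `size_attributes` and its result keys are hypothetical. 8-bit registers and other register classes are deliberately left out.

```python
import re


def size_attributes(reg):
    """Hypothetical: size-related encoding attributes implied by a register name."""
    name = reg.lstrip("%")
    vector = re.fullmatch(r"([xyz])mm\d+", name)
    if vector:
        # xmm -> L0 (128-bit), ymm -> L1 (256-bit), zmm -> L2 (512-bit, EVEX only)
        return {"L": "xyz".index(vector.group(1))}
    if re.fullmatch(r"[abcd]x|[sd]i|[sb]p|r\d+w", name):
        return {"operand_size_prefix": 0x66}   # 16-bit GPR
    if re.fullmatch(r"e(?:[abcd]x|[sd]i|[sb]p)|r\d+d", name):
        return {}                              # 32-bit GPR: default size
    if re.fullmatch(r"r(?:[abcd]x|[sd]i|[sb]p)|r\d+", name):
        return {"W": 1}   # 64-bit GPR: REX.W for legacy encodings, VEX/EVEX.W otherwise
    raise ValueError(f"unsupported register {reg}")
```

Because `.insn 0x0fb6, %ax, %cx` (movzx %al, %cx) spells both registers as 16-bit, either operand yields the same 0x66 prefix.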
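The `{VEX,XOP,EVEX}[.<len>][.<prefix>][.<space>][.<w>]` grammar is regular enough to parse field by field, in order. The following Python sketch shows one way to do this. It maps <prefix> to the pp field value and <space> to the map number. "infer" stands for the proposal's "derive from operands" defaults, and `None` stands for "space taken from <major-opcode>". `parse_encoding` is a hypothetical name, and case-insensitive matching (the examples mix `0f` and `0F`) is an assumption.

```python
# Field tables for the proposed {VEX,XOP,EVEX}[.<len>][.<prefix>][.<space>][.<w>] syntax.
LENGTHS = {"LIG": "ignored", "128": 0, "256": 1, "512": 2,
           "L0": 0, "L1": 1, "L2": 2, "L3": 3}
PREFIXES = {"NP": 0, "66": 1, "F3": 2, "F2": 3}       # values of the pp field
W_VALUES = {"WIG": "ignored", "W0": 0, "W1": 1}
NAMED_SPACES = {"0F": 1, "0F38": 2, "0F3A": 3}       # values of the map field
MAX_NUMBERED_SPACE = {"VEX": 31, "EVEX": 15}


def parse_space(kind, text):
    if kind == "XOP":
        value = int(text, 16)   # XOP spaces are given in hex, 08...3f
        if not 0x08 <= value <= 0x3F:
            raise ValueError(f"XOP space {text} out of range")
        return value
    if text in NAMED_SPACES:
        return NAMED_SPACES[text]
    if text.startswith("M") and text[1:].isdigit():
        value = int(text[1:])
        if value <= MAX_NUMBERED_SPACE[kind]:
            return value
    raise ValueError(f"bad {kind} space {text!r}")


def parse_encoding(spec):
    kind, *fields = spec.upper().split(".")
    if kind not in ("VEX", "XOP", "EVEX"):
        raise ValueError(f"unknown encoding {kind!r}")
    # Defaults for omitted fields, as described in the proposal.
    enc = {"kind": kind, "len": "infer", "pp": 0, "map": None, "w": "infer"}
    i = 0
    if i < len(fields) and fields[i] in LENGTHS:
        length = LENGTHS[fields[i]]
        if kind != "EVEX" and length not in ("ignored", 0, 1):
            raise ValueError(f"{fields[i]} is EVEX-only")
        enc["len"] = length
        i += 1
    if i < len(fields) and fields[i] in PREFIXES:
        enc["pp"] = PREFIXES[fields[i]]
        i += 1
    if i < len(fields) and fields[i] not in W_VALUES:
        enc["map"] = parse_space(kind, fields[i])
        i += 1
    if i < len(fields) and fields[i] in W_VALUES:
        enc["w"] = W_VALUES[fields[i]]
        i += 1
    if i != len(fields):
        raise ValueError(f"unexpected field {fields[i]!r}")
    return enc
```

Parsing the fields strictly in order sidesteps ambiguity between the groups. For example, `66` is tried as a prefix before it could be taken for a space.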
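For the legacy-encoded mov and bswap examples, the ModR/M byte is `mod<<6 | reg<<3 | rm`. A `+r` opcode instead adds the register number to its last byte. Opcode 0x8b is MOV r32, r/m32, so ModR/M.reg is the destination. Putting %ecx in rm and %eax in reg therefore gives `8b c1`, i.e. `mov %ecx, %eax`. The following Python sketch is restricted to the eight 32-bit GPRs and the plain `(%base)` addressing form; all helper names are hypothetical.

```python
# 32-bit GPRs in hardware numbering order; REX/VEX-extended registers omitted.
GPR32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def reg_num(reg):
    return GPR32.index(reg.lstrip("%"))


def modrm(mod, reg, rm):
    if not (0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8):
        raise ValueError("ModR/M field out of range")
    return (mod << 6) | (reg << 3) | rm


def encode_reg_reg(opcode, rm_operand, reg_operand):
    """One-byte opcode; first operand in ModR/M.rm, second in ModR/M.reg."""
    return bytes([opcode, modrm(0b11, reg_num(reg_operand), reg_num(rm_operand))])


def encode_mem_base(opcode, base, reg_operand):
    """Memory operand of the plain `(%base)` form: mod=00, no SIB, no displacement."""
    b = reg_num(base)
    if b in (4, 5):
        # In 32-bit addressing, rm=100 means "SIB follows" and mod=00/rm=101 means disp32.
        raise NotImplementedError("%esp needs a SIB byte, %ebp a displacement")
    return bytes([opcode, modrm(0b00, reg_num(reg_operand), b)])


def encode_plus_r(opcode, reg_operand):
    """Opcodes written as <opcode>+r: the register number is added to the last byte."""
    raw = bytearray(opcode.to_bytes((opcode.bit_length() + 7) // 8, "big"))
    if raw[-1] & 7:
        raise ValueError("low three bits of a +r opcode must be zero")
    raw[-1] += reg_num(reg_operand)
    return bytes(raw)
```

With these, `.insn 0x8b, (%eax), %ecx` becomes `8b 08`, and `.insn 0x0fc8+r, %edx` becomes `0f ca` (bswap %edx).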
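As an illustration of what the VEX examples expand to, the 2-byte VEX form (0xC5) can be used when the map is 0F, W is 0 or ignored, and no X/B register extension is needed. Its payload byte is R̄, the inverted vvvv, L and pp. The following Python sketch covers register-only operands. The function name is hypothetical, and it assumes LIG is emitted as L=0.

```python
def vex2_reg_reg(opcode, rm, vvvv, reg, L=0, pp=0):
    """2-byte VEX (C5) encoding for a register-only, map-0F, W0/WIG insn.

    rm, vvvv, reg are register numbers (e.g. 0 for %xmm0).
    pp: 0=NP, 1=66, 2=F3, 3=F2.  L: 0=128-bit, 1=256-bit.
    """
    if not (0 <= rm < 8 and 0 <= vvvv < 16 and 0 <= reg < 16
            and L in (0, 1) and 0 <= pp < 4):
        raise ValueError("not representable with a 2-byte VEX prefix")
    r_inverted = 0 if reg >= 8 else 1                      # VEX.R is stored inverted
    byte1 = (r_inverted << 7) | ((~vvvv & 0xF) << 3) | (L << 2) | pp  # vvvv inverted too
    modrm = 0xC0 | ((reg & 7) << 3) | rm                   # mod=11: register in rm
    return bytes([0xC5, byte1, opcode, modrm])
```

Under the rm, vvvv, reg order, `vaddpd %xmm0, %xmm1, %xmm2` has rm=0, vvvv=1 and reg=2, giving `c5 f1 58 d0`. The ymm form only flips L (`c5 f5 58 d0`), and vaddss changes pp to F3 (`c5 f2 58 d0`).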
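The Disp8 TBD matters because EVEX stores an 8-bit displacement as disp8 × N. A displacement that is not a multiple of N, or whose quotient falls outside [-128, 127], needs the 32-bit form. In this Python sketch, N is passed in explicitly; how N would be conveyed in AT&T syntax is exactly the open question. Two cases illustrate it. For the `{1to8}` broadcast of 64-bit elements, N is 8, so `8(%eax)` becomes disp8 = 1. For `vexpandps`, N is the 4-byte element size, so `4(%eax)` also becomes disp8 = 1. `compress_disp8` is a hypothetical name.

```python
def compress_disp8(disp, n):
    """Return the EVEX disp8 value if disp == disp8 * n fits in a signed byte, else None.

    n is the Disp8 scaling factor, e.g. the element size for a broadcast or a
    tuple-1 scalar access, or the vector length for a plain full-vector access.
    """
    if n <= 0:
        raise ValueError("scaling factor must be positive")
    if disp % n:
        return None             # not a multiple of N: a 32-bit displacement is needed
    scaled = disp // n
    return scaled if -128 <= scaled <= 127 else None
```

If the same `8(%eax)` were a full 64-byte zmm access without broadcast, it would not compress and would need disp32.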

I think it is a nice feature.  But it will be very difficult to support all the
complex x86 encoding schemes, which change over time.  We can start with the
regular encoding schemes first.

-- 
H.J.


Thread overview: 6+ messages
2023-01-13 11:58 Jan Beulich
2023-01-17 15:56 ` H.J. Lu [this message]
2023-01-17 16:16   ` Jan Beulich
2023-01-20  1:25     ` Jiang, Haochen
2023-01-20  9:07       ` Jan Beulich
2023-02-03 11:39 ` Jan Beulich
