Re: [PATCH 2/4] aarch64: fp8 convert and scale - Add advsimd insn variants

From: Andrew Carlotti <andrew.carlotti@arm.com>
To: Victor Do Nascimento <victor.donascimento@arm.com>
Cc: binutils@sourceware.org, richard.earnshaw@arm.com, nickc@redhat.com
Subject: Re: [PATCH 2/4] aarch64: fp8 convert and scale - Add advsimd insn variants
Date: Mon, 20 May 2024 16:36:09 +0100
Message-ID: <734c54de-1d64-d2a2-c11e-f5aa67193623@e124511.cambridge.arm.com>
In-Reply-To: <20240410152950.1134020-3-victor.donascimento@arm.com>

On Wed, Apr 10, 2024 at 04:29:48PM +0100, Victor Do Nascimento wrote:
> Add the advanced SIMD variant of the FP8 convert and scale
> instructions, enabled at assembly-time using the `+fp8'
> architectural extension flag.  More specifically, support is
> added for the following instructions:
> 
...
> diff --git a/gas/testsuite/gas/aarch64/advsimd-fp8.s b/gas/testsuite/gas/aarch64/advsimd-fp8.s
> new file mode 100644
> index 00000000000..e49f38d420a
> --- /dev/null
> +++ b/gas/testsuite/gas/aarch64/advsimd-fp8.s
> @@ -0,0 +1,76 @@
> +	/* advsimd-fp8.s Test file for AArch64 8-bit floating-point vector
> +	instructions.  */
> +
> +	/* Instructions convert the elements from the lower half of the source
> +	vector while scaling the values by 2^-UInt(FPMR.LSCALE{2}[3:0]).  */
> +
> +	.macro cvrt_lowerhalf, op
> +	\op 	v0.8h, v0.8b
> +	\op 	v1.8h, v0.8b
> +	\op 	v0.8h, v1.8b
> +	\op 	v1.8h, v1.8b
> +	\op 	v16.8h, v17.8b
> +	.endm
> +
> +	cvrt_lowerhalf	bf1cvtl
> +	cvrt_lowerhalf	bf2cvtl
> +	cvrt_lowerhalf	f1cvtl
> +	cvrt_lowerhalf	f2cvtl
> +
> +	/* Instructions convert the elements from the upper half of the source
> +	vector while scaling the values by 2^-UInt(FPMR.LSCALE{2}[3:0]).  */
> +
> +	.macro cvrt_upperhalf, op
> +	\op 	v0.8h, v0.16b
> +	\op 	v1.8h, v0.16b
> +	\op 	v0.8h, v1.16b
> +	\op 	v1.8h, v1.16b
> +	\op 	v16.8h, v17.16b
> +	.endm
> +
> +	cvrt_upperhalf	bf1cvtl2
> +	cvrt_upperhalf	bf2cvtl2
> +	cvrt_upperhalf	f1cvtl2
> +	cvrt_upperhalf	f2cvtl2
> +
> +	/* Floating-point adjust exponent by vector.  */
> +
> +	.macro fscale_gen, op_var
> +	fscale	v0.\op_var, v0.\op_var, v0.\op_var
> +	fscale	v1.\op_var, v0.\op_var, v0.\op_var
> +	fscale	v0.\op_var, v1.\op_var, v0.\op_var
> +	fscale	v0.\op_var, v0.\op_var, v1.\op_var
> +	fscale	v1.\op_var, v1.\op_var, v0.\op_var
> +	fscale	v0.\op_var, v1.\op_var, v1.\op_var
> +	fscale	v1.\op_var, v1.\op_var, v1.\op_var
> +	fscale	v16.\op_var, v17.\op_var, v18.\op_var
> +	.endm
> +
> +	/* Half-precision variant.  */
> +	fscale_gen	4h
> +	fscale_gen	8h
> +	/* Single-precision variant.  */
> +	fscale_gen	2s
> +	fscale_gen	4s
> +	fscale_gen	2d
> +
> +	/* Half and single-precision to FP8 convert and narrow.  */
> +
> +	.macro fcvtn_to_fp8, op, sd, ss
> +	\op	v0.\sd, v0.\ss, v0.\ss
> +	\op	v1.\sd, v0.\ss, v0.\ss
> +	\op	v0.\sd, v1.\ss, v0.\ss
> +	\op	v0.\sd, v0.\ss, v1.\ss
> +	\op	v1.\sd, v1.\ss, v0.\ss
> +	\op	v0.\sd, v1.\ss, v1.\ss
> +	\op	v1.\sd, v1.\ss, v1.\ss
> +	\op	v16.\sd, v17.\ss, v18.\ss
> +	.endm
> +
> +	/* Half-precision variant.  */
> +	fcvtn_to_fp8 fcvtn 8b, 4h
> +	fcvtn_to_fp8 fcvtn 16b, 8h
> +
> +	/* Single-precision variant.  */
> +	fcvtn_to_fp8 fcvtn, 8b, 4s
> +	fcvtn_to_fp8 fcvtn2, 16b, 4s

Some register operand bits are always 0 in these tests (for example, bits 2
and 3 of every register field, since only v0, v1 and v16-v18 are used), so the
tests don't show that the opcode mask is correct at those positions.  It would
be better to ensure that each operand bit takes both values, 0 and 1, in some
test.  A good general way to do this is to add tests that use register 31 (or
the highest valid register) for each operand in turn, while leaving the other
operands as register 0.
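
As a rough illustration of the coverage point (not part of the patch or the
testsuite), the C sketch below computes which bits of a 5-bit register field
take both values across a list of register numbers.  The helper name
toggled_bits and the two arrays are made up for this example; the second
array assumes a hypothetical extra test line that uses v31 as Vd.

```c
#include <assert.h>
#include <stddef.h>

/* AArch64 vector register numbers (v0..v31) occupy a 5-bit field.  */
#define REG_FIELD_MASK 0x1fu

/* Return the bits of the 5-bit register field that take both values,
   0 and 1, somewhere in regs[0..n-1].  */
static unsigned
toggled_bits (const unsigned *regs, size_t n)
{
  unsigned seen_one = 0, seen_zero = 0;

  assert (regs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      seen_one |= regs[i] & REG_FIELD_MASK;
      seen_zero |= ~regs[i] & REG_FIELD_MASK;
    }
  return seen_one & seen_zero;
}

/* Vd register numbers used by the quoted cvrt_lowerhalf macro.  */
static const unsigned vd_in_patch[] = { 0, 1, 16 };

/* Hypothetical extension: the same list plus v31, e.g. from an extra
   test line such as "bf1cvtl v31.8h, v0.8b".  */
static const unsigned vd_with_v31[] = { 0, 1, 16, 31 };
```

For the Vd registers in the quoted macro, toggled_bits returns 0x11, i.e. only
bits 0 and 4 are exercised with both values; with v31 added it returns 0x1f,
covering the whole field.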


> diff --git a/opcodes/aarch64-tbl.h b/opcodes/aarch64-tbl.h
> index 7e603462a37..f876c1b342f 100644
> --- a/opcodes/aarch64-tbl.h
> +++ b/opcodes/aarch64-tbl.h
> @@ -2368,6 +2368,34 @@
>    QLF3(X,X,NIL),                                        \
>  }
>  
> +#define QL_V3_BSS_LOWER    \
> +{			   \
> +  QLF3(V_8B, V_4S, V_4S),  \
> +}
> +
> +#define QL_V3_BSS_FULL	   \
> +{			   \
> +  QLF3(V_16B, V_4S, V_4S), \
> +}
> +
> +#define QL_V3_BHH	   \
> +{			   \
> +  QLF3(V_8B, V_4H, V_4H),  \
> +  QLF3(V_16B, V_8H, V_8H), \
> +}
> +
> +/* e.g. BF1CVTL <Vd>.8H, <Vn>.8B.  */
> +#define QL_V2FP8B8H	   \
> +{			   \
> +  QLF2(V_8H, V_8B),	   \
> +}

How about aligning names with existing qualifier sets - e.g. QL_V2LONGB{2} to
match QL_V2LONGHS{2}?  Or, alternatively, QL_V2_HB_{LOWER|FULL} to match your
other new qualifiers?

> +/* e.g. BF1CVTL2 <Vd>.8H, <Vn>.16B.  */
> +#define QL_V28H16B	   \
> +{			   \
> +  QLF2(V_8H, V_16B),	   \
> +}
> +
>  /* e.g. UDOT <Vd>.2S, <Vn>.8B, <Vm>.8B.  */
>  #define QL_V3DOT	   \
>  {			   \
> @@ -6459,6 +6487,19 @@ const struct aarch64_opcode aarch64_opcode_table[] =
>    SVE2p1_INSNC("st2q",0xe4600000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
>    SVE2p1_INSNC("st3q",0xe4a00000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
>    SVE2p1_INSNC("st4q",0xe4e00000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
> +  FP8_INSN("bf1cvtl", 0x2ea17800, 0xfffffc00, asimdmisc, OP2 (Vd, Vn), QL_V2FP8B8H, 0),
> +  FP8_INSN("bf1cvtl2", 0x6ea17800, 0xfffffc00, asimdmisc, OP2 (Vd, Vn), QL_V28H16B, 0),
> +  FP8_INSN("bf2cvtl", 0x2ee17800, 0xfffffc00, asimdmisc, OP2 (Vd, Vn), QL_V2FP8B8H, 0),
> +  FP8_INSN("bf2cvtl2", 0x6ee17800, 0xfffffc00, asimdmisc, OP2 (Vd, Vn), QL_V28H16B, 0),
> +  FP8_INSN("f1cvtl", 0x2e217800, 0xfffffc00, asimdmisc, OP2 (Vd, Vn), QL_V2FP8B8H, 0),
> +  FP8_INSN("f1cvtl2", 0x6e217800, 0xfffffc00, asimdmisc, OP2 (Vd, Vn), QL_V28H16B, 0),
> +  FP8_INSN("f2cvtl", 0x2e617800, 0xfffffc00, asimdmisc, OP2 (Vd, Vn), QL_V2FP8B8H, 0),
> +  FP8_INSN("f2cvtl2", 0x6e617800, 0xfffffc00, asimdmisc, OP2 (Vd, Vn), QL_V28H16B, 0),
> +  FP8_INSN("fcvtn",  0xe00f400, 0xffe0fc00, asimdmisc, OP3 (Vd, Vn, Vm), QL_V3_BSS_LOWER, 0),
> +  FP8_INSN("fcvtn2", 0x4e00f400, 0xffe0fc00, asimdmisc, OP3 (Vd, Vn, Vm), QL_V3_BSS_FULL, 0),
> +  FP8_INSN("fcvtn", 0xe40f400,  0xbfe0fc00, asimdmisc, OP3 (Vd, Vn, Vm), QL_V3_BHH, F_SIZEQ),
> +  FP8_INSN("fscale", 0x2ec03c00, 0xbfe0fc00, asimdmisc, OP3 (Vd, Vn, Vm), QL_VSHIFT_H, F_SIZEQ),
> +  FP8_INSN("fscale", 0x2ea0fc00, 0xbfa0fc00, asimdmisc, OP3 (Vd, Vn, Vm), QL_V3SAMESD, F_SIZEQ),
>  
>  /* Checked Pointer Arithmetic Instructions.  */
>    CPA_INSN ("addpt",  0x9a002000, 0xffe0e000, aarch64_misc, OP3 (Rd_SP, Rn_SP, Rm_LSL), QL_I3SAMEX),
> -- 
> 2.34.1
>

Thread overview: 8+ messages
2024-04-10 15:29 [PATCH 0/4] aarch64: Add armv9.5-a FP8 datatype conversion Victor Do Nascimento
2024-04-10 15:29 ` [PATCH 1/4] aarch64: fp8 convert and scale - add feature flags and related structures Victor Do Nascimento
2024-04-10 15:29 ` [PATCH 2/4] aarch64: fp8 convert and scale - Add advsimd insn variants Victor Do Nascimento
2024-05-17 15:43   ` Richard Earnshaw (lists)
2024-05-20 15:36   ` Andrew Carlotti [this message]
2024-04-10 15:29 ` [PATCH 3/4] aarch64: fp8 convert and scale - add sve2 " Victor Do Nascimento
2024-04-10 15:29 ` [PATCH 4/4] aarch64: fp8 convert and scale - add sme2 " Victor Do Nascimento
2024-04-17  9:50 ` [PATCH 0/4] aarch64: Add armv9.5-a FP8 datatype conversion Nick Clifton
