[PATCH 0/6] Arm64: (mostly) SVE adjustments

* [PATCH 0/6] Arm64: (mostly) SVE adjustments
@ 2024-02-23 11:26 Jan Beulich
  2024-02-23 11:28 ` [PATCH 1/6] Arm64: correct B16B16 indexed bf{mla,mls,mul} Jan Beulich
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Jan Beulich @ 2024-02-23 11:26 UTC (permalink / raw)
  To: Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft, Nick Clifton

Some of the issues addressed here were pointed out before, but only the
less involved ones among them (plus a couple of subsequent findings)
are taken care of. The rest is left to people more familiar with the
inner workings of the operand type machinery.

1: correct B16B16 indexed bf{mla,mls,mul}
2: check matching operands for predicated B16B16 insns
3: check tied operand specifier in aarch64-gen
4: correct SVE2.1 ld{3,4}q / st{3,4}q (scalar plus immediate)
5: correct SVE2.1 ld2q (scalar plus scalar)
6: gas/NEWS: drop mention of Arm64's SVE2.1 and SME2.1

At least the last patch wants backporting to 2.42.

Jan


* [PATCH 1/6] Arm64: correct B16B16 indexed bf{mla,mls,mul}
  2024-02-23 11:26 [PATCH 0/6] Arm64: (mostly) SVE adjustments Jan Beulich
@ 2024-02-23 11:28 ` Jan Beulich
  2024-03-20 15:54   ` Richard Earnshaw (lists)
  2024-02-23 11:28 ` [PATCH 2/6] Arm64: check matching operands for predicated B16B16 insns Jan Beulich
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2024-02-23 11:28 UTC (permalink / raw)
  To: Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft

Their index is in bits 19, 20, and 22. Bit 11 in particular is already
set in the base opcode. Note also how disassembler output didn't match
assembler input in the respective testcase.
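
As a rough illustration of the bit layout described above, the following
C sketch places a 3-bit index into bits 19, 20 and 22 and reproduces a
few of the encodings expected by the updated testcase. The helper names
(encode_index3, encode_bfmla_indexed) are hypothetical and not part of
binutils; the register field positions are assumptions inferred from the
test expectations.

```c
#include <stdint.h>

/* Base opcode of indexed bfmla, as given in the table entry.  */
#define BFMLA_INDEXED_BASE 0x64200800u

/* Hypothetical helper: spread a 3-bit index over bits 19 and 20 (low
   two index bits) and bit 22 (high index bit).  */
static uint32_t
encode_index3 (unsigned int index)
{
  return ((index & 3u) << 19) | (((index >> 2) & 1u) << 22);
}

/* Hypothetical helper: encode "bfmla zd.h, zn.h, zm.h[index]".
   Assumed field positions: Zd in bits 0-4, Zn in bits 5-9, and a
   3-bit Zm in bits 16-18.  */
static uint32_t
encode_bfmla_indexed (unsigned int zd, unsigned int zn, unsigned int zm,
                      unsigned int index)
{
  return BFMLA_INDEXED_BASE
         | (zd & 31u)
         | ((zn & 31u) << 5)
         | ((zm & 7u) << 16)
         | encode_index3 (index);
}
```

With these assumptions, encode_bfmla_indexed (0, 16, 6, 7) yields
0x647e0a00, matching the first corrected line of the testcase.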

--- a/gas/testsuite/gas/aarch64/bfloat16-1.d
+++ b/gas/testsuite/gas/aarch64/bfloat16-1.d
@@ -56,24 +56,24 @@
 .*:	65221084 	bfmla	z4.h, p4\/m, z4.h, z2.h
 .*:	65211908 	bfmla	z8.h, p6\/m, z8.h, z1.h
 .*:	65201e10 	bfmla	z16.h, p7\/m, z16.h, z0.h
-.*:	643e0a00 	bfmla	z0.h, z16.h, z6.h\[7\]
-.*:	643d0901 	bfmla	z1.h, z8.h, z5.h\[7\]
-.*:	643409c2 	bfmla	z2.h, z14.h, z4.h\[5\]
-.*:	642a0aa4 	bfmla	z4.h, z21.h, z2.h\[3\]
-.*:	64210988 	bfmla	z8.h, z12.h, z1.h\[1\]
-.*:	64200950 	bfmla	z16.h, z10.h, z0.h\[1\]
+.*:	647e0a00 	bfmla	z0.h, z16.h, z6.h\[7\]
+.*:	64750901 	bfmla	z1.h, z8.h, z5.h\[6\]
+.*:	646409c2 	bfmla	z2.h, z14.h, z4.h\[4\]
+.*:	64320aa4 	bfmla	z4.h, z21.h, z2.h\[2\]
+.*:	64290988 	bfmla	z8.h, z12.h, z1.h\[1\]
+.*:	64200950 	bfmla	z16.h, z10.h, z0.h\[0\]
 .*:	65302000 	bfmls	z0.h, p0\/m, z0.h, z16.h
 .*:	65282421 	bfmls	z1.h, p1\/m, z1.h, z8.h
 .*:	65242842 	bfmls	z2.h, p2\/m, z2.h, z4.h
 .*:	65223084 	bfmls	z4.h, p4\/m, z4.h, z2.h
 .*:	65213908 	bfmls	z8.h, p6\/m, z8.h, z1.h
 .*:	65203e10 	bfmls	z16.h, p7\/m, z16.h, z0.h
-.*:	643e0e00 	bfmls	z0.h, z16.h, z6.h\[7\]
-.*:	643d0d01 	bfmls	z1.h, z8.h, z5.h\[7\]
-.*:	64340dc2 	bfmls	z2.h, z14.h, z4.h\[5\]
-.*:	642a0ea4 	bfmls	z4.h, z21.h, z2.h\[3\]
-.*:	64210d88 	bfmls	z8.h, z12.h, z1.h\[1\]
-.*:	64200d50 	bfmls	z16.h, z10.h, z0.h\[1\]
+.*:	647e0e00 	bfmls	z0.h, z16.h, z6.h\[7\]
+.*:	64750d01 	bfmls	z1.h, z8.h, z5.h\[6\]
+.*:	64640dc2 	bfmls	z2.h, z14.h, z4.h\[4\]
+.*:	64320ea4 	bfmls	z4.h, z21.h, z2.h\[2\]
+.*:	64290d88 	bfmls	z8.h, z12.h, z1.h\[1\]
+.*:	64200d50 	bfmls	z16.h, z10.h, z0.h\[0\]
 .*:	65028200 	bfmul	z0.h, p0\/m, z0.h, z16.h
 .*:	65028501 	bfmul	z1.h, p1\/m, z1.h, z8.h
 .*:	65028882 	bfmul	z2.h, p2\/m, z2.h, z4.h
@@ -86,12 +86,12 @@
 .*:	65020a04 	bfmul	z4.h, z16.h, z2.h
 .*:	65010a88 	bfmul	z8.h, z20.h, z1.h
 .*:	65000b10 	bfmul	z16.h, z24.h, z0.h
-.*:	643e2a00 	bfmul	z0.h, z16.h, z6.h\[7\]
-.*:	643d2901 	bfmul	z1.h, z8.h, z5.h\[7\]
-.*:	643429c2 	bfmul	z2.h, z14.h, z4.h\[5\]
-.*:	642a2aa4 	bfmul	z4.h, z21.h, z2.h\[3\]
-.*:	64212988 	bfmul	z8.h, z12.h, z1.h\[1\]
-.*:	64202950 	bfmul	z16.h, z10.h, z0.h\[1\]
+.*:	647e2a00 	bfmul	z0.h, z16.h, z6.h\[7\]
+.*:	64752901 	bfmul	z1.h, z8.h, z5.h\[6\]
+.*:	646429c2 	bfmul	z2.h, z14.h, z4.h\[4\]
+.*:	64322aa4 	bfmul	z4.h, z21.h, z2.h\[2\]
+.*:	64292988 	bfmul	z8.h, z12.h, z1.h\[1\]
+.*:	64202950 	bfmul	z16.h, z10.h, z0.h\[0\]
 .*:	65018200 	bfsub	z0.h, p0\/m, z0.h, z16.h
 .*:	65018501 	bfsub	z1.h, p1\/m, z1.h, z8.h
 .*:	65018882 	bfsub	z2.h, p2\/m, z2.h, z4.h
--- a/opcodes/aarch64-tbl.h
+++ b/opcodes/aarch64-tbl.h
@@ -6344,9 +6344,9 @@ const struct aarch64_opcode aarch64_opco
   B16B16_INSN("bfmul", 0x65000800, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
   B16B16_INSNC("bfsub", 0x65018000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
   B16B16_INSN("bfsub", 0x65000400, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
-  B16B16_INSN("bfmla", 0x64200800, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_11_INDEX), OP_SVE_VVV_H, 0, 0),
-  B16B16_INSN("bfmls", 0x64200c00, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_11_INDEX), OP_SVE_VVV_H, 0, 0),
-  B16B16_INSN("bfmul", 0x64202800, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_11_INDEX), OP_SVE_VVV_H, 0, 0),
+  B16B16_INSN("bfmla", 0x64200800, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_22_INDEX), OP_SVE_VVV_H, 0, 0),
+  B16B16_INSN("bfmls", 0x64200c00, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_22_INDEX), OP_SVE_VVV_H, 0, 0),
+  B16B16_INSN("bfmul", 0x64202800, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_22_INDEX), OP_SVE_VVV_H, 0, 0),
 
 /* SME2.1 movaz instructions.  */
   SME2p1_INSN ("movaz", 0xc0060600, 0xffff1f83, sme2_movaz, 0, OP2 (SME_Zdnx4, SME_ZA_array_vrsb_2), OP_SVE_BB, 0, 0),



* [PATCH 2/6] Arm64: check matching operands for predicated B16B16 insns
  2024-02-23 11:26 [PATCH 0/6] Arm64: (mostly) SVE adjustments Jan Beulich
  2024-02-23 11:28 ` [PATCH 1/6] Arm64: correct B16B16 indexed bf{mla,mls,mul} Jan Beulich
@ 2024-02-23 11:28 ` Jan Beulich
  2024-03-20 16:19   ` Richard Earnshaw (lists)
  2024-02-23 11:29 ` [PATCH 3/6] Arm64: check tied operand specifier in aarch64-gen Jan Beulich
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2024-02-23 11:28 UTC (permalink / raw)
  To: Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft

Except for bfml{a,s} their 1st and 3rd operands need to match - pass
the TIED macro argument accordingly. While doing that also slightly
re-arrange table entries, such that all predicated insns are close
together.

At the same time change the existing test source to actually use non-
matching operands for the respective bfml{a,s} forms.
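
To make the tied-operand requirement concrete, here is a minimal C
sketch of such a check on an already parsed instruction. It is not the
assembler's actual code: struct parsed_insn and tied_operand_ok are
hypothetical names, and the convention that tied_operand is a zero-based
operand index (with 0 meaning "nothing tied") is an assumption based on
the table entries passing 2 for "operand 3 must be the same register as
operand 1".

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a parsed predicated insn:
   regs[0] = Zd, regs[1] = Pg, regs[2] = second vector operand,
   regs[3] = Zm.  */
struct parsed_insn
{
  const char *name;
  unsigned int regs[4];
  /* Zero-based index of the operand that has to repeat operand 0,
     or 0 if no operand is tied (assumed convention).  */
  unsigned int tied_operand;
};

/* Return true if the tied operand (if any) names the same register
   as the destination.  */
static bool
tied_operand_ok (const struct parsed_insn *insn)
{
  if (insn->tied_operand == 0)
    return true;
  return insn->regs[insn->tied_operand] == insn->regs[0];
}
```

Under this model "bfadd z0.h, p0/m, z1.h, z0.h" is rejected, while
"bfmla z0.h, p0/m, z4.h, z16.h" (nothing tied) is accepted.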

--- a/gas/testsuite/gas/aarch64/bfloat16-1.d
+++ b/gas/testsuite/gas/aarch64/bfloat16-1.d
@@ -50,24 +50,24 @@
 .*:	64222604 	bfclamp	z4.h, z16.h, z2.h
 .*:	64212688 	bfclamp	z8.h, z20.h, z1.h
 .*:	64202710 	bfclamp	z16.h, z24.h, z0.h
-.*:	65300000 	bfmla	z0.h, p0\/m, z0.h, z16.h
-.*:	65280421 	bfmla	z1.h, p1\/m, z1.h, z8.h
-.*:	65240842 	bfmla	z2.h, p2\/m, z2.h, z4.h
-.*:	65221084 	bfmla	z4.h, p4\/m, z4.h, z2.h
-.*:	65211908 	bfmla	z8.h, p6\/m, z8.h, z1.h
-.*:	65201e10 	bfmla	z16.h, p7\/m, z16.h, z0.h
+.*:	65300080 	bfmla	z0.h, p0\/m, z4.h, z16.h
+.*:	65280501 	bfmla	z1.h, p1\/m, z8.h, z8.h
+.*:	65240982 	bfmla	z2.h, p2\/m, z12.h, z4.h
+.*:	65221204 	bfmla	z4.h, p4\/m, z16.h, z2.h
+.*:	65211a88 	bfmla	z8.h, p6\/m, z20.h, z1.h
+.*:	65201f10 	bfmla	z16.h, p7\/m, z24.h, z0.h
 .*:	647e0a00 	bfmla	z0.h, z16.h, z6.h\[7\]
 .*:	64750901 	bfmla	z1.h, z8.h, z5.h\[6\]
 .*:	646409c2 	bfmla	z2.h, z14.h, z4.h\[4\]
 .*:	64320aa4 	bfmla	z4.h, z21.h, z2.h\[2\]
 .*:	64290988 	bfmla	z8.h, z12.h, z1.h\[1\]
 .*:	64200950 	bfmla	z16.h, z10.h, z0.h\[0\]
-.*:	65302000 	bfmls	z0.h, p0\/m, z0.h, z16.h
-.*:	65282421 	bfmls	z1.h, p1\/m, z1.h, z8.h
-.*:	65242842 	bfmls	z2.h, p2\/m, z2.h, z4.h
-.*:	65223084 	bfmls	z4.h, p4\/m, z4.h, z2.h
-.*:	65213908 	bfmls	z8.h, p6\/m, z8.h, z1.h
-.*:	65203e10 	bfmls	z16.h, p7\/m, z16.h, z0.h
+.*:	65302080 	bfmls	z0.h, p0\/m, z4.h, z16.h
+.*:	65282501 	bfmls	z1.h, p1\/m, z8.h, z8.h
+.*:	65242982 	bfmls	z2.h, p2\/m, z12.h, z4.h
+.*:	65223204 	bfmls	z4.h, p4\/m, z16.h, z2.h
+.*:	65213a88 	bfmls	z8.h, p6\/m, z20.h, z1.h
+.*:	65203f10 	bfmls	z16.h, p7\/m, z24.h, z0.h
 .*:	647e0e00 	bfmls	z0.h, z16.h, z6.h\[7\]
 .*:	64750d01 	bfmls	z1.h, z8.h, z5.h\[6\]
 .*:	64640dc2 	bfmls	z2.h, z14.h, z4.h\[4\]
--- a/gas/testsuite/gas/aarch64/bfloat16-1.s
+++ b/gas/testsuite/gas/aarch64/bfloat16-1.s
@@ -46,12 +46,13 @@ bfclamp z2.h, z12.h, z4.h
 bfclamp z4.h, z16.h, z2.h
 bfclamp z8.h, z20.h, z1.h
 bfclamp z16.h, z24.h, z0.h
-bfmla z0.h, p0/m, z0.h, z16.h
-bfmla z1.h, p1/m, z1.h, z8.h
-bfmla z2.h, p2/m, z2.h, z4.h
-bfmla z4.h, p4/m, z4.h, z2.h
-bfmla z8.h, p6/m, z8.h, z1.h
-bfmla z16.h, p7/m, z16.h, z0.h
+
+bfmla z0.h, p0/m, z4.h, z16.h
+bfmla z1.h, p1/m, z8.h, z8.h
+bfmla z2.h, p2/m, z12.h, z4.h
+bfmla z4.h, p4/m, z16.h, z2.h
+bfmla z8.h, p6/m, z20.h, z1.h
+bfmla z16.h, p7/m, z24.h, z0.h
 
 bfmla z0.h, z16.h, z6.h[7]
 bfmla z1.h, z8.h, z5.h[6]
@@ -60,12 +61,12 @@ bfmla z4.h, z21.h, z2.h[2]
 bfmla z8.h, z12.h, z1.h[1]
 bfmla z16.h, z10.h, z0.h[0]
 
-bfmls z0.h, p0/m, z0.h, z16.h
-bfmls z1.h, p1/m, z1.h, z8.h
-bfmls z2.h, p2/m, z2.h, z4.h
-bfmls z4.h, p4/m, z4.h, z2.h
-bfmls z8.h, p6/m, z8.h, z1.h
-bfmls z16.h, p7/m, z16.h, z0.h
+bfmls z0.h, p0/m, z4.h, z16.h
+bfmls z1.h, p1/m, z8.h, z8.h
+bfmls z2.h, p2/m, z12.h, z4.h
+bfmls z4.h, p4/m, z16.h, z2.h
+bfmls z8.h, p6/m, z20.h, z1.h
+bfmls z16.h, p7/m, z24.h, z0.h
 
 bfmls z0.h, z16.h, z6.h[7]
 bfmls z1.h, z8.h, z5.h[6]
--- a/gas/testsuite/gas/aarch64/bfloat16-bad.l
+++ b/gas/testsuite/gas/aarch64/bfloat16-bad.l
@@ -41,24 +41,24 @@
 .*: Error: selected processor does not support `bfclamp z4.h,z16.h,z2.h'
 .*: Error: selected processor does not support `bfclamp z8.h,z20.h,z1.h'
 .*: Error: selected processor does not support `bfclamp z16.h,z24.h,z0.h'
-.*: Error: selected processor does not support `bfmla z0.h,p0\/m,z0.h,z16.h'
-.*: Error: selected processor does not support `bfmla z1.h,p1\/m,z1.h,z8.h'
-.*: Error: selected processor does not support `bfmla z2.h,p2\/m,z2.h,z4.h'
-.*: Error: selected processor does not support `bfmla z4.h,p4\/m,z4.h,z2.h'
-.*: Error: selected processor does not support `bfmla z8.h,p6\/m,z8.h,z1.h'
-.*: Error: selected processor does not support `bfmla z16.h,p7\/m,z16.h,z0.h'
+.*: Error: selected processor does not support `bfmla .*
+.*: Error: selected processor does not support `bfmla .*
+.*: Error: selected processor does not support `bfmla .*
+.*: Error: selected processor does not support `bfmla .*
+.*: Error: selected processor does not support `bfmla .*
+.*: Error: selected processor does not support `bfmla .*
 .*: Error: selected processor does not support `bfmla z0.h,z16.h,z6.h\[7\]'
 .*: Error: selected processor does not support `bfmla z1.h,z8.h,z5.h\[6\]'
 .*: Error: selected processor does not support `bfmla z2.h,z14.h,z4.h\[4\]'
 .*: Error: selected processor does not support `bfmla z4.h,z21.h,z2.h\[2\]'
 .*: Error: selected processor does not support `bfmla z8.h,z12.h,z1.h\[1\]'
 .*: Error: selected processor does not support `bfmla z16.h,z10.h,z0.h\[0\]'
-.*: Error: selected processor does not support `bfmls z0.h,p0\/m,z0.h,z16.h'
-.*: Error: selected processor does not support `bfmls z1.h,p1\/m,z1.h,z8.h'
-.*: Error: selected processor does not support `bfmls z2.h,p2\/m,z2.h,z4.h'
-.*: Error: selected processor does not support `bfmls z4.h,p4\/m,z4.h,z2.h'
-.*: Error: selected processor does not support `bfmls z8.h,p6\/m,z8.h,z1.h'
-.*: Error: selected processor does not support `bfmls z16.h,p7\/m,z16.h,z0.h'
+.*: Error: selected processor does not support `bfmls .*
+.*: Error: selected processor does not support `bfmls .*
+.*: Error: selected processor does not support `bfmls .*
+.*: Error: selected processor does not support `bfmls .*
+.*: Error: selected processor does not support `bfmls .*
+.*: Error: selected processor does not support `bfmls .*
 .*: Error: selected processor does not support `bfmls z0.h,z16.h,z6.h\[7\]'
 .*: Error: selected processor does not support `bfmls z1.h,z8.h,z5.h\[6\]'
 .*: Error: selected processor does not support `bfmls z2.h,z14.h,z4.h\[4\]'
--- /dev/null
+++ b/gas/testsuite/gas/aarch64/bfloat16-invalid.d
@@ -0,0 +1,4 @@
+#name: Test Bfloat16 instructions with wrong operand combinations
+#as: -march=armv9.4-a
+#source: bfloat16-invalid.s
+#error_output: bfloat16-invalid.l
--- /dev/null
+++ b/gas/testsuite/gas/aarch64/bfloat16-invalid.l
@@ -0,0 +1,8 @@
+.*: Assembler messages:
+[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfadd .*
+[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfmax .*
+[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfmaxnm .*
+[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfmin .*
+[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfminnm .*
+[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfmul .*
+[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfsub .*
--- /dev/null
+++ b/gas/testsuite/gas/aarch64/bfloat16-invalid.s
@@ -0,0 +1,13 @@
+bfadd z0.h, p0/m, z1.h, z0.h
+
+bfmax z0.h, p0/m, z1.h, z0.h
+
+bfmaxnm z0.h, p0/m, z1.h, z0.h
+
+bfmin z0.h, p0/m, z1.h, z0.h
+
+bfminnm z0.h, p0/m, z1.h, z0.h
+
+bfmul z0.h, p0/m, z1.h, z0.h
+
+bfsub z0.h, p0/m, z1.h, z0.h
--- a/opcodes/aarch64-dis-2.c
+++ b/opcodes/aarch64-dis-2.c
@@ -32211,14 +32211,14 @@ aarch64_find_next_opcode (const aarch64_
     case 1705: return NULL;		/* ldff1h --> NULL.  */
     case 1659: value = 3313; break;	/* ld2h --> ld2q.  */
     case 3313: return NULL;		/* ld2q --> NULL.  */
-    case 2464: value = 3279; break;	/* fclamp --> bfclamp.  */
-    case 3279: return NULL;		/* bfclamp --> NULL.  */
+    case 2464: value = 3281; break;	/* fclamp --> bfclamp.  */
+    case 3281: return NULL;		/* bfclamp --> NULL.  */
     case 1778: value = 1779; break;	/* ldr --> ldr.  */
     case 1779: return NULL;		/* ldr --> NULL.  */
-    case 1434: value = 3278; break;	/* fadd --> bfadd.  */
-    case 3278: return NULL;		/* bfadd --> NULL.  */
-    case 1501: value = 3281; break;	/* fmul --> bfmul.  */
-    case 3281: return NULL;		/* bfmul --> NULL.  */
+    case 1434: value = 3280; break;	/* fadd --> bfadd.  */
+    case 3280: return NULL;		/* bfadd --> NULL.  */
+    case 1501: value = 3282; break;	/* fmul --> bfmul.  */
+    case 3282: return NULL;		/* bfmul --> NULL.  */
     case 1527: value = 3283; break;	/* fsub --> bfsub.  */
     case 3283: return NULL;		/* bfsub --> NULL.  */
     case 1492: value = 3276; break;	/* fmla --> bfmla.  */
@@ -32251,12 +32251,12 @@ aarch64_find_next_opcode (const aarch64_
     case 3271: return NULL;		/* bfadd --> NULL.  */
     case 1482: value = 3273; break;	/* fmaxnm --> bfmaxnm.  */
     case 3273: return NULL;		/* bfmaxnm --> NULL.  */
-    case 1502: value = 3280; break;	/* fmul --> bfmul.  */
-    case 3280: return NULL;		/* bfmul --> NULL.  */
+    case 1502: value = 3278; break;	/* fmul --> bfmul.  */
+    case 3278: return NULL;		/* bfmul --> NULL.  */
     case 1480: value = 3272; break;	/* fmax --> bfmax.  */
     case 3272: return NULL;		/* bfmax --> NULL.  */
-    case 1528: value = 3282; break;	/* fsub --> bfsub.  */
-    case 3282: return NULL;		/* bfsub --> NULL.  */
+    case 1528: value = 3279; break;	/* fsub --> bfsub.  */
+    case 3279: return NULL;		/* bfsub --> NULL.  */
     case 1488: value = 3275; break;	/* fminnm --> bfminnm.  */
     case 3275: return NULL;		/* bfminnm --> NULL.  */
     case 1486: value = 3274; break;	/* fmin --> bfmin.  */
--- a/opcodes/aarch64-tbl.h
+++ b/opcodes/aarch64-tbl.h
@@ -6331,18 +6331,18 @@ const struct aarch64_opcode aarch64_opco
   D128_THE_INSN("rcwsswppl", 0x5960a000, 0xffe0fc00, OP3 (Rt, Rs, ADDR_SIMPLE), QL_X2NIL, 0),
 
 /* BFloat16 SVE Instructions.  */
-  B16B16_INSNC("bfadd", 0x65008000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
-  B16B16_INSNC("bfmax", 0x65068000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
-  B16B16_INSNC("bfmaxnm", 0x65048000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
-  B16B16_INSNC("bfmin", 0x65078000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
-  B16B16_INSNC("bfminnm", 0x65058000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
+  B16B16_INSNC("bfadd", 0x65008000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
+  B16B16_INSNC("bfmax", 0x65068000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
+  B16B16_INSNC("bfmaxnm", 0x65048000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
+  B16B16_INSNC("bfmin", 0x65078000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
+  B16B16_INSNC("bfminnm", 0x65058000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
   B16B16_INSNC("bfmla", 0x65200000, 0xffe0e000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zn, SVE_Zm_16), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
   B16B16_INSNC("bfmls", 0x65202000, 0xffe0e000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zn, SVE_Zm_16), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
+  B16B16_INSNC("bfmul", 0x65028000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
+  B16B16_INSNC("bfsub", 0x65018000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
   B16B16_INSN("bfadd", 0x65000000, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
   B16B16_INSN("bfclamp", 0x64202400, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
-  B16B16_INSNC("bfmul", 0x65028000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
   B16B16_INSN("bfmul", 0x65000800, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
-  B16B16_INSNC("bfsub", 0x65018000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
   B16B16_INSN("bfsub", 0x65000400, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
   B16B16_INSN("bfmla", 0x64200800, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_22_INDEX), OP_SVE_VVV_H, 0, 0),
   B16B16_INSN("bfmls", 0x64200c00, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_22_INDEX), OP_SVE_VVV_H, 0, 0),



* [PATCH 3/6] Arm64: check tied operand specifier in aarch64-gen
  2024-02-23 11:26 [PATCH 0/6] Arm64: (mostly) SVE adjustments Jan Beulich
  2024-02-23 11:28 ` [PATCH 1/6] Arm64: correct B16B16 indexed bf{mla,mls,mul} Jan Beulich
  2024-02-23 11:28 ` [PATCH 2/6] Arm64: check matching operands for predicated B16B16 insns Jan Beulich
@ 2024-02-23 11:29 ` Jan Beulich
  2024-03-15 16:09   ` Andrew Carlotti
  2024-03-20 16:51   ` Richard Earnshaw (lists)
  2024-02-23 11:29 ` [PATCH 4/6] Arm64: correct SVE2.1 ld{3,4}q / st{3,4}q (scalar plus immediate) Jan Beulich
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Jan Beulich @ 2024-02-23 11:29 UTC (permalink / raw)
  To: Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft, Nick Clifton

Make sure that field actually matches the specified operands. Don't
follow existing F_PSEUDO checking in using assertions, though. Instead
print meaningful error messages, which - while no line number is
available - at least provide some indication of where things are
wrong.

Fix SVE2.1's extq accordingly, but don't extend the testsuite there:
There are further issues with its operands (SVE_Zm_imm4 doesn't look to
be correct to use there, as that describes an indexed vector register,
while here a separate vector register and immediate operand are to be
specified).
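
The consistency rule described above can be modelled in isolation. The
sketch below is a reduced, self-contained version of the idea, not the
generator's code: the enum opnd values, struct entry and check_tied are
hypothetical stand-ins, and the SME LDR/STR exception is left out.

```c
#include <stdio.h>

/* Hypothetical operand kinds; OPND_NIL terminates the list.  */
enum opnd { OPND_NIL = 0, OPND_Zd, OPND_Pg3, OPND_Zn, OPND_Zm };

#define MAX_OPNDS 4

struct entry
{
  const char *name;
  enum opnd operands[MAX_OPNDS];
  unsigned int tied_operand;
};

/* Return the number of inconsistencies between operands[] and
   tied_operand for one table entry.  */
static unsigned int
check_tied (const struct entry *ent)
{
  unsigned int errors = 0;
  int match = 0;

  for (unsigned int i = 1; i < MAX_OPNDS; ++i)
    {
      if (ent->operands[i] == OPND_NIL)
        break;
      if (ent->operands[i] != ent->operands[0])
        continue;
      match = 1;
      if (i != ent->tied_operand)
        {
          fprintf (stderr, "%s: operands 1 and %u match, but tied=%u\n",
                   ent->name, i + 1, ent->tied_operand);
          ++errors;
        }
    }

  if (!match && ent->tied_operand)
    {
      fprintf (stderr, "%s: no operands match, but tied=%u\n",
               ent->name, ent->tied_operand);
      ++errors;
    }

  return errors;
}
```

In this model an extq-like entry with operands (Zd, Zd, Zm) is flagged
when tied_operand is 0 and passes once it is 1.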

--- a/opcodes/aarch64-gen.c
+++ b/opcodes/aarch64-gen.c
@@ -129,6 +129,7 @@ read_table (const struct aarch64_opcode*
   const struct aarch64_opcode *ent = table;
   opcode_node **new_ent;
   unsigned int index = initialize_index (table);
+  unsigned int errors = 0;
 
   if (!ent->name)
     return;
@@ -140,6 +141,8 @@ read_table (const struct aarch64_opcode*
 
   do
     {
+      bool match = false;
+
       /* F_PSEUDO needs to be used together with F_ALIAS to indicate an alias
 	 opcode is a programmer friendly pseudo instruction available only in
 	 the assembly code (thus will not show up in the disassembly).  */
@@ -150,12 +153,45 @@ read_table (const struct aarch64_opcode*
 	  index++;
 	  continue;
 	}
+
+      /* Check tied_operand against operands[].  */
+      for (unsigned int i = 1; i < ARRAY_SIZE (ent->operands); ++i)
+	{
+	  if (ent->operands[i] == AARCH64_OPND_NIL)
+	    break;
+
+	  if (ent->operands[i] != ent->operands[0])
+	    continue;
+	  match = true;
+
+	  if (i != ent->tied_operand)
+	    {
+	      fprintf (stderr, "%s: operands 1 and %u match, but tied=%u\n",
+		       ent->name, i + 1, ent->tied_operand);
+	      ++errors;
+	    }
+	}
+      if (!match && ent->tied_operand
+	  /* SME LDR/STR (array vector) tie together inner immediates only.  */
+	  && ent->iclass != sme_ldr && ent->iclass != sme_str)
+	{
+	  fprintf (stderr, "%s: no operands match, but tied=%u\n",
+		   ent->name, ent->tied_operand);
+	  ++errors;
+	}
+
       *new_ent = new_opcode_node ();
       (*new_ent)->opcode = ent->opcode;
       (*new_ent)->mask = ent->mask;
       (*new_ent)->index = index++;
       new_ent = &((*new_ent)->next);
     } while ((++ent)->name);
+
+  if (errors)
+    {
+      fprintf (stderr, "%u errors, exiting\n", errors);
+      xexit (3);
+    }
 }
 
 static inline void
--- a/opcodes/aarch64-tbl.h
+++ b/opcodes/aarch64-tbl.h
@@ -6375,7 +6375,7 @@ const struct aarch64_opcode aarch64_opco
   SVE2p1_INSNC("fminqv",0x6417a000, 0xff3fe000, sve2_urqvs, 0, OP3 (Vd, SVE_Pg3, SVE_Zn), OP_SVE_vUS_HSD_HSD, F_OPD_SIZE, C_SCAN_MOVPRFX, 0),
 
   SVE2p1_INSN("dupq",0x05202400, 0xffe0fc00, sve_index1, 0, OP2 (SVE_Zd, SVE_Zn_5_INDEX), OP_SVE_VV_BHSD, 0, 0),
-  SVE2p1_INSN("extq",0x05602400, 0xfff0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zd, SVE_Zm_imm4), OP_SVE_BBB, 0, 0),
+  SVE2p1_INSN("extq",0x05602400, 0xfff0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zd, SVE_Zm_imm4), OP_SVE_BBB, 0, 1),
   SVE2p1_INSNC("ld1q",0xc400a000, 0xffe0e000, sve_misc, 0, OP3 (SVE_Zt, SVE_Pg3, SVE_ADDR_ZX), OP_SVE_SZS_QD, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("ld2q",0xa490e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("ld3q",0xa510e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),



* [PATCH 4/6] Arm64: correct SVE2.1 ld{3,4}q / st{3,4}q (scalar plus immediate)
  2024-02-23 11:26 [PATCH 0/6] Arm64: (mostly) SVE adjustments Jan Beulich
                   ` (2 preceding siblings ...)
  2024-02-23 11:29 ` [PATCH 3/6] Arm64: check tied operand specifier in aarch64-gen Jan Beulich
@ 2024-02-23 11:29 ` Jan Beulich
  2024-05-09 14:31   ` Richard Earnshaw (lists)
  2024-02-23 11:30 ` [PATCH 5/6] Arm64: correct SVE2.1 ld2q (scalar plus scalar) Jan Beulich
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2024-02-23 11:29 UTC (permalink / raw)
  To: Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft, Nick Clifton

Like their byte, halfword, word, and doubleword counterparts, their
immediates are multiples of 3 and 4 respectively.
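
The scaling rule can be sketched in C as follows. The helper name
encode_scaled_imm4 is hypothetical, and the assumption that the
immediate is stored as a signed 4-bit field (range -8..7 after dividing
by the register count) is inferred from the operand names
SVE_ADDR_RI_S4x3xVL / SVE_ADDR_RI_S4x4xVL and from the test
expectations, not taken from the assembler sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert the "#offset, mul vl" immediate of an
   nregs-register load/store into its 4-bit field value.  Returns
   false if the offset is not a multiple of nregs or does not fit the
   assumed signed 4-bit range.  */
static bool
encode_scaled_imm4 (int offset, int nregs, uint32_t *field)
{
  if (nregs <= 0 || offset % nregs != 0)
    return false;

  int scaled = offset / nregs;
  if (scaled < -8 || scaled > 7)
    return false;

  /* Two's complement, truncated to 4 bits.  */
  *field = (uint32_t) scaled & 0xfu;
  return true;
}
```

Under these assumptions "#-6" for ld3q and "#-8" for ld4q both produce
the field value 0xe, which is what the unchanged encodings a51ef000 and
a59ef000 carry in bits 16-19, whereas "#-4" is not representable for
ld3q.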

--- a/gas/testsuite/gas/aarch64/sve2p1-1.d
+++ b/gas/testsuite/gas/aarch64/sve2p1-1.d
@@ -1,4 +1,4 @@
-#name: Test of SVE2.1 min max instructions.
+#name: Test of SVE2.1 instructions
 #as: -march=armv9.4-a+sve2p1
 #objdump: -dr
 
@@ -91,15 +91,15 @@
 .*:	6497bc10 	fminqv	v16.4s, p7, z0.s
 .*:	c400b200 	ld1q	z0.q, p4/z, \[z16.d, x0\]
 .*:	a49ef000 	ld2q	{z0.q, z1.q}, p4/z, \[x0, #-4, mul vl\]
-.*:	a51ef000 	ld3q	{z0.q, z1.q, z2.q}, p4/z, \[x0, #-4, mul vl\]
-.*:	a59ef000 	ld4q	{z0.q, z1.q, z2.q, z3.q}, p4/z, \[x0, #-4, mul vl\]
+.*:	a51ef000 	ld3q	{z0.q, z1.q, z2.q}, p4/z, \[x0, #-6, mul vl\]
+.*:	a59ef000 	ld4q	{z0.q, z1.q, z2.q, z3.q}, p4/z, \[x0, #-8, mul vl\]
 .*:	a4a2f000 	ld2h	{z0.h-z1.h}, p4/z, \[x0, #4, mul vl\]
 .*:	a5249000 	ld3q	{z0.q, z1.q, z2.q}, p4/z, \[x0, x4, lsl #4\]
 .*:	a5a69000 	ld4q	{z0.q, z1.q, z2.q, z3.q}, p4/z, \[x0, x6, lsl #4\]
 .*:	e4203200 	st1q	z0.q, p4, \[z16.d, x0\]
 .*:	e44e1000 	st2q	{z0.q, z1.q}, p4, \[x0, #-4, mul vl\]
-.*:	e48e1000 	st3q	{z0.q, z1.q, z2.q}, p4, \[x0, #-4, mul vl\]
-.*:	e4ce1000 	st4q	{z0.q, z1.q, z2.q, z3.q}, p4, \[x0, #-4, mul vl\]
+.*:	e48e1000 	st3q	{z0.q, z1.q, z2.q}, p4, \[x0, #-6, mul vl\]
+.*:	e4ce1000 	st4q	{z0.q, z1.q, z2.q, z3.q}, p4, \[x0, #-8, mul vl\]
 .*:	e4621000 	st2q	{z0.q, z1.q}, p4, \[x0, x2, lsl #4\]
 .*:	e4a41000 	st3q	{z0.q, z1.q, z2.q}, p4, \[x0, x4, lsl #4\]
 .*:	e4e61000 	st4q	{z0.q, z1.q, z2.q, z3.q}, p4, \[x0, x6, lsl #4\]
--- a/gas/testsuite/gas/aarch64/sve2p1-1.s
+++ b/gas/testsuite/gas/aarch64/sve2p1-1.s
@@ -92,16 +92,16 @@ fminqv v8.2d, p4, z1.d
 fminqv v16.4s, p7, z0.s
 ld1q Z0.Q, p4/Z, [Z16.D, x0]
 ld2q {Z0.Q, Z1.Q}, p4/Z, [x0,  #-4, MUL VL]
-ld3q {Z0.Q, Z1.Q, Z2.Q}, p4/Z, [x0,  #-4, MUL VL]
-ld4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4/Z, [x0,  #-4, MUL VL]
+ld3q {Z0.Q, Z1.Q, Z2.Q}, p4/Z, [x0,  #-6, MUL VL]
+ld4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4/Z, [x0,  #-8, MUL VL]
 ld2q {Z0.Q, Z1.Q}, p4/Z, [x0, x2, lsl  #4]
 ld3q {Z0.Q, Z1.Q, Z2.Q}, p4/Z, [x0, x4, lsl  #4]
 ld4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4/Z, [x0, x6, lsl  #4]
 
 st1q Z0.Q, p4, [Z16.D, x0]
 st2q {Z0.Q, Z1.Q}, p4, [x0,  #-4, MUL VL]
-st3q {Z0.Q, Z1.Q, Z2.Q}, p4, [x0,  #-4, MUL VL]
-st4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4, [x0,  #-4, MUL VL]
+st3q {Z0.Q, Z1.Q, Z2.Q}, p4, [x0,  #-6, MUL VL]
+st4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4, [x0,  #-8, MUL VL]
 st2q {Z0.Q, Z1.Q}, p4, [x0, x2, lsl  #4]
 st3q {Z0.Q, Z1.Q, Z2.Q}, p4, [x0, x4, lsl  #4]
 st4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4, [x0, x6, lsl  #4]
--- a/gas/testsuite/gas/aarch64/sve2p1-1-bad.l
+++ b/gas/testsuite/gas/aarch64/sve2p1-1-bad.l
@@ -82,15 +82,15 @@
 .*: Error: selected processor does not support `fminqv v16.4s,p7,z0.s'
 .*: Error: selected processor does not support `ld1q Z0.Q,p4/Z,\[Z16.D,x0\]'
 .*: Error: selected processor does not support `ld2q {Z0.Q,Z1.Q},p4/Z,\[x0,#-4,MUL VL\]'
-.*: Error: selected processor does not support `ld3q {Z0.Q,Z1.Q,Z2.Q},p4/Z,\[x0,#-4,MUL VL\]'
-.*: Error: selected processor does not support `ld4q {Z0.Q,Z1.Q,Z2.Q,Z3.Q},p4/Z,\[x0,#-4,MUL VL\]'
+.*: Error: selected processor does not support `ld3q .*
+.*: Error: selected processor does not support `ld4q .*
 .*: Error: selected processor does not support `ld2q {Z0.Q,Z1.Q},p4/Z,\[x0,x2,lsl#4\]'
 .*: Error: selected processor does not support `ld3q {Z0.Q,Z1.Q,Z2.Q},p4/Z,\[x0,x4,lsl#4\]'
 .*: Error: selected processor does not support `ld4q {Z0.Q,Z1.Q,Z2.Q,Z3.Q},p4/Z,\[x0,x6,lsl#4\]'
 .*: Error: selected processor does not support `st1q Z0.Q,p4,\[Z16.D,x0\]'
 .*: Error: selected processor does not support `st2q {Z0.Q,Z1.Q},p4,\[x0,#-4,MUL VL\]'
-.*: Error: selected processor does not support `st3q {Z0.Q,Z1.Q,Z2.Q},p4,\[x0,#-4,MUL VL\]'
-.*: Error: selected processor does not support `st4q {Z0.Q,Z1.Q,Z2.Q,Z3.Q},p4,\[x0,#-4,MUL VL\]'
+.*: Error: selected processor does not support `st3q .*
+.*: Error: selected processor does not support `st4q .*
 .*: Error: selected processor does not support `st2q {Z0.Q,Z1.Q},p4,\[x0,x2,lsl#4\]'
 .*: Error: selected processor does not support `st3q {Z0.Q,Z1.Q,Z2.Q},p4,\[x0,x4,lsl#4\]'
 .*: Error: selected processor does not support `st4q {Z0.Q,Z1.Q,Z2.Q,Z3.Q},p4,\[x0,x6,lsl#4\]'
--- a/opcodes/aarch64-tbl.h
+++ b/opcodes/aarch64-tbl.h
@@ -6378,16 +6378,16 @@ const struct aarch64_opcode aarch64_opco
   SVE2p1_INSN("extq",0x05602400, 0xfff0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zd, SVE_Zm_imm4), OP_SVE_BBB, 0, 1),
   SVE2p1_INSNC("ld1q",0xc400a000, 0xffe0e000, sve_misc, 0, OP3 (SVE_Zt, SVE_Pg3, SVE_ADDR_ZX), OP_SVE_SZS_QD, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("ld2q",0xa490e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
-  SVE2p1_INSNC("ld3q",0xa510e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
-  SVE2p1_INSNC("ld4q",0xa590e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
+  SVE2p1_INSNC("ld3q",0xa510e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x3xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
+  SVE2p1_INSNC("ld4q",0xa590e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RI_S4x4xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("ld2q",0xa4a0e000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("ld3q",0xa5208000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("ld4q",0xa5a08000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
 
   SVE2p1_INSNC("st1q",0xe4202000, 0xffe0e000, sve_misc, 0, OP3 (SVE_Zt, SVE_Pg3, SVE_ADDR_ZX), OP_SVE_SUS_QD, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("st2q",0xe4400000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
-  SVE2p1_INSNC("st3q",0xe4800000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
-  SVE2p1_INSNC("st4q",0xe4c00000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
+  SVE2p1_INSNC("st3q",0xe4800000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x3xVL), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
+  SVE2p1_INSNC("st4q",0xe4c00000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RI_S4x4xVL), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("st2q",0xe4600000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("st3q",0xe4a00000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("st4q",0xe4e00000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),



* [PATCH 5/6] Arm64: correct SVE2.1 ld2q (scalar plus scalar)
  2024-02-23 11:26 [PATCH 0/6] Arm64: (mostly) SVE adjustments Jan Beulich
                   ` (3 preceding siblings ...)
  2024-02-23 11:29 ` [PATCH 4/6] Arm64: correct SVE2.1 ld{3,4}q / st{3,4}q (scalar plus immediate) Jan Beulich
@ 2024-02-23 11:30 ` Jan Beulich
  2024-05-09 14:34   ` Richard Earnshaw (lists)
  2024-02-23 11:30 ` [PATCH 6/6] gas/NEWS: drop mention of Arm64's SVE2.1 and SME2.1 Jan Beulich
  2024-03-15 16:20 ` [PATCH 0/6] Arm64: (mostly) SVE adjustments Andrew Carlotti
  6 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2024-02-23 11:30 UTC (permalink / raw)
  To: Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft, Nick Clifton

Its opcode was wrong, as was e.g. easily visible from the inappropriate
testcase expectation.
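
A small C sketch may help to see why the old table entry led to the
ld2h expectation. It only models the usual "(word & mask) == opcode"
comparison; opcode_matches and the macro names are hypothetical, while
the constants are the ones from the table entry and the testcase.

```c
#include <stdbool.h>
#include <stdint.h>

#define LD2Q_RR_MASK 0xffe0e000u
#define LD2Q_RR_OLD  0xa4a0e000u  /* opcode before the correction */
#define LD2Q_RR_NEW  0xa4a08000u  /* opcode after the correction */

/* Return true if an instruction word selects the given table entry.  */
static bool
opcode_matches (uint32_t word, uint32_t opcode, uint32_t mask)
{
  return (word & mask) == opcode;
}
```

The word a4a2f000 from the old expectation matches only the old opcode,
while a4a29000 from the corrected expectation matches only the new one.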

--- a/gas/testsuite/gas/aarch64/sve2p1-1.d
+++ b/gas/testsuite/gas/aarch64/sve2p1-1.d
@@ -93,7 +93,7 @@
 .*:	a49ef000 	ld2q	{z0.q, z1.q}, p4/z, \[x0, #-4, mul vl\]
 .*:	a51ef000 	ld3q	{z0.q, z1.q, z2.q}, p4/z, \[x0, #-6, mul vl\]
 .*:	a59ef000 	ld4q	{z0.q, z1.q, z2.q, z3.q}, p4/z, \[x0, #-8, mul vl\]
-.*:	a4a2f000 	ld2h	{z0.h-z1.h}, p4/z, \[x0, #4, mul vl\]
+.*:	a4a29000 	ld2q	{z0.q, z1.q}, p4/z, \[x0, x2, lsl #4\]
 .*:	a5249000 	ld3q	{z0.q, z1.q, z2.q}, p4/z, \[x0, x4, lsl #4\]
 .*:	a5a69000 	ld4q	{z0.q, z1.q, z2.q, z3.q}, p4/z, \[x0, x6, lsl #4\]
 .*:	e4203200 	st1q	z0.q, p4, \[z16.d, x0\]
--- a/opcodes/aarch64-tbl.h
+++ b/opcodes/aarch64-tbl.h
@@ -6380,7 +6380,7 @@ const struct aarch64_opcode aarch64_opco
   SVE2p1_INSNC("ld2q",0xa490e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("ld3q",0xa510e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x3xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("ld4q",0xa590e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RI_S4x4xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
-  SVE2p1_INSNC("ld2q",0xa4a0e000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
+  SVE2p1_INSNC("ld2q",0xa4a08000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("ld3q",0xa5208000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
   SVE2p1_INSNC("ld4q",0xa5a08000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
 



* [PATCH 6/6] gas/NEWS: drop mention of Arm64's SVE2.1 and SME2.1
  2024-02-23 11:26 [PATCH 0/6] Arm64: (mostly) SVE adjustments Jan Beulich
                   ` (4 preceding siblings ...)
  2024-02-23 11:30 ` [PATCH 5/6] Arm64: correct SVE2.1 ld2q (scalar plus scalar) Jan Beulich
@ 2024-02-23 11:30 ` Jan Beulich
  2024-03-15 16:20 ` [PATCH 0/6] Arm64: (mostly) SVE adjustments Andrew Carlotti
  6 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2024-02-23 11:30 UTC (permalink / raw)
  To: Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft, Nick Clifton

... plus the SME part of B16B16. As per

https://sourceware.org/pipermail/binutils/2024-February/132408.html

SVE2.1 support is both incomplete and buggy. SME2.1 "support" goes as
far as a single instruction (a subset of movaz forms) only. The SME part
of B16B16 is entirely missing.

--- a/gas/NEWS
+++ b/gas/NEWS
@@ -4,11 +4,7 @@ Changes in 2.42:
 
 * Add support for AMD znver5 processor.
 
-* Add support for the AArch64 Scalable Vector Extension version 2.1 (SVE2.1).
-
-* Add support for the AArch64 Scalable Matrix Extension version 2.1 (SME2.1).
-
-* Add support for the AArch64 BFloat16 to BFloat16 arithmetic for SVE2 and SME2
+* Add support for the AArch64 BFloat16 to BFloat16 arithmetic for SVE2
   (B16B16).
 
 * Add support for the AArch64 Reliability, Availability and Serviceability



* Re: [PATCH 3/6] Arm64: check tied operand specifier in aarch64-gen
  2024-02-23 11:29 ` [PATCH 3/6] Arm64: check tied operand specifier in aarch64-gen Jan Beulich
@ 2024-03-15 16:09   ` Andrew Carlotti
  2024-03-18  8:35     ` Jan Beulich
  2024-03-20 16:51   ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 20+ messages in thread
From: Andrew Carlotti @ 2024-03-15 16:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils, Richard Earnshaw, Marcus Shawcroft, Nick Clifton

On Fri, Feb 23, 2024 at 12:29:00PM +0100, Jan Beulich wrote:
> Make sure that field actually matches the specified operands. Don't
> follow existing F_PSEUDO checking in using assertions, though. Print
> meaningful error messages, thus - while not having a line number
> available - at least providing some indication of where things are
> wrong.

This new check should be helpful.  However, some mnemonics have a lot of
variants, so could you also add the opcode (and maybe the mask) to the new
error messages? For example:

extq (0x05602400,0xfff0fc00): operands 1 and 2 match, but tied=0
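
A minimal sketch of how such a message could be produced, assuming a
hypothetical helper format_tied_mismatch() (not something in aarch64-gen.c,
where the patch calls fprintf directly):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the "operands match" diagnostic with the
   opcode and mask appended to the mnemonic for disambiguation.  OPND is
   the 1-based number of the operand that equals operand 1.  */
static int
format_tied_mismatch (char *buf, size_t len, const char *name,
		      uint32_t opcode, uint32_t mask,
		      unsigned int opnd, unsigned int tied)
{
  return snprintf (buf, len,
		   "%s (0x%08" PRIx32 ",0x%08" PRIx32
		   "): operands 1 and %u match, but tied=%u",
		   name, opcode, mask, opnd, tied);
}
```

Called with ("extq", 0x05602400, 0xfff0fc00, 2, 0) this yields the example
line above.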

> Fix SVE2.1's extq accordingly, but don't extend the testsuite there:
> There are further issues with its operands (SVE_Zm_imm4 doesn't look to
> be correct to use there, as that describes an indexed vector register,
> while here a separate vector register and immediate operand are to be
> specified).
> 
> --- a/opcodes/aarch64-gen.c
> +++ b/opcodes/aarch64-gen.c
> @@ -129,6 +129,7 @@ read_table (const struct aarch64_opcode*
>    const struct aarch64_opcode *ent = table;
>    opcode_node **new_ent;
>    unsigned int index = initialize_index (table);
> +  unsigned int errors = 0;
>  
>    if (!ent->name)
>      return;
> @@ -140,6 +141,8 @@ read_table (const struct aarch64_opcode*
>  
>    do
>      {
> +      bool match = false;
> +
>        /* F_PSEUDO needs to be used together with F_ALIAS to indicate an alias
>  	 opcode is a programmer friendly pseudo instruction available only in
>  	 the assembly code (thus will not show up in the disassembly).  */
> @@ -150,12 +153,45 @@ read_table (const struct aarch64_opcode*
>  	  index++;
>  	  continue;
>  	}
> +
> +      /* Check tied_operand against operands[].  */
> +      for (unsigned int i = 1; i < ARRAY_SIZE (ent->operands); ++i)
> +	{
> +	  if (ent->operands[i] == AARCH64_OPND_NIL)
> +	    break;
> +
> +	  if (ent->operands[i] != ent->operands[0])
> +	    continue;
> +	  match = true;
> +
> +	  if (i != ent->tied_operand)
> +	    {
> +	      fprintf (stderr, "%s: operands 1 and %u match, but tied=%u\n",
> +		       ent->name, i + 1, ent->tied_operand);
> +	      ++errors;
> +	    }
> +	}
> +      if (!match && ent->tied_operand
> +	  /* SME LDR/STR (array vector) tie together inner immediates only.  */
> +	  && ent->iclass != sme_ldr && ent->iclass != sme_str)
> +	{
> +	  fprintf (stderr, "%s: no operands match, but tied=%u\n",
> +		   ent->name, ent->tied_operand);
> +	  ++errors;
> +	}
> +
>        *new_ent = new_opcode_node ();
>        (*new_ent)->opcode = ent->opcode;
>        (*new_ent)->mask = ent->mask;
>        (*new_ent)->index = index++;
>        new_ent = &((*new_ent)->next);
>      } while ((++ent)->name);
> +
> +  if (errors)
> +    {
> +      fprintf (stderr, "%u errors, exiting\n", errors);
> +      xexit (3);
> +    }
>  }
>  
>  static inline void
> --- a/opcodes/aarch64-tbl.h
> +++ b/opcodes/aarch64-tbl.h
> @@ -6375,7 +6375,7 @@ const struct aarch64_opcode aarch64_opco
>    SVE2p1_INSNC("fminqv",0x6417a000, 0xff3fe000, sve2_urqvs, 0, OP3 (Vd, SVE_Pg3, SVE_Zn), OP_SVE_vUS_HSD_HSD, F_OPD_SIZE, C_SCAN_MOVPRFX, 0),
>  
>    SVE2p1_INSN("dupq",0x05202400, 0xffe0fc00, sve_index1, 0, OP2 (SVE_Zd, SVE_Zn_5_INDEX), OP_SVE_VV_BHSD, 0, 0),
> -  SVE2p1_INSN("extq",0x05602400, 0xfff0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zd, SVE_Zm_imm4), OP_SVE_BBB, 0, 0),
> +  SVE2p1_INSN("extq",0x05602400, 0xfff0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zd, SVE_Zm_imm4), OP_SVE_BBB, 0, 1),
>    SVE2p1_INSNC("ld1q",0xc400a000, 0xffe0e000, sve_misc, 0, OP3 (SVE_Zt, SVE_Pg3, SVE_ADDR_ZX), OP_SVE_SZS_QD, 0, C_SCAN_MOVPRFX, 0),
>    SVE2p1_INSNC("ld2q",0xa490e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>    SVE2p1_INSNC("ld3q",0xa510e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
> 
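
To make the check easier to follow outside the generator, here is a
self-contained sketch of the same logic on a toy entry type. The names
toy_opnd, toy_opcode and check_tied are made up for illustration, and the
sme_ldr / sme_str exception from the patch is left out.

```c
#include <stdio.h>

/* Toy stand-ins for the operand enumerators; TOY_NIL terminates a list.  */
enum toy_opnd { TOY_NIL = 0, TOY_Zd, TOY_Zn, TOY_Zm, TOY_Pg3 };

#define TOY_MAX_OPNDS 6

struct toy_opcode
{
  const char *name;
  enum toy_opnd operands[TOY_MAX_OPNDS];
  unsigned int tied_operand;	/* 0-based index, 0 meaning "none".  */
};

/* Return the number of inconsistencies between operands[] and
   tied_operand for one entry, reporting each on stderr.  */
static unsigned int
check_tied (const struct toy_opcode *ent)
{
  unsigned int errors = 0;
  int match = 0;

  for (unsigned int i = 1; i < TOY_MAX_OPNDS; ++i)
    {
      if (ent->operands[i] == TOY_NIL)
	break;
      if (ent->operands[i] != ent->operands[0])
	continue;
      match = 1;
      if (i != ent->tied_operand)
	{
	  fprintf (stderr, "%s: operands 1 and %u match, but tied=%u\n",
		   ent->name, i + 1, ent->tied_operand);
	  ++errors;
	}
    }

  if (!match && ent->tied_operand)
    {
      fprintf (stderr, "%s: no operands match, but tied=%u\n",
	       ent->name, ent->tied_operand);
      ++errors;
    }

  return errors;
}
```

An entry shaped like the old extq one (Zd, Zd, Zm with tied=0) is reported
once; with tied=1 it passes.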


* Re: [PATCH 0/6] Arm64: (mostly) SVE adjustments
  2024-02-23 11:26 [PATCH 0/6] Arm64: (mostly) SVE adjustments Jan Beulich
                   ` (5 preceding siblings ...)
  2024-02-23 11:30 ` [PATCH 6/6] gas/NEWS: drop mention of Arm64's SVE2.1 and SME2.1 Jan Beulich
@ 2024-03-15 16:20 ` Andrew Carlotti
  2024-03-18  8:23   ` Jan Beulich
  6 siblings, 1 reply; 20+ messages in thread
From: Andrew Carlotti @ 2024-03-15 16:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Binutils, Richard Earnshaw, Marcus Shawcroft, Nick Clifton

On Fri, Feb 23, 2024 at 12:26:37PM +0100, Jan Beulich wrote:
> Some of the issues addressed here were pointed out before, but only
> not overly involved ones of those (plus a couple of subsequent findings)
> are taken care of. The rest is left to people more familiar with the
> inner workings of the operand type machinery.
> 
> 1: correct B16B16 indexed bf{mla,mls,mul}
> 2: check matching operands for predicated B16B16 insns
> 3: check tied operand specifier in aarch64-gen
> 4: correct SVE2.1 ld{3,4}q / st{3,4}q (scalar plus immediate)
> 5: correct SVE2.1 ld2q (scalar plus scalar)
> 6: gas/NEWS: drop mention of Arm64's SVE2.1 and SME2.1
> 
> At least the last patch wants backporting to 2.42.
> 
> Jan

A general comment: we refer to the port as "AArch64", and typically use
"aarch64: ..." in commit message headers.

I've pushed a further gas/NEWS update on top of your already committed patch 6,
to match the change on the release branch.  Aside from the aforementioned
naming issues and my suggested error message improvement, the remainder of this
series looks good to me (although I can't formally approve anything).

(If/when this is merged, Srinath can rebase his other fixes on top of this
series.)


* Re: [PATCH 0/6] Arm64: (mostly) SVE adjustments
  2024-03-15 16:20 ` [PATCH 0/6] Arm64: (mostly) SVE adjustments Andrew Carlotti
@ 2024-03-18  8:23   ` Jan Beulich
  2024-05-09 14:17     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2024-03-18  8:23 UTC (permalink / raw)
  To: Andrew Carlotti
  Cc: Binutils, Richard Earnshaw, Marcus Shawcroft, Nick Clifton,
	Srinath Parvathaneni

On 15.03.2024 17:20, Andrew Carlotti wrote:
> On Fri, Feb 23, 2024 at 12:26:37PM +0100, Jan Beulich wrote:
>> Some of the issues addressed here were pointed out before, but only
>> not overly involved ones of those (plus a couple of subsequent findings)
>> are taken care of. The rest is left to people more familiar with the
>> inner workings of the operand type machinery.
>>
>> 1: correct B16B16 indexed bf{mla,mls,mul}
>> 2: check matching operands for predicated B16B16 insns
>> 3: check tied operand specifier in aarch64-gen
>> 4: correct SVE2.1 ld{3,4}q / st{3,4}q (scalar plus immediate)
>> 5: correct SVE2.1 ld2q (scalar plus scalar)
>> 6: gas/NEWS: drop mention of Arm64's SVE2.1 and SME2.1
>>
>> At least the last patch wants backporting to 2.42.
> 
> A general comment: we refer to the port as "AArch64", and typically use
> "aarch64: ..." in commit message headers.

I'm aware of aarch64 being the "arch identifier". I'm not alone though in
preferring Arm64 in textual uses - see Linux sources for a prominent
example.

> I've pushed a further gas/NEWS update on top of your already committed patch 6,
> to match the change on the release branch.  Aside from the aforementioned
> naming issues and my suggested error message improvement, the remainder of this
> series looks good to me (although I can't formally approve anything).
> 
> (If/when this is merged, Srinath can rebase his other fixes on top of this
> series.)

In the absence of arch maintainer comments I've committed the first two
patches. I'll reply to your comment on patch 3 separately. Having seen
Srinath's patches, I'm actually okay with dropping patches 4 and 5 from
here, in favor of his more complete fixes. Once patch 3 is in, he'd need
to rebase there, of course.

Jan


* Re: [PATCH 3/6] Arm64: check tied operand specifier in aarch64-gen
  2024-03-15 16:09   ` Andrew Carlotti
@ 2024-03-18  8:35     ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2024-03-18  8:35 UTC (permalink / raw)
  To: Andrew Carlotti
  Cc: Binutils, Richard Earnshaw, Marcus Shawcroft, Nick Clifton

On 15.03.2024 17:09, Andrew Carlotti wrote:
> On Fri, Feb 23, 2024 at 12:29:00PM +0100, Jan Beulich wrote:
>> Make sure that field actually matches the specified operands. Don't
>> follow existing F_PSEUDO checking in using assertions, though. Print
>> meaningful error messages, thus - while not having a line number
>> available - at least providing some indication of where things are
>> wrong.
> 
> This new check should be helpful.  However, some mnemonics have a lot of
> variants, so could you also add the opcode (and maybe the mask) to the new
> error messages? For example:
> 
> extq (0x05602400,0xfff0fc00): operands 1 and 2 match, but tied=0

Hmm, yes, I could do that for disambiguation; not sure if the mask value
really is needed there - I fear overloading the message. What I was
instead hoping for though is an idea how to report the line number. Imo
that would be far better, just that the way the header file is used
doesn't easily lend itself to doing so.
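
Purely as a sketch of the general idea, and assuming (which is not the case
today) that the entry macros could carry an extra field: __LINE__ expands at
the point where a macro is invoked, so a table built from macros can record
where each entry was written. All names below are toy ones, unrelated to the
real aarch64-tbl.h macros.

```c
/* Toy table entry which additionally remembers its source line.  */
struct toy_entry
{
  const char *name;
  unsigned int opcode;
  unsigned int line;
};

/* __LINE__ is replaced where TOY_INSN is used, not where it is defined.  */
#define TOY_INSN(name, opcode) { name, opcode, __LINE__ }

static const struct toy_entry toy_table[] =
{
  TOY_INSN ("dupq", 0x05202400),
  TOY_INSN ("extq", 0x05602400),
};
```

A diagnostic could then print toy_table[i].line next to the mnemonic. Whether
something along these lines fits the way the header is included by
aarch64-gen.c is exactly the open question.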

Jan


* Re: [PATCH 1/6] Arm64: correct B16B16 indexed bf{mla,mls,mul}
  2024-02-23 11:28 ` [PATCH 1/6] Arm64: correct B16B16 indexed bf{mla,mls,mul} Jan Beulich
@ 2024-03-20 15:54   ` Richard Earnshaw (lists)
  2024-03-20 16:09     ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Earnshaw (lists) @ 2024-03-20 15:54 UTC (permalink / raw)
  To: Jan Beulich, Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft

On 23/02/2024 11:28, Jan Beulich wrote:
> Their index is in bits 19, 20, and 22. Bit 11 in particular is already
> set in the base opcode. Note also how disassembler output didn't match
> assembler input in the respective testcase.

This is OK.

I realize you didn't write the tests, but something I generally recommend is that, whenever possible, each test entry should test one field in the instruction, with the other fields being set to fill with zero.

We then get, for these tests, things like:

bfmla	z0.h, z0.h, z0.h[0]   // base opcode
bfmla   z31.h, z0.h, z0.h[0]  // Zd register
bfmla   z0.h, z31.h, z0.h[0]  // Zn register
bfmla   z0.h, z0.h, z7.h[0]   // Zm register
bfmla   z0.h, z0.h, z0.h[7]   // index

and then, finally, one instruction that tests a random combination of registers, just to check that we combine things properly:

bfmla   z23.h, z9.h, z5.h[3]
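
As a rough sketch of why this layout helps, here is a toy encoder for the
indexed bfmla form. The function name is hypothetical; the field positions
(Zd in bits 0-4, Zn in bits 5-9, Zm in bits 16-18, index in bits 19, 20 and
22) are inferred from the patch description and the expected encodings in the
testcase, not copied from the opcodes sources.

```c
#include <stdint.h>

/* Base opcode of indexed bfmla, as in the table entry.  */
#define TOY_BFMLA_IDX_BASE 0x64200800u

/* Hypothetical encoder: each argument lands in exactly one field, so a
   test varying a single argument exercises a single field.  */
static uint32_t
toy_encode_bfmla_idx (unsigned int zd, unsigned int zn,
		      unsigned int zm, unsigned int index)
{
  return (TOY_BFMLA_IDX_BASE
	  | (uint32_t) (zd & 0x1f)		/* Zd: bits 0-4.  */
	  | ((uint32_t) (zn & 0x1f) << 5)	/* Zn: bits 5-9.  */
	  | ((uint32_t) (zm & 0x7) << 16)	/* Zm: bits 16-18.  */
	  | ((uint32_t) (index & 0x3) << 19)	/* Index low bits: 19-20.  */
	  | ((uint32_t) ((index >> 2) & 0x1) << 22)); /* Index high bit: 22.  */
}
```

With all-zero operands the result is just the base opcode, and setting one
operand to its maximum lights up only that field's bits, which makes a
misplaced field (such as the index previously being taken from bit 11) stand
out immediately.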

R.

> 
> --- a/gas/testsuite/gas/aarch64/bfloat16-1.d
> +++ b/gas/testsuite/gas/aarch64/bfloat16-1.d
> @@ -56,24 +56,24 @@
>  .*:	65221084 	bfmla	z4.h, p4\/m, z4.h, z2.h
>  .*:	65211908 	bfmla	z8.h, p6\/m, z8.h, z1.h
>  .*:	65201e10 	bfmla	z16.h, p7\/m, z16.h, z0.h
> -.*:	643e0a00 	bfmla	z0.h, z16.h, z6.h\[7\]
> -.*:	643d0901 	bfmla	z1.h, z8.h, z5.h\[7\]
> -.*:	643409c2 	bfmla	z2.h, z14.h, z4.h\[5\]
> -.*:	642a0aa4 	bfmla	z4.h, z21.h, z2.h\[3\]
> -.*:	64210988 	bfmla	z8.h, z12.h, z1.h\[1\]
> -.*:	64200950 	bfmla	z16.h, z10.h, z0.h\[1\]
> +.*:	647e0a00 	bfmla	z0.h, z16.h, z6.h\[7\]
> +.*:	64750901 	bfmla	z1.h, z8.h, z5.h\[6\]
> +.*:	646409c2 	bfmla	z2.h, z14.h, z4.h\[4\]
> +.*:	64320aa4 	bfmla	z4.h, z21.h, z2.h\[2\]
> +.*:	64290988 	bfmla	z8.h, z12.h, z1.h\[1\]
> +.*:	64200950 	bfmla	z16.h, z10.h, z0.h\[0\]
>  .*:	65302000 	bfmls	z0.h, p0\/m, z0.h, z16.h
>  .*:	65282421 	bfmls	z1.h, p1\/m, z1.h, z8.h
>  .*:	65242842 	bfmls	z2.h, p2\/m, z2.h, z4.h
>  .*:	65223084 	bfmls	z4.h, p4\/m, z4.h, z2.h
>  .*:	65213908 	bfmls	z8.h, p6\/m, z8.h, z1.h
>  .*:	65203e10 	bfmls	z16.h, p7\/m, z16.h, z0.h
> -.*:	643e0e00 	bfmls	z0.h, z16.h, z6.h\[7\]
> -.*:	643d0d01 	bfmls	z1.h, z8.h, z5.h\[7\]
> -.*:	64340dc2 	bfmls	z2.h, z14.h, z4.h\[5\]
> -.*:	642a0ea4 	bfmls	z4.h, z21.h, z2.h\[3\]
> -.*:	64210d88 	bfmls	z8.h, z12.h, z1.h\[1\]
> -.*:	64200d50 	bfmls	z16.h, z10.h, z0.h\[1\]
> +.*:	647e0e00 	bfmls	z0.h, z16.h, z6.h\[7\]
> +.*:	64750d01 	bfmls	z1.h, z8.h, z5.h\[6\]
> +.*:	64640dc2 	bfmls	z2.h, z14.h, z4.h\[4\]
> +.*:	64320ea4 	bfmls	z4.h, z21.h, z2.h\[2\]
> +.*:	64290d88 	bfmls	z8.h, z12.h, z1.h\[1\]
> +.*:	64200d50 	bfmls	z16.h, z10.h, z0.h\[0\]
>  .*:	65028200 	bfmul	z0.h, p0\/m, z0.h, z16.h
>  .*:	65028501 	bfmul	z1.h, p1\/m, z1.h, z8.h
>  .*:	65028882 	bfmul	z2.h, p2\/m, z2.h, z4.h
> @@ -86,12 +86,12 @@
>  .*:	65020a04 	bfmul	z4.h, z16.h, z2.h
>  .*:	65010a88 	bfmul	z8.h, z20.h, z1.h
>  .*:	65000b10 	bfmul	z16.h, z24.h, z0.h
> -.*:	643e2a00 	bfmul	z0.h, z16.h, z6.h\[7\]
> -.*:	643d2901 	bfmul	z1.h, z8.h, z5.h\[7\]
> -.*:	643429c2 	bfmul	z2.h, z14.h, z4.h\[5\]
> -.*:	642a2aa4 	bfmul	z4.h, z21.h, z2.h\[3\]
> -.*:	64212988 	bfmul	z8.h, z12.h, z1.h\[1\]
> -.*:	64202950 	bfmul	z16.h, z10.h, z0.h\[1\]
> +.*:	647e2a00 	bfmul	z0.h, z16.h, z6.h\[7\]
> +.*:	64752901 	bfmul	z1.h, z8.h, z5.h\[6\]
> +.*:	646429c2 	bfmul	z2.h, z14.h, z4.h\[4\]
> +.*:	64322aa4 	bfmul	z4.h, z21.h, z2.h\[2\]
> +.*:	64292988 	bfmul	z8.h, z12.h, z1.h\[1\]
> +.*:	64202950 	bfmul	z16.h, z10.h, z0.h\[0\]
>  .*:	65018200 	bfsub	z0.h, p0\/m, z0.h, z16.h
>  .*:	65018501 	bfsub	z1.h, p1\/m, z1.h, z8.h
>  .*:	65018882 	bfsub	z2.h, p2\/m, z2.h, z4.h
> --- a/opcodes/aarch64-tbl.h
> +++ b/opcodes/aarch64-tbl.h
> @@ -6344,9 +6344,9 @@ const struct aarch64_opcode aarch64_opco
>    B16B16_INSN("bfmul", 0x65000800, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
>    B16B16_INSNC("bfsub", 0x65018000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
>    B16B16_INSN("bfsub", 0x65000400, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
> -  B16B16_INSN("bfmla", 0x64200800, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_11_INDEX), OP_SVE_VVV_H, 0, 0),
> -  B16B16_INSN("bfmls", 0x64200c00, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_11_INDEX), OP_SVE_VVV_H, 0, 0),
> -  B16B16_INSN("bfmul", 0x64202800, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_11_INDEX), OP_SVE_VVV_H, 0, 0),
> +  B16B16_INSN("bfmla", 0x64200800, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_22_INDEX), OP_SVE_VVV_H, 0, 0),
> +  B16B16_INSN("bfmls", 0x64200c00, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_22_INDEX), OP_SVE_VVV_H, 0, 0),
> +  B16B16_INSN("bfmul", 0x64202800, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_22_INDEX), OP_SVE_VVV_H, 0, 0),
>  
>  /* SME2.1 movaz instructions.  */
>    SME2p1_INSN ("movaz", 0xc0060600, 0xffff1f83, sme2_movaz, 0, OP2 (SME_Zdnx4, SME_ZA_array_vrsb_2), OP_SVE_BB, 0, 0),
> 



* Re: [PATCH 1/6] Arm64: correct B16B16 indexed bf{mla,mls,mul}
  2024-03-20 15:54   ` Richard Earnshaw (lists)
@ 2024-03-20 16:09     ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2024-03-20 16:09 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: Richard Earnshaw, Marcus Shawcroft, Binutils

On 20.03.2024 16:54, Richard Earnshaw (lists) wrote:
> On 23/02/2024 11:28, Jan Beulich wrote:
>> Their index is in bits 19, 20, and 22. Bit 11 in particular is already
>> set in the base opcode. Note also how disassembler output didn't match
>> assembler input in the respective testcase.
> 
> This is OK.
> 
> I realize you didn't write the tests, but something I generally recommend is that, whenever possible, each test entry should test one field in the instruction, with the other fields being set to fill with zero.
> 
> We then get, for these tests, things like:
> 
> bfmla	z0.h, z0.h, z0.h[0]   // base opcode
> bfmla   z31.h, z0.h, z0.h[0]  // Zd register
> bfmla   z0.h, z31.h, z0.h[0]  // Zn register
> bfmla   z0.h, z0.h, z7.h[0]   // Zm register
> bfmla   z0.h, z0.h, z0.h[7]   // index

Indeed that's how I'm writing tests for my own disassembler library. Whereas
here, as you say, I merely had to alter what was there already.

Jan


* Re: [PATCH 2/6] Arm64: check matching operands for predicated B16B16 insns
  2024-02-23 11:28 ` [PATCH 2/6] Arm64: check matching operands for predicated B16B16 insns Jan Beulich
@ 2024-03-20 16:19   ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Earnshaw (lists) @ 2024-03-20 16:19 UTC (permalink / raw)
  To: Jan Beulich, Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft

On 23/02/2024 11:28, Jan Beulich wrote:
> Except for bfml{a,s} their 1st and 3rd operands need to match - pass
> the TIED macro argument accordingly. While doing that also slightly
> re-arrange table entries, such that all predicated insns are close
> together.
> 
> At the same time change the existing test source to actually use non-
> matching operands for the respective bfml{a,s} forms.

OK.

R.

> 
> --- a/gas/testsuite/gas/aarch64/bfloat16-1.d
> +++ b/gas/testsuite/gas/aarch64/bfloat16-1.d
> @@ -50,24 +50,24 @@
>  .*:	64222604 	bfclamp	z4.h, z16.h, z2.h
>  .*:	64212688 	bfclamp	z8.h, z20.h, z1.h
>  .*:	64202710 	bfclamp	z16.h, z24.h, z0.h
> -.*:	65300000 	bfmla	z0.h, p0\/m, z0.h, z16.h
> -.*:	65280421 	bfmla	z1.h, p1\/m, z1.h, z8.h
> -.*:	65240842 	bfmla	z2.h, p2\/m, z2.h, z4.h
> -.*:	65221084 	bfmla	z4.h, p4\/m, z4.h, z2.h
> -.*:	65211908 	bfmla	z8.h, p6\/m, z8.h, z1.h
> -.*:	65201e10 	bfmla	z16.h, p7\/m, z16.h, z0.h
> +.*:	65300080 	bfmla	z0.h, p0\/m, z4.h, z16.h
> +.*:	65280501 	bfmla	z1.h, p1\/m, z8.h, z8.h
> +.*:	65240982 	bfmla	z2.h, p2\/m, z12.h, z4.h
> +.*:	65221204 	bfmla	z4.h, p4\/m, z16.h, z2.h
> +.*:	65211a88 	bfmla	z8.h, p6\/m, z20.h, z1.h
> +.*:	65201f10 	bfmla	z16.h, p7\/m, z24.h, z0.h
>  .*:	647e0a00 	bfmla	z0.h, z16.h, z6.h\[7\]
>  .*:	64750901 	bfmla	z1.h, z8.h, z5.h\[6\]
>  .*:	646409c2 	bfmla	z2.h, z14.h, z4.h\[4\]
>  .*:	64320aa4 	bfmla	z4.h, z21.h, z2.h\[2\]
>  .*:	64290988 	bfmla	z8.h, z12.h, z1.h\[1\]
>  .*:	64200950 	bfmla	z16.h, z10.h, z0.h\[0\]
> -.*:	65302000 	bfmls	z0.h, p0\/m, z0.h, z16.h
> -.*:	65282421 	bfmls	z1.h, p1\/m, z1.h, z8.h
> -.*:	65242842 	bfmls	z2.h, p2\/m, z2.h, z4.h
> -.*:	65223084 	bfmls	z4.h, p4\/m, z4.h, z2.h
> -.*:	65213908 	bfmls	z8.h, p6\/m, z8.h, z1.h
> -.*:	65203e10 	bfmls	z16.h, p7\/m, z16.h, z0.h
> +.*:	65302080 	bfmls	z0.h, p0\/m, z4.h, z16.h
> +.*:	65282501 	bfmls	z1.h, p1\/m, z8.h, z8.h
> +.*:	65242982 	bfmls	z2.h, p2\/m, z12.h, z4.h
> +.*:	65223204 	bfmls	z4.h, p4\/m, z16.h, z2.h
> +.*:	65213a88 	bfmls	z8.h, p6\/m, z20.h, z1.h
> +.*:	65203f10 	bfmls	z16.h, p7\/m, z24.h, z0.h
>  .*:	647e0e00 	bfmls	z0.h, z16.h, z6.h\[7\]
>  .*:	64750d01 	bfmls	z1.h, z8.h, z5.h\[6\]
>  .*:	64640dc2 	bfmls	z2.h, z14.h, z4.h\[4\]
> --- a/gas/testsuite/gas/aarch64/bfloat16-1.s
> +++ b/gas/testsuite/gas/aarch64/bfloat16-1.s
> @@ -46,12 +46,13 @@ bfclamp z2.h, z12.h, z4.h
>  bfclamp z4.h, z16.h, z2.h
>  bfclamp z8.h, z20.h, z1.h
>  bfclamp z16.h, z24.h, z0.h
> -bfmla z0.h, p0/m, z0.h, z16.h
> -bfmla z1.h, p1/m, z1.h, z8.h
> -bfmla z2.h, p2/m, z2.h, z4.h
> -bfmla z4.h, p4/m, z4.h, z2.h
> -bfmla z8.h, p6/m, z8.h, z1.h
> -bfmla z16.h, p7/m, z16.h, z0.h
> +
> +bfmla z0.h, p0/m, z4.h, z16.h
> +bfmla z1.h, p1/m, z8.h, z8.h
> +bfmla z2.h, p2/m, z12.h, z4.h
> +bfmla z4.h, p4/m, z16.h, z2.h
> +bfmla z8.h, p6/m, z20.h, z1.h
> +bfmla z16.h, p7/m, z24.h, z0.h
>  
>  bfmla z0.h, z16.h, z6.h[7]
>  bfmla z1.h, z8.h, z5.h[6]
> @@ -60,12 +61,12 @@ bfmla z4.h, z21.h, z2.h[2]
>  bfmla z8.h, z12.h, z1.h[1]
>  bfmla z16.h, z10.h, z0.h[0]
>  
> -bfmls z0.h, p0/m, z0.h, z16.h
> -bfmls z1.h, p1/m, z1.h, z8.h
> -bfmls z2.h, p2/m, z2.h, z4.h
> -bfmls z4.h, p4/m, z4.h, z2.h
> -bfmls z8.h, p6/m, z8.h, z1.h
> -bfmls z16.h, p7/m, z16.h, z0.h
> +bfmls z0.h, p0/m, z4.h, z16.h
> +bfmls z1.h, p1/m, z8.h, z8.h
> +bfmls z2.h, p2/m, z12.h, z4.h
> +bfmls z4.h, p4/m, z16.h, z2.h
> +bfmls z8.h, p6/m, z20.h, z1.h
> +bfmls z16.h, p7/m, z24.h, z0.h
>  
>  bfmls z0.h, z16.h, z6.h[7]
>  bfmls z1.h, z8.h, z5.h[6]
> --- a/gas/testsuite/gas/aarch64/bfloat16-bad.l
> +++ b/gas/testsuite/gas/aarch64/bfloat16-bad.l
> @@ -41,24 +41,24 @@
>  .*: Error: selected processor does not support `bfclamp z4.h,z16.h,z2.h'
>  .*: Error: selected processor does not support `bfclamp z8.h,z20.h,z1.h'
>  .*: Error: selected processor does not support `bfclamp z16.h,z24.h,z0.h'
> -.*: Error: selected processor does not support `bfmla z0.h,p0\/m,z0.h,z16.h'
> -.*: Error: selected processor does not support `bfmla z1.h,p1\/m,z1.h,z8.h'
> -.*: Error: selected processor does not support `bfmla z2.h,p2\/m,z2.h,z4.h'
> -.*: Error: selected processor does not support `bfmla z4.h,p4\/m,z4.h,z2.h'
> -.*: Error: selected processor does not support `bfmla z8.h,p6\/m,z8.h,z1.h'
> -.*: Error: selected processor does not support `bfmla z16.h,p7\/m,z16.h,z0.h'
> +.*: Error: selected processor does not support `bfmla .*
> +.*: Error: selected processor does not support `bfmla .*
> +.*: Error: selected processor does not support `bfmla .*
> +.*: Error: selected processor does not support `bfmla .*
> +.*: Error: selected processor does not support `bfmla .*
> +.*: Error: selected processor does not support `bfmla .*
>  .*: Error: selected processor does not support `bfmla z0.h,z16.h,z6.h\[7\]'
>  .*: Error: selected processor does not support `bfmla z1.h,z8.h,z5.h\[6\]'
>  .*: Error: selected processor does not support `bfmla z2.h,z14.h,z4.h\[4\]'
>  .*: Error: selected processor does not support `bfmla z4.h,z21.h,z2.h\[2\]'
>  .*: Error: selected processor does not support `bfmla z8.h,z12.h,z1.h\[1\]'
>  .*: Error: selected processor does not support `bfmla z16.h,z10.h,z0.h\[0\]'
> -.*: Error: selected processor does not support `bfmls z0.h,p0\/m,z0.h,z16.h'
> -.*: Error: selected processor does not support `bfmls z1.h,p1\/m,z1.h,z8.h'
> -.*: Error: selected processor does not support `bfmls z2.h,p2\/m,z2.h,z4.h'
> -.*: Error: selected processor does not support `bfmls z4.h,p4\/m,z4.h,z2.h'
> -.*: Error: selected processor does not support `bfmls z8.h,p6\/m,z8.h,z1.h'
> -.*: Error: selected processor does not support `bfmls z16.h,p7\/m,z16.h,z0.h'
> +.*: Error: selected processor does not support `bfmls .*
> +.*: Error: selected processor does not support `bfmls .*
> +.*: Error: selected processor does not support `bfmls .*
> +.*: Error: selected processor does not support `bfmls .*
> +.*: Error: selected processor does not support `bfmls .*
> +.*: Error: selected processor does not support `bfmls .*
>  .*: Error: selected processor does not support `bfmls z0.h,z16.h,z6.h\[7\]'
>  .*: Error: selected processor does not support `bfmls z1.h,z8.h,z5.h\[6\]'
>  .*: Error: selected processor does not support `bfmls z2.h,z14.h,z4.h\[4\]'
> --- /dev/null
> +++ b/gas/testsuite/gas/aarch64/bfloat16-invalid.d
> @@ -0,0 +1,4 @@
> +#name: Test Bfloat16 instructions with wrong operand combinations
> +#as: -march=armv9.4-a
> +#source: bfloat16-invalid.s
> +#error_output: bfloat16-invalid.l
> --- /dev/null
> +++ b/gas/testsuite/gas/aarch64/bfloat16-invalid.l
> @@ -0,0 +1,8 @@
> +.*: Assembler messages:
> +[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfadd .*
> +[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfmax .*
> +[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfmaxnm .*
> +[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfmin .*
> +[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfminnm .*
> +[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfmul .*
> +[^ :]+:[0-9]+: Error: operand 3 must be the same register as operand 1 -- `bfsub .*
> --- /dev/null
> +++ b/gas/testsuite/gas/aarch64/bfloat16-invalid.s
> @@ -0,0 +1,13 @@
> +bfadd z0.h, p0/m, z1.h, z0.h
> +
> +bfmax z0.h, p0/m, z1.h, z0.h
> +
> +bfmaxnm z0.h, p0/m, z1.h, z0.h
> +
> +bfmin z0.h, p0/m, z1.h, z0.h
> +
> +bfminnm z0.h, p0/m, z1.h, z0.h
> +
> +bfmul z0.h, p0/m, z1.h, z0.h
> +
> +bfsub z0.h, p0/m, z1.h, z0.h
> --- a/opcodes/aarch64-dis-2.c
> +++ b/opcodes/aarch64-dis-2.c
> @@ -32211,14 +32211,14 @@ aarch64_find_next_opcode (const aarch64_
>      case 1705: return NULL;		/* ldff1h --> NULL.  */
>      case 1659: value = 3313; break;	/* ld2h --> ld2q.  */
>      case 3313: return NULL;		/* ld2q --> NULL.  */
> -    case 2464: value = 3279; break;	/* fclamp --> bfclamp.  */
> -    case 3279: return NULL;		/* bfclamp --> NULL.  */
> +    case 2464: value = 3281; break;	/* fclamp --> bfclamp.  */
> +    case 3281: return NULL;		/* bfclamp --> NULL.  */
>      case 1778: value = 1779; break;	/* ldr --> ldr.  */
>      case 1779: return NULL;		/* ldr --> NULL.  */
> -    case 1434: value = 3278; break;	/* fadd --> bfadd.  */
> -    case 3278: return NULL;		/* bfadd --> NULL.  */
> -    case 1501: value = 3281; break;	/* fmul --> bfmul.  */
> -    case 3281: return NULL;		/* bfmul --> NULL.  */
> +    case 1434: value = 3280; break;	/* fadd --> bfadd.  */
> +    case 3280: return NULL;		/* bfadd --> NULL.  */
> +    case 1501: value = 3282; break;	/* fmul --> bfmul.  */
> +    case 3282: return NULL;		/* bfmul --> NULL.  */
>      case 1527: value = 3283; break;	/* fsub --> bfsub.  */
>      case 3283: return NULL;		/* bfsub --> NULL.  */
>      case 1492: value = 3276; break;	/* fmla --> bfmla.  */
> @@ -32251,12 +32251,12 @@ aarch64_find_next_opcode (const aarch64_
>      case 3271: return NULL;		/* bfadd --> NULL.  */
>      case 1482: value = 3273; break;	/* fmaxnm --> bfmaxnm.  */
>      case 3273: return NULL;		/* bfmaxnm --> NULL.  */
> -    case 1502: value = 3280; break;	/* fmul --> bfmul.  */
> -    case 3280: return NULL;		/* bfmul --> NULL.  */
> +    case 1502: value = 3278; break;	/* fmul --> bfmul.  */
> +    case 3278: return NULL;		/* bfmul --> NULL.  */
>      case 1480: value = 3272; break;	/* fmax --> bfmax.  */
>      case 3272: return NULL;		/* bfmax --> NULL.  */
> -    case 1528: value = 3282; break;	/* fsub --> bfsub.  */
> -    case 3282: return NULL;		/* bfsub --> NULL.  */
> +    case 1528: value = 3279; break;	/* fsub --> bfsub.  */
> +    case 3279: return NULL;		/* bfsub --> NULL.  */
>      case 1488: value = 3275; break;	/* fminnm --> bfminnm.  */
>      case 3275: return NULL;		/* bfminnm --> NULL.  */
>      case 1486: value = 3274; break;	/* fmin --> bfmin.  */
> --- a/opcodes/aarch64-tbl.h
> +++ b/opcodes/aarch64-tbl.h
> @@ -6331,18 +6331,18 @@ const struct aarch64_opcode aarch64_opco
>    D128_THE_INSN("rcwsswppl", 0x5960a000, 0xffe0fc00, OP3 (Rt, Rs, ADDR_SIMPLE), QL_X2NIL, 0),
>  
>  /* BFloat16 SVE Instructions.  */
> -  B16B16_INSNC("bfadd", 0x65008000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
> -  B16B16_INSNC("bfmax", 0x65068000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
> -  B16B16_INSNC("bfmaxnm", 0x65048000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
> -  B16B16_INSNC("bfmin", 0x65078000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
> -  B16B16_INSNC("bfminnm", 0x65058000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
> +  B16B16_INSNC("bfadd", 0x65008000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
> +  B16B16_INSNC("bfmax", 0x65068000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
> +  B16B16_INSNC("bfmaxnm", 0x65048000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
> +  B16B16_INSNC("bfmin", 0x65078000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
> +  B16B16_INSNC("bfminnm", 0x65058000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
>    B16B16_INSNC("bfmla", 0x65200000, 0xffe0e000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zn, SVE_Zm_16), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
>    B16B16_INSNC("bfmls", 0x65202000, 0xffe0e000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zn, SVE_Zm_16), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
> +  B16B16_INSNC("bfmul", 0x65028000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
> +  B16B16_INSNC("bfsub", 0x65018000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 2),
>    B16B16_INSN("bfadd", 0x65000000, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
>    B16B16_INSN("bfclamp", 0x64202400, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
> -  B16B16_INSNC("bfmul", 0x65028000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
>    B16B16_INSN("bfmul", 0x65000800, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
> -  B16B16_INSNC("bfsub", 0x65018000, 0xffffe000, sve_misc, 0, OP4 (SVE_Zd, SVE_Pg3, SVE_Zd, SVE_Zm_5), OP_SVE_SMSS, 0, C_SCAN_MOVPRFX, 0),
>    B16B16_INSN("bfsub", 0x65000400, 0xffe0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm_16), OP_SVE_HHH, 0, 0),
>    B16B16_INSN("bfmla", 0x64200800, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_22_INDEX), OP_SVE_VVV_H, 0, 0),
>    B16B16_INSN("bfmls", 0x64200c00, 0xffa0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zn, SVE_Zm3_22_INDEX), OP_SVE_VVV_H, 0, 0),
> 



* Re: [PATCH 3/6] Arm64: check tied operand specifier in aarch64-gen
  2024-02-23 11:29 ` [PATCH 3/6] Arm64: check tied operand specifier in aarch64-gen Jan Beulich
  2024-03-15 16:09   ` Andrew Carlotti
@ 2024-03-20 16:51   ` Richard Earnshaw (lists)
  2024-03-21  7:38     ` Jan Beulich
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Earnshaw (lists) @ 2024-03-20 16:51 UTC (permalink / raw)
  To: Jan Beulich, Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft, Nick Clifton

On 23/02/2024 11:29, Jan Beulich wrote:
> Make sure that field actually matches the specified operands. Don't
> follow existing F_PSEUDO checking in using assertions, though. Print
> meaningful error messages, thus - while not having a line number
> available - at least providing some indication of where things are
> wrong.
> 
> Fix SVE2.1's extq accordingly, but don't extend the testsuite there:
> There are further issues with its operands (SVE_Zm_imm4 doesn't look to
> be correct to use there, as that describes an indexed vector register,
> while here a separate vector register and immediate operand are to be
> specified).
> 
> --- a/opcodes/aarch64-gen.c
> +++ b/opcodes/aarch64-gen.c
> @@ -129,6 +129,7 @@ read_table (const struct aarch64_opcode*
>    const struct aarch64_opcode *ent = table;
>    opcode_node **new_ent;
>    unsigned int index = initialize_index (table);
> +  unsigned int errors = 0;
>  
>    if (!ent->name)
>      return;
> @@ -140,6 +141,8 @@ read_table (const struct aarch64_opcode*
>  
>    do
>      {
> +      bool match = false;
> +
>        /* F_PSEUDO needs to be used together with F_ALIAS to indicate an alias
>  	 opcode is a programmer friendly pseudo instruction available only in
>  	 the assembly code (thus will not show up in the disassembly).  */
> @@ -150,12 +153,45 @@ read_table (const struct aarch64_opcode*
>  	  index++;
>  	  continue;
>  	}
> +
> +      /* Check tied_operand against operands[].  */
> +      for (unsigned int i = 1; i < ARRAY_SIZE (ent->operands); ++i)
> +	{
> +	  if (ent->operands[i] == AARCH64_OPND_NIL)
> +	    break;
> +
> +	  if (ent->operands[i] != ent->operands[0])
> +	    continue;
> +	  match = true;
> +
> +	  if (i != ent->tied_operand)
> +	    {
> +	      fprintf (stderr, "%s: operands 1 and %u match, but tied=%u\n",
> +		       ent->name, i + 1, ent->tied_operand);
> +	      ++errors;
> +	    }

I'm not sure I follow this.  It looks like you're testing that if one operand is tied to operand 0, then no other operand may overlap that, e.g. that

	extq z3.b, z3.b, z3.b, #5

is an illegal instruction.  But I don't see anything in the instruction description that prohibits that.  While it may not be sensible, it's not obvious to me that it's prohibited.

   
> +	}
> +      if (!match && ent->tied_operand
> +	  /* SME LDR/STR (array vector) tie together inner immediates only.  */
> +	  && ent->iclass != sme_ldr && ent->iclass != sme_str)
> +	{
> +	  fprintf (stderr, "%s: no operands match, but tied=%u\n",
> +		   ent->name, ent->tied_operand);
> +	  ++errors;
> +	}
> +
>        *new_ent = new_opcode_node ();
>        (*new_ent)->opcode = ent->opcode;
>        (*new_ent)->mask = ent->mask;
>        (*new_ent)->index = index++;
>        new_ent = &((*new_ent)->next);
>      } while ((++ent)->name);
> +
> +  if (errors)
> +    {
> +      fprintf (stderr, "%u errors, exiting\n", errors);
> +      xexit (3);
> +    }
>  }
>  
>  static inline void
> --- a/opcodes/aarch64-tbl.h
> +++ b/opcodes/aarch64-tbl.h
> @@ -6375,7 +6375,7 @@ const struct aarch64_opcode aarch64_opco
>    SVE2p1_INSNC("fminqv",0x6417a000, 0xff3fe000, sve2_urqvs, 0, OP3 (Vd, SVE_Pg3, SVE_Zn), OP_SVE_vUS_HSD_HSD, F_OPD_SIZE, C_SCAN_MOVPRFX, 0),
>  
>    SVE2p1_INSN("dupq",0x05202400, 0xffe0fc00, sve_index1, 0, OP2 (SVE_Zd, SVE_Zn_5_INDEX), OP_SVE_VV_BHSD, 0, 0),
> -  SVE2p1_INSN("extq",0x05602400, 0xfff0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zd, SVE_Zm_imm4), OP_SVE_BBB, 0, 0),
> +  SVE2p1_INSN("extq",0x05602400, 0xfff0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zd, SVE_Zm_imm4), OP_SVE_BBB, 0, 1),
>    SVE2p1_INSNC("ld1q",0xc400a000, 0xffe0e000, sve_misc, 0, OP3 (SVE_Zt, SVE_Pg3, SVE_ADDR_ZX), OP_SVE_SZS_QD, 0, C_SCAN_MOVPRFX, 0),
>    SVE2p1_INSNC("ld2q",0xa490e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>    SVE2p1_INSNC("ld3q",0xa510e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
> 

R.



* Re: [PATCH 3/6] Arm64: check tied operand specifier in aarch64-gen
  2024-03-20 16:51   ` Richard Earnshaw (lists)
@ 2024-03-21  7:38     ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2024-03-21  7:38 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Richard Earnshaw, Marcus Shawcroft, Nick Clifton, Binutils

On 20.03.2024 17:51, Richard Earnshaw (lists) wrote:
> On 23/02/2024 11:29, Jan Beulich wrote:
>> Make sure that field actually matches the specified operands. Don't
>> follow existing F_PSEUDO checking in using assertions, though. Print
>> meaningful error messages, thus - while not having a line number
>> available - at least providing some indication of where things are
>> wrong.
>>
>> Fix SVE2.1's extq accordingly, but don't extend the testsuite there:
>> There are further issues with its operands (SVE_Zm_imm4 doesn't look to
>> be correct to use there, as that describes an indexed vector register,
>> while here a separate vector register and immediate operand are to be
>> specified).
>>
>> --- a/opcodes/aarch64-gen.c
>> +++ b/opcodes/aarch64-gen.c
>> @@ -129,6 +129,7 @@ read_table (const struct aarch64_opcode*
>>    const struct aarch64_opcode *ent = table;
>>    opcode_node **new_ent;
>>    unsigned int index = initialize_index (table);
>> +  unsigned int errors = 0;
>>  
>>    if (!ent->name)
>>      return;
>> @@ -140,6 +141,8 @@ read_table (const struct aarch64_opcode*
>>  
>>    do
>>      {
>> +      bool match = false;
>> +
>>        /* F_PSEUDO needs to be used together with F_ALIAS to indicate an alias
>>  	 opcode is a programmer friendly pseudo instruction available only in
>>  	 the assembly code (thus will not show up in the disassembly).  */
>> @@ -150,12 +153,45 @@ read_table (const struct aarch64_opcode*
>>  	  index++;
>>  	  continue;
>>  	}
>> +
>> +      /* Check tied_operand against operands[].  */
>> +      for (unsigned int i = 1; i < ARRAY_SIZE (ent->operands); ++i)
>> +	{
>> +	  if (ent->operands[i] == AARCH64_OPND_NIL)
>> +	    break;
>> +
>> +	  if (ent->operands[i] != ent->operands[0])
>> +	    continue;
>> +	  match = true;
>> +
>> +	  if (i != ent->tied_operand)
>> +	    {
>> +	      fprintf (stderr, "%s: operands 1 and %u match, but tied=%u\n",
>> +		       ent->name, i + 1, ent->tied_operand);
>> +	      ++errors;
>> +	    }
> 
> I'm not sure I follow this.  It looks like you're testing that if one operand is tied to operand 0, then no other operand may overlap that, e.g. that
> 
> 	extq z3.b, z3.b, z3.b, #5
> 
> is an illegal instruction.  But I don't see anything in the instruction description that prohibits that.  While it may not be sensible, it's not obvious to me that it's prohibited.

No, that's not what is being tested. Here we're looking at operand types, i.e.
for extq with

SVE2p1_INSN("extq",0x05602400, 0xfff0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zd, SVE_Zm_imm4), OP_SVE_BBB, 0, 1),

the type of the first two operands is the same, while that of the 3rd is
different. Actual register types / numbers don't come into play here.
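To illustrate the distinction, here is a minimal standalone sketch of the check, reduced to operand types only. The enum, the struct, and the name check_tied are hypothetical stand-ins for the real AARCH64_OPND_* values and struct aarch64_opcode, and the sme_ldr / sme_str exception from the patch is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the AARCH64_OPND_* operand kinds.  */
enum opnd_kind { OPND_NIL, OPND_SVE_Zd, OPND_SVE_Zn, OPND_SVE_Zm_imm4 };

#define MAX_OPNDS 6

/* Hypothetical, cut-down table entry; not the real struct aarch64_opcode.  */
struct entry
{
  const char *name;
  enum opnd_kind operands[MAX_OPNDS];
  unsigned int tied_operand;
};

/* Count inconsistencies between tied_operand and operands[].  Only the
   operand kinds recorded in the table are compared; no register numbers
   from any assembly source are involved.  */
unsigned int
check_tied (const struct entry *ent)
{
  unsigned int errors = 0;
  bool match = false;

  assert (ent);
  for (unsigned int i = 1; i < MAX_OPNDS; ++i)
    {
      if (ent->operands[i] == OPND_NIL)
	break;
      if (ent->operands[i] != ent->operands[0])
	continue;
      match = true;
      /* A later operand of the same kind as operand 0 must be the tied one.  */
      if (i != ent->tied_operand)
	++errors;
    }
  /* A tie is recorded, but no operand has the same kind as operand 0.  */
  if (!match && ent->tied_operand)
    ++errors;
  return errors;
}

/* extq as it was before the patch (tied=0) and after it (tied=1).  */
const struct entry extq_old
  = { "extq", { OPND_SVE_Zd, OPND_SVE_Zd, OPND_SVE_Zm_imm4 }, 0 };
const struct entry extq_new
  = { "extq", { OPND_SVE_Zd, OPND_SVE_Zd, OPND_SVE_Zm_imm4 }, 1 };
```

With these two entries, check_tied (&extq_old) reports one error and check_tied (&extq_new) reports none; whether the source later names z3 for every register never enters the picture.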

Jan


* Re: [PATCH 0/6] Arm64: (mostly) SVE adjustments
  2024-03-18  8:23   ` Jan Beulich
@ 2024-05-09 14:17     ` Richard Earnshaw (lists)
  2024-05-14  6:57       ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Earnshaw (lists) @ 2024-05-09 14:17 UTC (permalink / raw)
  To: Jan Beulich, Andrew Carlotti
  Cc: Binutils, Richard Earnshaw, Marcus Shawcroft, Nick Clifton,
	Srinath Parvathaneni

On 18/03/2024 08:23, Jan Beulich wrote:
> I'm aware of aarch64 being the "arch identifier". I'm not alone though in
> preferring Arm64 in textual uses - see Linux sources for a prominent
> example.

This isn't about personal preferences, though, it's about the port name; 
and that's aarch64.

There are good technical reasons for not using arm or anything with that 
in the name in that this is an entirely separate identifier.  There are 
many configure scripts out there that match arm* (or even, in some 
cases, arm6* which was a very early implementation of the Arm 32-bit 
architecture).  Having a distinct name really helps with avoiding 
problems stemming from that.
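As a rough illustration of the pattern problem, the sketch below uses POSIX fnmatch() to apply shell-style globs of the kind such configure scripts use. The helper name glob_matches and the target strings are made up for the example; they are not taken from any particular script.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/* Return true if the shell-style glob PATTERN matches TARGET.  */
bool
glob_matches (const char *pattern, const char *target)
{
  assert (pattern && target);
  return fnmatch (pattern, target, 0) == 0;
}
```

Here glob_matches ("arm*", "arm64-unknown-linux-gnu") and glob_matches ("arm6*", "arm64-unknown-linux-gnu") are both true, whereas glob_matches ("arm*", "aarch64-unknown-linux-gnu") is false, so only the distinct name stays clear of code written for the 32-bit port.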

Mixing aarch64 and arm64 in commit tags also makes it more difficult to 
identify patches relating to the port (and also adds false matches for 
those searching for the 32-bit arm port).

Anyway, please can you use the official tag name in commits in future.

R.

PS: I'd point out that the x86 port is not called 'intel' either, 
perhaps for similar reasons.


* Re: [PATCH 4/6] Arm64: correct SVE2.1 ld{3,4}q / st{3,4}q (scalar plus immediate)
  2024-02-23 11:29 ` [PATCH 4/6] Arm64: correct SVE2.1 ld{3,4}q / st{3,4}q (scalar plus immediate) Jan Beulich
@ 2024-05-09 14:31   ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Earnshaw (lists) @ 2024-05-09 14:31 UTC (permalink / raw)
  To: Jan Beulich, Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft, Nick Clifton



On 23/02/2024 11:29, Jan Beulich wrote:
> Like their byte, half, word, and doubleword counterparts their
> immediates are multiples of 3 / 4 respectively.

OK, but please change the summary tag to aarch64.

R.
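For reference, the scaling rule from the description can be sketched as below. It assumes the signed 4-bit immediate sits in bits 19:16, which is consistent with the encodings in the testcase that follows; the function name scaled_offset is made up for the illustration and is not part of opcodes/.

```c
#include <assert.h>
#include <stdint.h>

/* Return the "mul vl" offset as written in assembly: the signed 4-bit
   field (assumed to be in bits 19:16) times the number of registers
   in the list.  */
int
scaled_offset (uint32_t insn, int nregs)
{
  int imm4;

  assert (nregs >= 2 && nregs <= 4);
  imm4 = (int) ((insn >> 16) & 0xf);
  if (imm4 & 0x8)
    imm4 -= 16;		/* Sign-extend the 4-bit field.  */
  return imm4 * nregs;
}
```

The field value is -2 in all of the affected test encodings, so scaled_offset (0xa51ef000, 3) gives -6 for ld3q and scaled_offset (0xa59ef000, 4) gives -8 for ld4q, while ld2q (0xa49ef000, factor 2) stays at -4.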

> 
> --- a/gas/testsuite/gas/aarch64/sve2p1-1.d
> +++ b/gas/testsuite/gas/aarch64/sve2p1-1.d
> @@ -1,4 +1,4 @@
> -#name: Test of SVE2.1 min max instructions.
> +#name: Test of SVE2.1 instructions
>   #as: -march=armv9.4-a+sve2p1
>   #objdump: -dr
>   
> @@ -91,15 +91,15 @@
>   .*:	6497bc10 	fminqv	v16.4s, p7, z0.s
>   .*:	c400b200 	ld1q	z0.q, p4/z, \[z16.d, x0\]
>   .*:	a49ef000 	ld2q	{z0.q, z1.q}, p4/z, \[x0, #-4, mul vl\]
> -.*:	a51ef000 	ld3q	{z0.q, z1.q, z2.q}, p4/z, \[x0, #-4, mul vl\]
> -.*:	a59ef000 	ld4q	{z0.q, z1.q, z2.q, z3.q}, p4/z, \[x0, #-4, mul vl\]
> +.*:	a51ef000 	ld3q	{z0.q, z1.q, z2.q}, p4/z, \[x0, #-6, mul vl\]
> +.*:	a59ef000 	ld4q	{z0.q, z1.q, z2.q, z3.q}, p4/z, \[x0, #-8, mul vl\]
>   .*:	a4a2f000 	ld2h	{z0.h-z1.h}, p4/z, \[x0, #4, mul vl\]
>   .*:	a5249000 	ld3q	{z0.q, z1.q, z2.q}, p4/z, \[x0, x4, lsl #4\]
>   .*:	a5a69000 	ld4q	{z0.q, z1.q, z2.q, z3.q}, p4/z, \[x0, x6, lsl #4\]
>   .*:	e4203200 	st1q	z0.q, p4, \[z16.d, x0\]
>   .*:	e44e1000 	st2q	{z0.q, z1.q}, p4, \[x0, #-4, mul vl\]
> -.*:	e48e1000 	st3q	{z0.q, z1.q, z2.q}, p4, \[x0, #-4, mul vl\]
> -.*:	e4ce1000 	st4q	{z0.q, z1.q, z2.q, z3.q}, p4, \[x0, #-4, mul vl\]
> +.*:	e48e1000 	st3q	{z0.q, z1.q, z2.q}, p4, \[x0, #-6, mul vl\]
> +.*:	e4ce1000 	st4q	{z0.q, z1.q, z2.q, z3.q}, p4, \[x0, #-8, mul vl\]
>   .*:	e4621000 	st2q	{z0.q, z1.q}, p4, \[x0, x2, lsl #4\]
>   .*:	e4a41000 	st3q	{z0.q, z1.q, z2.q}, p4, \[x0, x4, lsl #4\]
>   .*:	e4e61000 	st4q	{z0.q, z1.q, z2.q, z3.q}, p4, \[x0, x6, lsl #4\]
> --- a/gas/testsuite/gas/aarch64/sve2p1-1.s
> +++ b/gas/testsuite/gas/aarch64/sve2p1-1.s
> @@ -92,16 +92,16 @@ fminqv v8.2d, p4, z1.d
>   fminqv v16.4s, p7, z0.s
>   ld1q Z0.Q, p4/Z, [Z16.D, x0]
>   ld2q {Z0.Q, Z1.Q}, p4/Z, [x0,  #-4, MUL VL]
> -ld3q {Z0.Q, Z1.Q, Z2.Q}, p4/Z, [x0,  #-4, MUL VL]
> -ld4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4/Z, [x0,  #-4, MUL VL]
> +ld3q {Z0.Q, Z1.Q, Z2.Q}, p4/Z, [x0,  #-6, MUL VL]
> +ld4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4/Z, [x0,  #-8, MUL VL]
>   ld2q {Z0.Q, Z1.Q}, p4/Z, [x0, x2, lsl  #4]
>   ld3q {Z0.Q, Z1.Q, Z2.Q}, p4/Z, [x0, x4, lsl  #4]
>   ld4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4/Z, [x0, x6, lsl  #4]
>   
>   st1q Z0.Q, p4, [Z16.D, x0]
>   st2q {Z0.Q, Z1.Q}, p4, [x0,  #-4, MUL VL]
> -st3q {Z0.Q, Z1.Q, Z2.Q}, p4, [x0,  #-4, MUL VL]
> -st4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4, [x0,  #-4, MUL VL]
> +st3q {Z0.Q, Z1.Q, Z2.Q}, p4, [x0,  #-6, MUL VL]
> +st4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4, [x0,  #-8, MUL VL]
>   st2q {Z0.Q, Z1.Q}, p4, [x0, x2, lsl  #4]
>   st3q {Z0.Q, Z1.Q, Z2.Q}, p4, [x0, x4, lsl  #4]
>   st4q {Z0.Q, Z1.Q, Z2.Q, Z3.Q}, p4, [x0, x6, lsl  #4]
> --- a/gas/testsuite/gas/aarch64/sve2p1-1-bad.l
> +++ b/gas/testsuite/gas/aarch64/sve2p1-1-bad.l
> @@ -82,15 +82,15 @@
>   .*: Error: selected processor does not support `fminqv v16.4s,p7,z0.s'
>   .*: Error: selected processor does not support `ld1q Z0.Q,p4/Z,\[Z16.D,x0\]'
>   .*: Error: selected processor does not support `ld2q {Z0.Q,Z1.Q},p4/Z,\[x0,#-4,MUL VL\]'
> -.*: Error: selected processor does not support `ld3q {Z0.Q,Z1.Q,Z2.Q},p4/Z,\[x0,#-4,MUL VL\]'
> -.*: Error: selected processor does not support `ld4q {Z0.Q,Z1.Q,Z2.Q,Z3.Q},p4/Z,\[x0,#-4,MUL VL\]'
> +.*: Error: selected processor does not support `ld3q .*
> +.*: Error: selected processor does not support `ld4q .*
>   .*: Error: selected processor does not support `ld2q {Z0.Q,Z1.Q},p4/Z,\[x0,x2,lsl#4\]'
>   .*: Error: selected processor does not support `ld3q {Z0.Q,Z1.Q,Z2.Q},p4/Z,\[x0,x4,lsl#4\]'
>   .*: Error: selected processor does not support `ld4q {Z0.Q,Z1.Q,Z2.Q,Z3.Q},p4/Z,\[x0,x6,lsl#4\]'
>   .*: Error: selected processor does not support `st1q Z0.Q,p4,\[Z16.D,x0\]'
>   .*: Error: selected processor does not support `st2q {Z0.Q,Z1.Q},p4,\[x0,#-4,MUL VL\]'
> -.*: Error: selected processor does not support `st3q {Z0.Q,Z1.Q,Z2.Q},p4,\[x0,#-4,MUL VL\]'
> -.*: Error: selected processor does not support `st4q {Z0.Q,Z1.Q,Z2.Q,Z3.Q},p4,\[x0,#-4,MUL VL\]'
> +.*: Error: selected processor does not support `st3q .*
> +.*: Error: selected processor does not support `st4q .*
>   .*: Error: selected processor does not support `st2q {Z0.Q,Z1.Q},p4,\[x0,x2,lsl#4\]'
>   .*: Error: selected processor does not support `st3q {Z0.Q,Z1.Q,Z2.Q},p4,\[x0,x4,lsl#4\]'
>   .*: Error: selected processor does not support `st4q {Z0.Q,Z1.Q,Z2.Q,Z3.Q},p4,\[x0,x6,lsl#4\]'
> --- a/opcodes/aarch64-tbl.h
> +++ b/opcodes/aarch64-tbl.h
> @@ -6378,16 +6378,16 @@ const struct aarch64_opcode aarch64_opco
>     SVE2p1_INSN("extq",0x05602400, 0xfff0fc00, sve_misc, 0, OP3 (SVE_Zd, SVE_Zd, SVE_Zm_imm4), OP_SVE_BBB, 0, 1),
>     SVE2p1_INSNC("ld1q",0xc400a000, 0xffe0e000, sve_misc, 0, OP3 (SVE_Zt, SVE_Pg3, SVE_ADDR_ZX), OP_SVE_SZS_QD, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("ld2q",0xa490e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
> -  SVE2p1_INSNC("ld3q",0xa510e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
> -  SVE2p1_INSNC("ld4q",0xa590e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
> +  SVE2p1_INSNC("ld3q",0xa510e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x3xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
> +  SVE2p1_INSNC("ld4q",0xa590e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RI_S4x4xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("ld2q",0xa4a0e000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("ld3q",0xa5208000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("ld4q",0xa5a08000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>   
>     SVE2p1_INSNC("st1q",0xe4202000, 0xffe0e000, sve_misc, 0, OP3 (SVE_Zt, SVE_Pg3, SVE_ADDR_ZX), OP_SVE_SUS_QD, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("st2q",0xe4400000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
> -  SVE2p1_INSNC("st3q",0xe4800000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
> -  SVE2p1_INSNC("st4q",0xe4c00000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
> +  SVE2p1_INSNC("st3q",0xe4800000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x3xVL), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
> +  SVE2p1_INSNC("st4q",0xe4c00000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RI_S4x4xVL), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("st2q",0xe4600000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("st3q",0xe4a00000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("st4q",0xe4e00000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QUU, 0, C_SCAN_MOVPRFX, 0),
> 


* Re: [PATCH 5/6] Arm64: correct SVE2.1 ld2q (scalar plus scalar)
  2024-02-23 11:30 ` [PATCH 5/6] Arm64: correct SVE2.1 ld2q (scalar plus scalar) Jan Beulich
@ 2024-05-09 14:34   ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Earnshaw (lists) @ 2024-05-09 14:34 UTC (permalink / raw)
  To: Jan Beulich, Binutils; +Cc: Richard Earnshaw, Marcus Shawcroft, Nick Clifton



On 23/02/2024 11:30, Jan Beulich wrote:
> Its opcode was wrong, as was e.g. easily visible from the inappropriate
> testcase expectation.

OK with fixed tag in commit summary.

R.
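A small sketch of why the old testcase expectation gave the problem away, assuming the usual "(insn & mask) == opcode" style of table matching. The helper insn_matches is hypothetical; the opcode, mask, and encoding values are the ones from the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if INSN matches a table entry given by OPCODE and MASK.  */
bool
insn_matches (uint32_t insn, uint32_t opcode, uint32_t mask)
{
  /* A base opcode must not have bits set outside its mask.  */
  assert ((opcode & ~mask) == 0);
  return (insn & mask) == opcode;
}
```

With mask 0xffe0e000, the old expected encoding 0xa4a2f000 matches the old base opcode 0xa4a0e000, which is why the disassembler printed it as ld2h. The corrected encoding 0xa4a29000 matches only the new base opcode 0xa4a08000.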

> 
> --- a/gas/testsuite/gas/aarch64/sve2p1-1.d
> +++ b/gas/testsuite/gas/aarch64/sve2p1-1.d
> @@ -93,7 +93,7 @@
>   .*:	a49ef000 	ld2q	{z0.q, z1.q}, p4/z, \[x0, #-4, mul vl\]
>   .*:	a51ef000 	ld3q	{z0.q, z1.q, z2.q}, p4/z, \[x0, #-6, mul vl\]
>   .*:	a59ef000 	ld4q	{z0.q, z1.q, z2.q, z3.q}, p4/z, \[x0, #-8, mul vl\]
> -.*:	a4a2f000 	ld2h	{z0.h-z1.h}, p4/z, \[x0, #4, mul vl\]
> +.*:	a4a29000 	ld2q	{z0.q, z1.q}, p4/z, \[x0, x2, lsl #4\]
>   .*:	a5249000 	ld3q	{z0.q, z1.q, z2.q}, p4/z, \[x0, x4, lsl #4\]
>   .*:	a5a69000 	ld4q	{z0.q, z1.q, z2.q, z3.q}, p4/z, \[x0, x6, lsl #4\]
>   .*:	e4203200 	st1q	z0.q, p4, \[z16.d, x0\]
> --- a/opcodes/aarch64-tbl.h
> +++ b/opcodes/aarch64-tbl.h
> @@ -6380,7 +6380,7 @@ const struct aarch64_opcode aarch64_opco
>     SVE2p1_INSNC("ld2q",0xa490e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RI_S4x2xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("ld3q",0xa510e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RI_S4x3xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("ld4q",0xa590e000, 0xfff0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RI_S4x4xVL), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
> -  SVE2p1_INSNC("ld2q",0xa4a0e000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
> +  SVE2p1_INSNC("ld2q",0xa4a08000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt2, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("ld3q",0xa5208000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt3, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>     SVE2p1_INSNC("ld4q",0xa5a08000, 0xffe0e000, sve_misc, 0, OP3 (SME_Zt4, SVE_Pg3, SVE_ADDR_RR_LSL4), OP_SVE_QZU, 0, C_SCAN_MOVPRFX, 0),
>   
> 


* Re: [PATCH 0/6] Arm64: (mostly) SVE adjustments
  2024-05-09 14:17     ` Richard Earnshaw (lists)
@ 2024-05-14  6:57       ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2024-05-14  6:57 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Binutils, Richard Earnshaw, Marcus Shawcroft, Nick Clifton,
	Srinath Parvathaneni, Andrew Carlotti

On 09.05.2024 16:17, Richard Earnshaw (lists) wrote:
> 
> 
> On 18/03/2024 08:23, Jan Beulich wrote:
>> I'm aware of aarch64 being the "arch identifier". I'm not alone though in
>> preferring Arm64 in textual uses - see Linux sources for a prominent
>> example.
> 
> This isn't about personal preferences, though, it's about the port name; 
> and that's aarch64.
> 
> There are good technical reasons for not using arm or anything with that 
> in the name in that this is an entirely separate identifier.  There are 
> many configure scripts out there that match arm* (or even, in some 
> cases, arm6* which was a very early implementation of the Arm 32-bit 
> architecture).  Having a distinct name really helps with avoiding 
> problems stemming from that.
> 
> Mixing aarch64 and arm64 in commit tags also makes it more difficult to 
> identify patches relating to the port (and also adds false matches for 
> those searching for the 32-bit arm port).
> 
> Anyway, please can you use the official tag name in commits in future.

I'll try to keep that in mind, sure.

> PS: I'd point out that the x86 port is not called 'intel' either, 
> perhaps for similar reasons.

I find this comparison odd: There are various x86 parts from other vendors.
x86-64 wasn't even "invented" by Intel.

Jan


end of thread, other threads:[~2024-05-14  6:57 UTC | newest]

Thread overview: 20+ messages
2024-02-23 11:26 [PATCH 0/6] Arm64: (mostly) SVE adjustments Jan Beulich
2024-02-23 11:28 ` [PATCH 1/6] Arm64: correct B16B16 indexed bf{mla,mls,mul} Jan Beulich
2024-03-20 15:54   ` Richard Earnshaw (lists)
2024-03-20 16:09     ` Jan Beulich
2024-02-23 11:28 ` [PATCH 2/6] Arm64: check matching operands for predicated B16B16 insns Jan Beulich
2024-03-20 16:19   ` Richard Earnshaw (lists)
2024-02-23 11:29 ` [PATCH 3/6] Arm64: check tied operand specifier in aarch64-gen Jan Beulich
2024-03-15 16:09   ` Andrew Carlotti
2024-03-18  8:35     ` Jan Beulich
2024-03-20 16:51   ` Richard Earnshaw (lists)
2024-03-21  7:38     ` Jan Beulich
2024-02-23 11:29 ` [PATCH 4/6] Arm64: correct SVE2.1 ld{3,4}q / st{3,4}q (scalar plus immediate) Jan Beulich
2024-05-09 14:31   ` Richard Earnshaw (lists)
2024-02-23 11:30 ` [PATCH 5/6] Arm64: correct SVE2.1 ld2q (scalar plus scalar) Jan Beulich
2024-05-09 14:34   ` Richard Earnshaw (lists)
2024-02-23 11:30 ` [PATCH 6/6] gas/NEWS: drop mention of Arm64's SVE2.1 and SME2.1 Jan Beulich
2024-03-15 16:20 ` [PATCH 0/6] Arm64: (mostly) SVE adjustments Andrew Carlotti
2024-03-18  8:23   ` Jan Beulich
2024-05-09 14:17     ` Richard Earnshaw (lists)
2024-05-14  6:57       ` Jan Beulich
