public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v3 0/3] aarch64: Add initial support for +fp8 arch extensions
@ 2024-07-26 16:32 Claudio Bantaloukas
  2024-07-26 16:32 ` [PATCH v3 1/3] aarch64: Add march flags " Claudio Bantaloukas
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Claudio Bantaloukas @ 2024-07-26 16:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Claudio Bantaloukas


This series introduces initial flags and functionality for the fp8 feature.

Specifically, the following are added:
- functions that enable constructing valid fpm register values.
- support for the '+fp8' -march modifier.
- support for reading and writing the new system register FPMR (Floating Point Mode
  Register) which configures the new FP8 features

Tested against aarch64-unknown-linux-gnu.

V1 of this patch series had "aarch64: Add march flags for +fp8 arch extensions" as
cover letter title. Since then, changes in V2 are:

aarch64: Add march flags for +fp8 arch extensions
- Removed __ARM_FEATURE_FP8 define: will be added once the relevant features are in.
- Some unnecessary whitespace changes were removed.
- Helper function names now begin with __arm.

aarch64: Add support for moving fpm system register
- Removed a misleading comment.
- Removed unnecessary modifier in .md

aarch64: Add fpm register helper functions.
- Helper functions and fpm_t types are available unconditionally when including arm_acle.h

Changes in V3 are:

aarch64: Add march flags for +fp8 arch extensions
- removed unnecessary check-function-bodies check

aarch64: Add support for moving fpm system register
- added check-function-bodies check

aarch64: Add fpm register helper functions.
- moved fp8 types and helper functions into a new private header file arm_private_fp8.h
- arm_neon.h and arm_sve.h now include the new header
- added tests that check the helpers are available when including arm_neon.h
  arm_sve.h or arm_sme.h 

Is this ok for master? I do not have merge permissions. Can someone merge this for me please?

Thanks,
Claudio Bantaloukas


Claudio Bantaloukas (3):
  aarch64: Add march flags for +fp8 arch extensions
  aarch64: Add support for moving fpm system register
  aarch64: Add fpm register helper functions.

 gcc/config.gcc                                |   2 +-
 .../aarch64/aarch64-option-extensions.def     |   2 +
 gcc/config/aarch64/aarch64.cc                 |   8 ++
 gcc/config/aarch64/aarch64.h                  |  17 ++-
 gcc/config/aarch64/aarch64.md                 |  30 +++--
 gcc/config/aarch64/arm_neon.h                 |   1 +
 gcc/config/aarch64/arm_private_fp8.h          |  80 +++++++++++
 gcc/config/aarch64/arm_sve.h                  |   1 +
 gcc/config/aarch64/constraints.md             |   3 +
 gcc/doc/invoke.texi                           |   2 +
 .../aarch64/acle/fp8-helpers-neon.c           |  53 ++++++++
 .../gcc.target/aarch64/acle/fp8-helpers-sme.c |  12 ++
 .../gcc.target/aarch64/acle/fp8-helpers-sve.c |  12 ++
 gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 124 ++++++++++++++++++
 14 files changed, 332 insertions(+), 15 deletions(-)
 create mode 100644 gcc/config/aarch64/arm_private_fp8.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c

-- 
2.43.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/3] aarch64: Add march flags for +fp8 arch extensions
  2024-07-26 16:32 [PATCH v3 0/3] aarch64: Add initial support for +fp8 arch extensions Claudio Bantaloukas
@ 2024-07-26 16:32 ` Claudio Bantaloukas
  2024-07-29  7:30   ` Kyrylo Tkachov
  2024-07-26 16:32 ` [PATCH v3 2/3] aarch64: Add support for moving fpm system register Claudio Bantaloukas
  2024-07-26 16:32 ` [PATCH v3 3/3] aarch64: Add fpm register helper functions Claudio Bantaloukas
  2 siblings, 1 reply; 9+ messages in thread
From: Claudio Bantaloukas @ 2024-07-26 16:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Claudio Bantaloukas

[-- Attachment #1: Type: text/plain, Size: 751 bytes --]


This introduces the relevant flags to enable access to the fpmr register and fp8 intrinsics, which will be added subsequently.

gcc/ChangeLog:

	* config/aarch64/aarch64-option-extensions.def (fp8): New.
	* config/aarch64/aarch64.h (TARGET_FP8): Likewise.
	* doc/invoke.texi (AArch64 Options): Document new -march flags
	and extensions.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/acle/fp8.c: New test.
---
 .../aarch64/aarch64-option-extensions.def     |  2 ++
 gcc/config/aarch64/aarch64.h                  |  3 +++
 gcc/doc/invoke.texi                           |  2 ++
 gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 20 +++++++++++++++++++
 4 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: v3-0001-aarch64-Add-march-flags-for-fp8-arch-extensions.patch --]
[-- Type: text/x-patch; name="v3-0001-aarch64-Add-march-flags-for-fp8-arch-extensions.patch", Size: 2262 bytes --]

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 42ec0eec31e..6998627f377 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
 
 AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
 
+AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), "fp8")
+
 #undef AARCH64_OPT_FMV_EXTENSION
 #undef AARCH64_OPT_EXTENSION
 #undef AARCH64_FMV_FEATURE
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b7e330438d9..2e75c6b81e2 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -463,6 +463,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
 				 && (aarch64_tune_params.extra_tuning_flags \
 				     & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
 
+/* fp8 instructions are enabled through +fp8.  */
+#define TARGET_FP8 AARCH64_HAVE_ISA (FP8)
+
 /* Standard register usage.  */
 
 /* 31 64-bit general purpose registers R0-R30:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9fb0925ed29..7cbcd8ad1b4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21848,6 +21848,8 @@ Enable support for Armv9.4-a Guarded Control Stack extension.
 Enable support for Armv8.9-a/9.4-a translation hardening extension.
 @item rcpc3
 Enable the RCpc3 (Release Consistency) extension.
+@item fp8
+Enable the fp8 (8-bit floating point) extension.
 
 @end table
 
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
new file mode 100644
index 00000000000..459442be155
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
@@ -0,0 +1,20 @@
+/* Test the fp8 ACLE intrinsics family.  */
+/* { dg-do compile } */
+/* { dg-options "-O1 -march=armv8-a" } */
+
+#include <arm_acle.h>
+
+#ifdef __ARM_FEATURE_FP8
+#error "__ARM_FEATURE_FP8 feature macro defined."
+#endif
+
+#pragma GCC push_options
+#pragma GCC target("arch=armv9.4-a+fp8")
+
+/* We do not define __ARM_FEATURE_FP8 until all
+   relevant features have been added. */
+#ifdef __ARM_FEATURE_FP8
+#error "__ARM_FEATURE_FP8 feature macro defined."
+#endif
+
+#pragma GCC pop_options

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 2/3] aarch64: Add support for moving fpm system register
  2024-07-26 16:32 [PATCH v3 0/3] aarch64: Add initial support for +fp8 arch extensions Claudio Bantaloukas
  2024-07-26 16:32 ` [PATCH v3 1/3] aarch64: Add march flags " Claudio Bantaloukas
@ 2024-07-26 16:32 ` Claudio Bantaloukas
  2024-07-29 12:13   ` Richard Sandiford
  2024-07-26 16:32 ` [PATCH v3 3/3] aarch64: Add fpm register helper functions Claudio Bantaloukas
  2 siblings, 1 reply; 9+ messages in thread
From: Claudio Bantaloukas @ 2024-07-26 16:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Claudio Bantaloukas

[-- Attachment #1: Type: text/plain, Size: 2413 bytes --]


Unlike most system registers, fpmr can be heavily written to in code that
exercises the fp8 functionality. That is because every fp8 instrinsic call
can potentially change the value of fpmr.
Rather than just use a an unspec, we treat the fpmr system register like
all other registers and use a move operation to read and write to it.

We introduce a new class of moveable system registers that, currently,
only accepts fpmr and a new constraint, Umv, that allows us to
selectively use mrs and msr instructions when expanding rtl for them.
Given that there is code that depends on "real" registers coming before
"fake" ones, we introduce a new constant FPM_REGNUM that uses an
existing value and renumber registers below that.
This requires us to update the bitmaps that describe which registers
belong to each register class.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_hard_regno_nregs): Add
	support for MOVEABLE_SYSREGS class.
	(aarch64_hard_regno_mode_ok): Allow reads and writes to fpmr.
	(aarch64_regno_regclass): Support MOVEABLE_SYSREGS class.
	(aarch64_class_max_nregs): Likewise.
	* config/aarch64/aarch64.h (FIXED_REGISTERS): add fpmr.
	(CALL_REALLY_USED_REGISTERS): Likewise.
	(REGISTER_NAMES): Likewise.
	(enum reg_class): Add MOVEABLE_SYSREGS class.
	(REG_CLASS_NAMES): Likewise.
	(REG_CLASS_CONTENTS): Update class bitmaps to deal with fpmr,
	the new MOVEABLE_REGS class and renumbering of registers.
	* config/aarch64/aarch64.md: (FPM_REGNUM): added new register
	number, reusing old value.
	(FFR_REGNUM): Renumber.
	(FFRT_REGNUM): Likewise.
	(LOWERING_REGNUM): Likewise.
	(TPIDR2_BLOCK_REGNUM): Likewise.
	(SME_STATE_REGNUM): Likewise.
	(TPIDR2_SETUP_REGNUM): Likewise.
	(ZA_FREE_REGNUM): Likewise.
	(ZA_SAVED_REGNUM): Likewise.
	(ZA_REGNUM): Likewise.
	(ZT0_REGNUM): Likewise.
	(*mov<mode>_aarch64): Add support for moveable sysregs.
	(*movsi_aarch64): Likewise.
	(*movdi_aarch64): Likewise.
	* config/aarch64/constraints.md (MOVEABLE_SYSREGS): New constraint.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/acle/fp8.c: New tests.
---
 gcc/config/aarch64/aarch64.cc               |   8 ++
 gcc/config/aarch64/aarch64.h                |  14 ++-
 gcc/config/aarch64/aarch64.md               |  30 ++++--
 gcc/config/aarch64/constraints.md           |   3 +
 gcc/testsuite/gcc.target/aarch64/acle/fp8.c | 104 ++++++++++++++++++++
 5 files changed, 145 insertions(+), 14 deletions(-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: v3-0002-aarch64-Add-support-for-moving-fpm-system-registe.patch --]
[-- Type: text/x-patch; name="v3-0002-aarch64-Add-support-for-moving-fpm-system-registe.patch", Size: 10777 bytes --]

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e0cf382998c..9810f2c0390 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2018,6 +2018,7 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode mode)
     case PR_HI_REGS:
       return mode == VNx32BImode ? 2 : 1;
 
+    case MOVEABLE_SYSREGS:
     case FFR_REGS:
     case PR_AND_FFR_REGS:
     case FAKE_REGS:
@@ -2045,6 +2046,9 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
     /* This must have the same size as _Unwind_Word.  */
     return mode == DImode;
 
+  if (regno == FPM_REGNUM)
+    return mode == QImode || mode == HImode || mode == SImode || mode == DImode;
+
   unsigned int vec_flags = aarch64_classify_vector_mode (mode);
   if (vec_flags == VEC_SVE_PRED)
     return pr_or_ffr_regnum_p (regno);
@@ -12680,6 +12684,9 @@ aarch64_regno_regclass (unsigned regno)
   if (PR_REGNUM_P (regno))
     return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
 
+  if (regno == FPM_REGNUM)
+    return MOVEABLE_SYSREGS;
+
   if (regno == FFR_REGNUM || regno == FFRT_REGNUM)
     return FFR_REGS;
 
@@ -13068,6 +13075,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
     case PR_HI_REGS:
       return mode == VNx32BImode ? 2 : 1;
 
+    case MOVEABLE_SYSREGS:
     case STACK_REG:
     case FFR_REGS:
     case PR_AND_FFR_REGS:
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 2e75c6b81e2..2dfb999bea5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -523,6 +523,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
     1, 1, 1, 1,			/* SFP, AP, CC, VG */	\
     0, 0, 0, 0,   0, 0, 0, 0,   /* P0 - P7 */           \
     0, 0, 0, 0,   0, 0, 0, 0,   /* P8 - P15 */          \
+    1,				/* FPMR */		\
     1, 1,			/* FFR and FFRT */	\
     1, 1, 1, 1, 1, 1, 1, 1	/* Fake registers */	\
   }
@@ -547,6 +548,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
     1, 1, 1, 0,			/* SFP, AP, CC, VG */	\
     1, 1, 1, 1,   1, 1, 1, 1,	/* P0 - P7 */		\
     1, 1, 1, 1,   1, 1, 1, 1,	/* P8 - P15 */		\
+    1,				/* FPMR */		\
     1, 1,			/* FFR and FFRT */	\
     0, 0, 0, 0, 0, 0, 0, 0	/* Fake registers */	\
   }
@@ -564,6 +566,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
     "sfp", "ap",  "cc",  "vg",					\
     "p0",  "p1",  "p2",  "p3",  "p4",  "p5",  "p6",  "p7",	\
     "p8",  "p9",  "p10", "p11", "p12", "p13", "p14", "p15",	\
+    "fpmr",							\
     "ffr", "ffrt",						\
     "lowering", "tpidr2_block", "sme_state", "tpidr2_setup",	\
     "za_free", "za_saved", "za", "zt0"				\
@@ -775,6 +778,7 @@ enum reg_class
   PR_REGS,
   FFR_REGS,
   PR_AND_FFR_REGS,
+  MOVEABLE_SYSREGS,
   FAKE_REGS,
   ALL_REGS,
   LIM_REG_CLASSES		/* Last */
@@ -801,6 +805,7 @@ enum reg_class
   "PR_REGS",					\
   "FFR_REGS",					\
   "PR_AND_FFR_REGS",				\
+  "MOVEABLE_SYSREGS",				\
   "FAKE_REGS",					\
   "ALL_REGS"					\
 }
@@ -822,10 +827,11 @@ enum reg_class
   { 0x00000000, 0x00000000, 0x00000ff0 },	/* PR_LO_REGS */	\
   { 0x00000000, 0x00000000, 0x000ff000 },	/* PR_HI_REGS */	\
   { 0x00000000, 0x00000000, 0x000ffff0 },	/* PR_REGS */		\
-  { 0x00000000, 0x00000000, 0x00300000 },	/* FFR_REGS */		\
-  { 0x00000000, 0x00000000, 0x003ffff0 },	/* PR_AND_FFR_REGS */	\
-  { 0x00000000, 0x00000000, 0x3fc00000 },	/* FAKE_REGS */		\
-  { 0xffffffff, 0xffffffff, 0x000fffff }	/* ALL_REGS */		\
+  { 0x00000000, 0x00000000, 0x00600000 },	/* FFR_REGS */		\
+  { 0x00000000, 0x00000000, 0x006ffff0 },	/* PR_AND_FFR_REGS */	\
+  { 0x00000000, 0x00000000, 0x00100000 },	/* MOVEABLE_SYSREGS */	\
+  { 0x00000000, 0x00000000, 0x7f800000 },	/* FAKE_REGS */		\
+  { 0xffffffff, 0xffffffff, 0x001fffff }	/* ALL_REGS */		\
 }
 
 #define REGNO_REG_CLASS(REGNO)	aarch64_regno_regclass (REGNO)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 94ff0eefa77..22e57ee7ccf 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -107,10 +107,14 @@ (define_constants
     (P14_REGNUM		82)
     (P15_REGNUM		83)
     (LAST_SAVED_REGNUM	83)
-    (FFR_REGNUM		84)
+
+    ;; Floating Point Mode Register, used in FP8 insns.
+    (FPM_REGNUM		84)
+
+    (FFR_REGNUM		85)
     ;; "FFR token": a fake register used for representing the scheduling
     ;; restrictions on FFR-related operations.
-    (FFRT_REGNUM	85)
+    (FFRT_REGNUM	86)
 
     ;; ----------------------------------------------------------------
     ;; Fake registers
@@ -122,17 +126,17 @@ (define_constants
     ;; ABI-related lowering is needed.  These placeholders read and
     ;; write this register.  Instructions that depend on the lowering
     ;; read the register.
-    (LOWERING_REGNUM 86)
+    (LOWERING_REGNUM 87)
 
     ;; Represents the contents of the current function's TPIDR2 block,
     ;; in abstract form.
-    (TPIDR2_BLOCK_REGNUM 87)
+    (TPIDR2_BLOCK_REGNUM 88)
 
     ;; Holds the value that the current function wants PSTATE.ZA to be.
     ;; The actual value can sometimes vary, because it does not track
     ;; changes to PSTATE.ZA that happen during a lazy save and restore.
     ;; Those effects are instead tracked by ZA_SAVED_REGNUM.
-    (SME_STATE_REGNUM 88)
+    (SME_STATE_REGNUM 89)
 
     ;; Instructions write to this register if they set TPIDR2_EL0 to a
     ;; well-defined value.  Instructions read from the register if they
@@ -140,14 +144,14 @@ (define_constants
     ;;
     ;; The register does not model the architected TPIDR2_ELO, just the
     ;; current function's management of it.
-    (TPIDR2_SETUP_REGNUM 89)
+    (TPIDR2_SETUP_REGNUM 90)
 
     ;; Represents the property "has an incoming lazy save been committed?".
-    (ZA_FREE_REGNUM 90)
+    (ZA_FREE_REGNUM 91)
 
     ;; Represents the property "are the current function's ZA contents
     ;; stored in the lazy save buffer, rather than in ZA itself?".
-    (ZA_SAVED_REGNUM 91)
+    (ZA_SAVED_REGNUM 92)
 
     ;; Represents the contents of the current function's ZA state in
     ;; abstract form.  At various times in the function, these contents
@@ -155,10 +159,10 @@ (define_constants
     ;;
     ;; The contents persist even when the architected ZA is off.  Private-ZA
     ;; functions have no effect on its contents.
-    (ZA_REGNUM 92)
+    (ZA_REGNUM 93)
 
     ;; Similarly represents the contents of the current function's ZT0 state.
-    (ZT0_REGNUM 93)
+    (ZT0_REGNUM 94)
 
     (FIRST_FAKE_REGNUM	LOWERING_REGNUM)
     (LAST_FAKE_REGNUM	ZT0_REGNUM)
@@ -1405,6 +1409,8 @@ (define_insn "*mov<mode>_aarch64"
      [w, r Z  ; neon_from_gp<q>, nosimd     ] fmov\t%s0, %w1
      [w, w    ; neon_dup       , simd       ] dup\t%<Vetype>0, %1.<v>[0]
      [w, w    ; neon_dup       , nosimd     ] fmov\t%s0, %s1
+     [Umv, r  ; mrs            , *          ] msr\t%0, %x1
+     [r, Umv  ; mrs            , *          ] mrs\t%x0, %1
   }
 )
 
@@ -1467,6 +1473,8 @@ (define_insn_and_split "*movsi_aarch64"
      [r  , w  ; f_mrc    , fp  , 4] fmov\t%w0, %s1
      [w  , w  ; fmov     , fp  , 4] fmov\t%s0, %s1
      [w  , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);
+     [Umv, r  ; mrs      , *   , 8] msr\t%0, %x1
+     [r, Umv  ; mrs      , *   , 8] mrs\t%x0, %1
   }
   "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
     && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
@@ -1505,6 +1513,8 @@ (define_insn_and_split "*movdi_aarch64"
      [w, w  ; fmov     , fp  , 4] fmov\t%d0, %d1
      [w, Dd ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], DImode);
      [w, Dx ; neon_move, simd, 8] #
+     [Umv, r; mrs      , *   , 8] msr\t%0, %1
+     [r, Umv; mrs      , *   , 8] mrs\t%0, %1
   }
   "CONST_INT_P (operands[1])
    && REG_P (operands[0])
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index a2569cea510..0c81fb28f7e 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -77,6 +77,9 @@ (define_register_constraint "Upl" "PR_LO_REGS"
 (define_register_constraint "Uph" "PR_HI_REGS"
   "SVE predicate registers p8 - p15.")
 
+(define_register_constraint "Umv" "MOVEABLE_SYSREGS"
+  "@internal System Registers suitable for moving rather than requiring an unspec msr")
+
 (define_constraint "c"
  "@internal The condition code register."
   (match_operand 0 "cc_register"))
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
index 459442be155..1a5c3d7e8fd 100644
--- a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
+++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
@@ -1,6 +1,7 @@
 /* Test the fp8 ACLE intrinsics family.  */
 /* { dg-do compile } */
 /* { dg-options "-O1 -march=armv8-a" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
 
 #include <arm_acle.h>
 
@@ -17,4 +18,107 @@
 #error "__ARM_FEATURE_FP8 feature macro defined."
 #endif
 
+/*
+**test_write_fpmr_sysreg_asm_64:
+**	msr	fpmr, x0
+**	ret
+*/
+void
+test_write_fpmr_sysreg_asm_64 (uint64_t val)
+{
+  register uint64_t fpmr asm ("fpmr") = val;
+  asm volatile ("" ::"Umv"(fpmr));
+}
+
+/*
+**test_write_fpmr_sysreg_asm_32:
+**	uxtw	x0, w0
+**	msr	fpmr, x0
+**	ret
+*/
+void
+test_write_fpmr_sysreg_asm_32 (uint32_t val)
+{
+  register uint64_t fpmr asm ("fpmr") = val;
+  asm volatile ("" ::"Umv"(fpmr));
+}
+
+/*
+**test_write_fpmr_sysreg_asm_16:
+**	and	x0, x0, 65535
+**	msr	fpmr, x0
+**	ret
+*/
+void
+test_write_fpmr_sysreg_asm_16 (uint16_t val)
+{
+  register uint64_t fpmr asm ("fpmr") = val;
+  asm volatile ("" ::"Umv"(fpmr));
+}
+
+/*
+**test_write_fpmr_sysreg_asm_8:
+**	and	x0, x0, 255
+**	msr	fpmr, x0
+**	ret
+*/
+void
+test_write_fpmr_sysreg_asm_8 (uint8_t val)
+{
+  register uint64_t fpmr asm ("fpmr") = val;
+  asm volatile ("" ::"Umv"(fpmr));
+}
+
+/*
+**test_read_fpmr_sysreg_asm_64:
+**	mrs	x0, fpmr
+**	ret
+*/
+uint64_t
+test_read_fpmr_sysreg_asm_64 ()
+{
+  register uint64_t fpmr asm ("fpmr");
+  asm volatile ("" : "=Umv"(fpmr) :);
+  return fpmr;
+}
+
+/*
+**test_read_fpmr_sysreg_asm_32:
+**	mrs	x0, fpmr
+**	ret
+*/
+uint32_t
+test_read_fpmr_sysreg_asm_32 ()
+{
+  register uint32_t fpmr asm ("fpmr");
+  asm volatile ("" : "=Umv"(fpmr) :);
+  return fpmr;
+}
+
+/*
+**test_read_fpmr_sysreg_asm_16:
+**	mrs	x0, fpmr
+**	ret
+*/
+uint16_t
+test_read_fpmr_sysreg_asm_16 ()
+{
+  register uint16_t fpmr asm ("fpmr");
+  asm volatile ("" : "=Umv"(fpmr) :);
+  return fpmr;
+}
+
+/*
+**test_read_fpmr_sysreg_asm_8:
+**	mrs	x0, fpmr
+**	ret
+*/
+uint8_t
+test_read_fpmr_sysreg_asm_8 ()
+{
+  register uint8_t fpmr asm ("fpmr");
+  asm volatile ("" : "=Umv"(fpmr) :);
+  return fpmr;
+}
+
 #pragma GCC pop_options

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 3/3] aarch64: Add fpm register helper functions.
  2024-07-26 16:32 [PATCH v3 0/3] aarch64: Add initial support for +fp8 arch extensions Claudio Bantaloukas
  2024-07-26 16:32 ` [PATCH v3 1/3] aarch64: Add march flags " Claudio Bantaloukas
  2024-07-26 16:32 ` [PATCH v3 2/3] aarch64: Add support for moving fpm system register Claudio Bantaloukas
@ 2024-07-26 16:32 ` Claudio Bantaloukas
  2024-07-29  7:34   ` Kyrylo Tkachov
  2 siblings, 1 reply; 9+ messages in thread
From: Claudio Bantaloukas @ 2024-07-26 16:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: Claudio Bantaloukas

[-- Attachment #1: Type: text/plain, Size: 2390 bytes --]


The ACLE declares several helper types and functions to facilitate construction
of `fpm` arguments. These are available when one of the arm_neon.h, arm_sve.h,
or arm_sme.h headers is included. These helpers don't map to specific FP8
instructions and there's no expectation that they will produce a given code
sequence, they're just an abstraction and an aid to the programmer. Thus they are
implemented in a new header file arm_private_fp8.h
Users are not expected to include this file, as it is a mere implementation detail,
subject to change. A check is included to guard against direct inclusion.

gcc/ChangeLog:

	* config.gcc (extra_headers): Install arm_private_fp8.h.
	* config/aarch64/arm_neon.h: Include arm_private_fp8.h.
	* config/aarch64/arm_sve.h: Likewise.
	* config/aarch64/arm_private_fp8.h: New file
	(fpm_t): New type representing fpmr values.
	(enum __ARM_FPM_FORMAT): New enum representing valid fp8 formats.
	(enum __ARM_FPM_OVERFLOW): New enum representing how some fp8
	calculations work.
	(__arm_fpm_init): New.
	(__arm_set_fpm_src1_format): Likewise.
	(__arm_set_fpm_src2_format): Likewise.
	(__arm_set_fpm_dst_format): Likewise.
	(__arm_set_fpm_overflow_cvt): Likewise.
	(__arm_set_fpm_overflow_mul): Likewise.
	(__arm_set_fpm_lscale): Likewise.
	(__arm_set_fpm_lscale2): Likewise.
	(__arm_set_fpm_nscale): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/acle/fp8-helpers-neon.c: New test of fpmr helper
	functions.
	* gcc.target/aarch64/acle/fp8-helpers-sve.c: New test of fpmr helper
	functions presence.
	* gcc.target/aarch64/acle/fp8-helpers-sme.c: New test of fpmr helper
	functions presence.
---
 gcc/config.gcc                                |  2 +-
 gcc/config/aarch64/arm_neon.h                 |  1 +
 gcc/config/aarch64/arm_private_fp8.h          | 80 +++++++++++++++++++
 gcc/config/aarch64/arm_sve.h                  |  1 +
 .../aarch64/acle/fp8-helpers-neon.c           | 53 ++++++++++++
 .../gcc.target/aarch64/acle/fp8-helpers-sme.c | 12 +++
 .../gcc.target/aarch64/acle/fp8-helpers-sve.c | 12 +++
 7 files changed, 160 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/aarch64/arm_private_fp8.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: v3-0003-aarch64-Add-fpm-register-helper-functions.patch --]
[-- Type: text/x-patch; name="v3-0003-aarch64-Add-fpm-register-helper-functions.patch", Size: 7697 bytes --]

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7453ade0782..a36dd1bcbc6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -347,7 +347,7 @@ m32c*-*-*)
         ;;
 aarch64*-*-*)
 	cpu_type=aarch64
-	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h arm_sme.h arm_neon_sve_bridge.h"
+	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h arm_sme.h arm_neon_sve_bridge.h arm_private_fp8.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index c4a09528ffd..e376685489d 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -30,6 +30,7 @@
 #pragma GCC push_options
 #pragma GCC target ("+nothing+simd")
 
+#include <arm_private_fp8.h>
 #pragma GCC aarch64 "arm_neon.h"
 
 #include <stdint.h>
diff --git a/gcc/config/aarch64/arm_private_fp8.h b/gcc/config/aarch64/arm_private_fp8.h
new file mode 100644
index 00000000000..ba93bc526c1
--- /dev/null
+++ b/gcc/config/aarch64/arm_private_fp8.h
@@ -0,0 +1,80 @@
+/* AArch64 FP8 helper functions.
+   Do not include this file directly. Use one of arm_neon.h
+   arm_sme.h arm_sve.h instead.
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_ARM_PRIVATE_FP8_H
+#define _GCC_ARM_PRIVATE_FP8_H
+
+#if !defined(_AARCH64_NEON_H_) && !defined(_ARM_SVE_H_)
+#error "This file should not be used standalone. Please include arm_neon.h or arm_sve.h instead."
+#endif
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+  typedef uint64_t fpm_t;
+
+  enum __ARM_FPM_FORMAT
+  {
+    __ARM_FPM_E5M2,
+    __ARM_FPM_E4M3,
+  };
+
+  enum __ARM_FPM_OVERFLOW
+  {
+    __ARM_FPM_INFNAN,
+    __ARM_FPM_SATURATE,
+  };
+
+#define __arm_fpm_init() (0)
+
+#define __arm_set_fpm_src1_format(__fpm, __format)                             \
+  ((__fpm & ~(uint64_t)0x7) | (__format & (uint64_t)0x7))
+#define __arm_set_fpm_src2_format(__fpm, __format)                             \
+  ((__fpm & ~((uint64_t)0x7 << 3)) | ((__format & (uint64_t)0x7) << 3))
+#define __arm_set_fpm_dst_format(__fpm, __format)                              \
+  ((__fpm & ~((uint64_t)0x7 << 6)) | ((__format & (uint64_t)0x7) << 6))
+#define __arm_set_fpm_overflow_cvt(__fpm, __behaviour)                         \
+  ((__fpm & ~((uint64_t)0x1 << 15)) | ((__behaviour & (uint64_t)0x1) << 15))
+#define __arm_set_fpm_overflow_mul(__fpm, __behaviour)                         \
+  ((__fpm & ~((uint64_t)0x1 << 14)) | ((__behaviour & (uint64_t)0x1) << 14))
+#define __arm_set_fpm_lscale(__fpm, __scale)                                   \
+  ((__fpm & ~((uint64_t)0x7f << 16)) | ((__scale & (uint64_t)0x7f) << 16))
+#define __arm_set_fpm_lscale2(__fpm, __scale)                                  \
+  ((__fpm & ~((uint64_t)0x3f << 32)) | ((__scale & (uint64_t)0x3f) << 32))
+#define __arm_set_fpm_nscale(__fpm, __scale)                                   \
+  ((__fpm & ~((uint64_t)0xff << 24)) | ((__scale & (uint64_t)0xff) << 24))
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/gcc/config/aarch64/arm_sve.h b/gcc/config/aarch64/arm_sve.h
index c2db63736a1..aa0bd9909f9 100644
--- a/gcc/config/aarch64/arm_sve.h
+++ b/gcc/config/aarch64/arm_sve.h
@@ -26,6 +26,7 @@
 #define _ARM_SVE_H_
 
 #include <stdint.h>
+#include <arm_private_fp8.h>
 #include <arm_bf16.h>
 
 typedef __fp16 float16_t;
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
new file mode 100644
index 00000000000..ade99557a29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
@@ -0,0 +1,53 @@
+/* Test the fp8 ACLE helper functions including that they are available.
+   unconditionally when including arm_neon.h */
+/* { dg-do compile } */
+/* { dg-options "-std=c90 -pedantic-errors -O1 -march=armv8-a" } */
+
+#include <arm_neon.h>
+
+void
+test_prepare_fpmr_sysreg ()
+{
+
+#define _S_EQ(expr, expected)                                                  \
+  _Static_assert (expr == expected, #expr " == " #expected)
+
+  _S_EQ (__arm_fpm_init (), 0);
+
+  /* Bits [2:0] */
+  _S_EQ (__arm_set_fpm_src1_format (__arm_fpm_init (), __ARM_FPM_E5M2), 0);
+  _S_EQ (__arm_set_fpm_src1_format (__arm_fpm_init (), __ARM_FPM_E4M3), 0x1);
+
+  /* Bits [5:3] */
+  _S_EQ (__arm_set_fpm_src2_format (__arm_fpm_init (), __ARM_FPM_E5M2), 0);
+  _S_EQ (__arm_set_fpm_src2_format (__arm_fpm_init (), __ARM_FPM_E4M3), 0x8);
+
+  /* Bits [8:6] */
+  _S_EQ (__arm_set_fpm_dst_format (__arm_fpm_init (), __ARM_FPM_E5M2), 0);
+  _S_EQ (__arm_set_fpm_dst_format (__arm_fpm_init (), __ARM_FPM_E4M3), 0x40);
+
+  /* Bit 14 */
+  _S_EQ (__arm_set_fpm_overflow_mul (__arm_fpm_init (), __ARM_FPM_INFNAN), 0);
+  _S_EQ (__arm_set_fpm_overflow_mul (__arm_fpm_init (), __ARM_FPM_SATURATE),
+	 0x4000);
+
+  /* Bit 15 */
+  _S_EQ (__arm_set_fpm_overflow_cvt (__arm_fpm_init (), __ARM_FPM_INFNAN), 0);
+  _S_EQ (__arm_set_fpm_overflow_cvt (__arm_fpm_init (), __ARM_FPM_SATURATE),
+	 0x8000);
+
+  /* Bits [22:16] */
+  _S_EQ (__arm_set_fpm_lscale (__arm_fpm_init (), 0), 0);
+  _S_EQ (__arm_set_fpm_lscale (__arm_fpm_init (), 127), 0x7F0000);
+
+  /* Bits [37:32] */
+  _S_EQ (__arm_set_fpm_lscale2 (__arm_fpm_init (), 0), 0);
+  _S_EQ (__arm_set_fpm_lscale2 (__arm_fpm_init (), 63), 0x3F00000000);
+
+  /* Bits [31:24] */
+  _S_EQ (__arm_set_fpm_nscale (__arm_fpm_init (), 0), 0);
+  _S_EQ (__arm_set_fpm_nscale (__arm_fpm_init (), 127), 0x7F000000);
+  _S_EQ (__arm_set_fpm_nscale (__arm_fpm_init (), -128), 0x80000000);
+
+#undef _S_EQ
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
new file mode 100644
index 00000000000..5daab730fbe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
@@ -0,0 +1,12 @@
+/* Test availability of the fp8 ACLE helper functions when including arm_sme.h.
+ */
+/* { dg-do compile } */
+/* { dg-options "-std=c90 -pedantic-errors -O1 -march=armv8-a" } */
+
+#include <arm_sme.h>
+
+void
+test_fpmr_helpers_present ()
+{
+  (__arm_fpm_init ());
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c
new file mode 100644
index 00000000000..99c5aa90cf4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c
@@ -0,0 +1,12 @@
+/* Test availability of the fp8 ACLE helper functions when including arm_sve.h.
+ */
+/* { dg-do compile } */
+/* { dg-options "-std=c90 -pedantic-errors -O1 -march=armv8-a" } */
+
+#include <arm_sve.h>
+
+void
+test_fpmr_helpers_present ()
+{
+  (__arm_fpm_init ());
+}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] aarch64: Add march flags for +fp8 arch extensions
  2024-07-26 16:32 ` [PATCH v3 1/3] aarch64: Add march flags " Claudio Bantaloukas
@ 2024-07-29  7:30   ` Kyrylo Tkachov
  2024-07-30 13:41     ` Claudio Bantaloukas
  0 siblings, 1 reply; 9+ messages in thread
From: Kyrylo Tkachov @ 2024-07-29  7:30 UTC (permalink / raw)
  To: Claudio Bantaloukas; +Cc: gcc-patches

Hi Claudio,

> On 26 Jul 2024, at 18:32, Claudio Bantaloukas <claudio.bantaloukas@arm.com> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> This introduces the relevant flags to enable access to the fpmr register and fp8 intrinsics, which will be added subsequently.
> 
> gcc/ChangeLog:
> 
>        * config/aarch64/aarch64-option-extensions.def (fp8): New.
>        * config/aarch64/aarch64.h (TARGET_FP8): Likewise.
>        * doc/invoke.texi (AArch64 Options): Document new -march flags
>        and extensions.
> 
> gcc/testsuite/ChangeLog:
> 
>        * gcc.target/aarch64/acle/fp8.c: New test.

Thanks, this looks ok to me now.
One question about the command-line flag.
FP8 defines instructions for Advanced SIMD, SVE and SME.
Is the “+fp8” option in this patch intended to combine with the +sve and +sme options to indicate the presence of these ISA-specific subsets? That is, you’re not planning to introduce something like +sve-fp8, +sme-fp8?
Kyrill


> ---
> .../aarch64/aarch64-option-extensions.def     |  2 ++
> gcc/config/aarch64/aarch64.h                  |  3 +++
> gcc/doc/invoke.texi                           |  2 ++
> gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 20 +++++++++++++++++++
> 4 files changed, 27 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> 
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> index 42ec0eec31e..6998627f377 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
> 
> AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
> 
> +AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), "fp8")
> +
> #undef AARCH64_OPT_FMV_EXTENSION
> #undef AARCH64_OPT_EXTENSION
> #undef AARCH64_FMV_FEATURE
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index b7e330438d9..2e75c6b81e2 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -463,6 +463,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
> && (aarch64_tune_params.extra_tuning_flags \
>     & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
> 
> +/* fp8 instructions are enabled through +fp8.  */
> +#define TARGET_FP8 AARCH64_HAVE_ISA (FP8)
> +
> /* Standard register usage.  */
> 
> /* 31 64-bit general purpose registers R0-R30:
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 9fb0925ed29..7cbcd8ad1b4 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21848,6 +21848,8 @@ Enable support for Armv9.4-a Guarded Control Stack extension.
> Enable support for Armv8.9-a/9.4-a translation hardening extension.
> @item rcpc3
> Enable the RCpc3 (Release Consistency) extension.
> +@item fp8
> +Enable the fp8 (8-bit floating point) extension.
> 
> @end table
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> new file mode 100644
> index 00000000000..459442be155
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> @@ -0,0 +1,20 @@
> +/* Test the fp8 ACLE intrinsics family.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -march=armv8-a" } */
> +
> +#include <arm_acle.h>
> +
> +#ifdef __ARM_FEATURE_FP8
> +#error "__ARM_FEATURE_FP8 feature macro defined."
> +#endif
> +
> +#pragma GCC push_options
> +#pragma GCC target("arch=armv9.4-a+fp8")
> +
> +/* We do not define __ARM_FEATURE_FP8 until all
> +   relevant features have been added. */
> +#ifdef __ARM_FEATURE_FP8
> +#error "__ARM_FEATURE_FP8 feature macro defined."
> +#endif
> +
> +#pragma GCC pop_options


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 3/3] aarch64: Add fpm register helper functions.
  2024-07-26 16:32 ` [PATCH v3 3/3] aarch64: Add fpm register helper functions Claudio Bantaloukas
@ 2024-07-29  7:34   ` Kyrylo Tkachov
  0 siblings, 0 replies; 9+ messages in thread
From: Kyrylo Tkachov @ 2024-07-29  7:34 UTC (permalink / raw)
  To: Claudio Bantaloukas; +Cc: gcc-patches

Hi Claudio,

> On 26 Jul 2024, at 18:32, Claudio Bantaloukas <claudio.bantaloukas@arm.com> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> The ACLE declares several helper types and functions to facilitate construction
> of `fpm` arguments. These are available when one of the arm_neon.h, arm_sve.h,
> or arm_sme.h headers is included. These helpers don't map to specific FP8
> instructions and there's no expectation that they will produce a given code
> sequence, they're just an abstraction and an aid to the programmer. Thus they are
> implemented in a new header file arm_private_fp8.h
> Users are not expected to include this file, as it is a mere implementation detail,
> subject to change. A check is included to guard against direct inclusion.
> 
> gcc/ChangeLog:
> 
>        * config.gcc (extra_headers): Install arm_private_fp8.h.
>        * config/aarch64/arm_neon.h: Include arm_private_fp8.h.
>        * config/aarch64/arm_sve.h: Likewise.
>        * config/aarch64/arm_private_fp8.h: New file
>        (fpm_t): New type representing fpmr values.
>        (enum __ARM_FPM_FORMAT): New enum representing valid fp8 formats.
>        (enum __ARM_FPM_OVERFLOW): New enum representing how some fp8
>        calculations work.
>        (__arm_fpm_init): New.
>        (__arm_set_fpm_src1_format): Likewise.
>        (__arm_set_fpm_src2_format): Likewise.
>        (__arm_set_fpm_dst_format): Likewise.
>        (__arm_set_fpm_overflow_cvt): Likewise.
>        (__arm_set_fpm_overflow_mul): Likewise.
>        (__arm_set_fpm_lscale): Likewise.
>        (__arm_set_fpm_lscale2): Likewise.
>        (__arm_set_fpm_nscale): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>        * gcc.target/aarch64/acle/fp8-helpers-neon.c: New test of fpmr helper
>        functions.
>        * gcc.target/aarch64/acle/fp8-helpers-sve.c: New test of fpmr helper
>        functions presence.
>        * gcc.target/aarch64/acle/fp8-helpers-sme.c: New test of fpmr helper
>        functions presence.
> ---
> gcc/config.gcc                                |  2 +-
> gcc/config/aarch64/arm_neon.h                 |  1 +
> gcc/config/aarch64/arm_private_fp8.h          | 80 +++++++++++++++++++
> gcc/config/aarch64/arm_sve.h                  |  1 +
> .../aarch64/acle/fp8-helpers-neon.c           | 53 ++++++++++++
> .../gcc.target/aarch64/acle/fp8-helpers-sme.c | 12 +++
> .../gcc.target/aarch64/acle/fp8-helpers-sve.c | 12 +++
> 7 files changed, 160 insertions(+), 1 deletion(-)
> create mode 100644 gcc/config/aarch64/arm_private_fp8.h
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 7453ade0782..a36dd1bcbc6 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -347,7 +347,7 @@ m32c*-*-*)
>         ;;
> aarch64*-*-*)
> cpu_type=aarch64
> - extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h arm_sme.h arm_neon_sve_bridge.h"
> + extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h arm_sme.h arm_neon_sve_bridge.h arm_private_fp8.h"
> c_target_objs="aarch64-c.o"
> cxx_target_objs="aarch64-c.o"
> d_target_objs="aarch64-d.o"
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index c4a09528ffd..e376685489d 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -30,6 +30,7 @@
> #pragma GCC push_options
> #pragma GCC target ("+nothing+simd")
> 
> +#include <arm_private_fp8.h>
> #pragma GCC aarch64 "arm_neon.h"
> 
> #include <stdint.h>
> diff --git a/gcc/config/aarch64/arm_private_fp8.h b/gcc/config/aarch64/arm_private_fp8.h
> new file mode 100644
> index 00000000000..ba93bc526c1
> --- /dev/null
> +++ b/gcc/config/aarch64/arm_private_fp8.h
> @@ -0,0 +1,80 @@
> +/* AArch64 FP8 helper functions.
> +   Do not include this file directly. Use one of arm_neon.h
> +   arm_sme.h arm_sve.h instead.
> +
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +   Contributed by ARM Ltd.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef _GCC_ARM_PRIVATE_FP8_H
> +#define _GCC_ARM_PRIVATE_FP8_H
> +
> +#if !defined(_AARCH64_NEON_H_) && !defined(_ARM_SVE_H_)
> +#error "This file should not be used standalone. Please include arm_neon.h or arm_sve.h instead."
> +#endif

The message should also mention arm_sme.h as a potential solution (even though it implicitly includes arm_sve.h)

Ok with that change once the rest is approved.
Thanks,
Kyrill


> +
> +#include <stdint.h>
> +
> +#ifdef __cplusplus
> +extern "C"
> +{
> +#endif
> +
> +  typedef uint64_t fpm_t;
> +
> +  enum __ARM_FPM_FORMAT
> +  {
> +    __ARM_FPM_E5M2,
> +    __ARM_FPM_E4M3,
> +  };
> +
> +  enum __ARM_FPM_OVERFLOW
> +  {
> +    __ARM_FPM_INFNAN,
> +    __ARM_FPM_SATURATE,
> +  };
> +
> +#define __arm_fpm_init() (0)
> +
> +#define __arm_set_fpm_src1_format(__fpm, __format)                             \
> +  ((__fpm & ~(uint64_t)0x7) | (__format & (uint64_t)0x7))
> +#define __arm_set_fpm_src2_format(__fpm, __format)                             \
> +  ((__fpm & ~((uint64_t)0x7 << 3)) | ((__format & (uint64_t)0x7) << 3))
> +#define __arm_set_fpm_dst_format(__fpm, __format)                              \
> +  ((__fpm & ~((uint64_t)0x7 << 6)) | ((__format & (uint64_t)0x7) << 6))
> +#define __arm_set_fpm_overflow_cvt(__fpm, __behaviour)                         \
> +  ((__fpm & ~((uint64_t)0x1 << 15)) | ((__behaviour & (uint64_t)0x1) << 15))
> +#define __arm_set_fpm_overflow_mul(__fpm, __behaviour)                         \
> +  ((__fpm & ~((uint64_t)0x1 << 14)) | ((__behaviour & (uint64_t)0x1) << 14))
> +#define __arm_set_fpm_lscale(__fpm, __scale)                                   \
> +  ((__fpm & ~((uint64_t)0x7f << 16)) | ((__scale & (uint64_t)0x7f) << 16))
> +#define __arm_set_fpm_lscale2(__fpm, __scale)                                  \
> +  ((__fpm & ~((uint64_t)0x3f << 32)) | ((__scale & (uint64_t)0x3f) << 32))
> +#define __arm_set_fpm_nscale(__fpm, __scale)                                   \
> +  ((__fpm & ~((uint64_t)0xff << 24)) | ((__scale & (uint64_t)0xff) << 24))
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/gcc/config/aarch64/arm_sve.h b/gcc/config/aarch64/arm_sve.h
> index c2db63736a1..aa0bd9909f9 100644
> --- a/gcc/config/aarch64/arm_sve.h
> +++ b/gcc/config/aarch64/arm_sve.h
> @@ -26,6 +26,7 @@
> #define _ARM_SVE_H_
> 
> #include <stdint.h>
> +#include <arm_private_fp8.h>
> #include <arm_bf16.h>
> 
> typedef __fp16 float16_t;
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
> new file mode 100644
> index 00000000000..ade99557a29
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
> @@ -0,0 +1,53 @@
> +/* Test the fp8 ACLE helper functions including that they are available.
> +   unconditionally when including arm_neon.h */
> +/* { dg-do compile } */
> +/* { dg-options "-std=c90 -pedantic-errors -O1 -march=armv8-a" } */
> +
> +#include <arm_neon.h>
> +
> +void
> +test_prepare_fpmr_sysreg ()
> +{
> +
> +#define _S_EQ(expr, expected)                                                  \
> +  _Static_assert (expr == expected, #expr " == " #expected)
> +
> +  _S_EQ (__arm_fpm_init (), 0);
> +
> +  /* Bits [2:0] */
> +  _S_EQ (__arm_set_fpm_src1_format (__arm_fpm_init (), __ARM_FPM_E5M2), 0);
> +  _S_EQ (__arm_set_fpm_src1_format (__arm_fpm_init (), __ARM_FPM_E4M3), 0x1);
> +
> +  /* Bits [5:3] */
> +  _S_EQ (__arm_set_fpm_src2_format (__arm_fpm_init (), __ARM_FPM_E5M2), 0);
> +  _S_EQ (__arm_set_fpm_src2_format (__arm_fpm_init (), __ARM_FPM_E4M3), 0x8);
> +
> +  /* Bits [8:6] */
> +  _S_EQ (__arm_set_fpm_dst_format (__arm_fpm_init (), __ARM_FPM_E5M2), 0);
> +  _S_EQ (__arm_set_fpm_dst_format (__arm_fpm_init (), __ARM_FPM_E4M3), 0x40);
> +
> +  /* Bit 14 */
> +  _S_EQ (__arm_set_fpm_overflow_mul (__arm_fpm_init (), __ARM_FPM_INFNAN), 0);
> +  _S_EQ (__arm_set_fpm_overflow_mul (__arm_fpm_init (), __ARM_FPM_SATURATE),
> + 0x4000);
> +
> +  /* Bit 15 */
> +  _S_EQ (__arm_set_fpm_overflow_cvt (__arm_fpm_init (), __ARM_FPM_INFNAN), 0);
> +  _S_EQ (__arm_set_fpm_overflow_cvt (__arm_fpm_init (), __ARM_FPM_SATURATE),
> + 0x8000);
> +
> +  /* Bits [22:16] */
> +  _S_EQ (__arm_set_fpm_lscale (__arm_fpm_init (), 0), 0);
> +  _S_EQ (__arm_set_fpm_lscale (__arm_fpm_init (), 127), 0x7F0000);
> +
> +  /* Bits [37:32] */
> +  _S_EQ (__arm_set_fpm_lscale2 (__arm_fpm_init (), 0), 0);
> +  _S_EQ (__arm_set_fpm_lscale2 (__arm_fpm_init (), 63), 0x3F00000000);
> +
> +  /* Bits [31:24] */
> +  _S_EQ (__arm_set_fpm_nscale (__arm_fpm_init (), 0), 0);
> +  _S_EQ (__arm_set_fpm_nscale (__arm_fpm_init (), 127), 0x7F000000);
> +  _S_EQ (__arm_set_fpm_nscale (__arm_fpm_init (), -128), 0x80000000);
> +
> +#undef _S_EQ
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
> new file mode 100644
> index 00000000000..5daab730fbe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
> @@ -0,0 +1,12 @@
> +/* Test availability of the fp8 ACLE helper functions when including arm_sme.h.
> + */
> +/* { dg-do compile } */
> +/* { dg-options "-std=c90 -pedantic-errors -O1 -march=armv8-a" } */
> +
> +#include <arm_sme.h>
> +
> +void
> +test_fpmr_helpers_present ()
> +{
> +  (__arm_fpm_init ());
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c
> new file mode 100644
> index 00000000000..99c5aa90cf4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c
> @@ -0,0 +1,12 @@
> +/* Test availability of the fp8 ACLE helper functions when including arm_sve.h.
> + */
> +/* { dg-do compile } */
> +/* { dg-options "-std=c90 -pedantic-errors -O1 -march=armv8-a" } */
> +
> +#include <arm_sve.h>
> +
> +void
> +test_fpmr_helpers_present ()
> +{
> +  (__arm_fpm_init ());
> +}


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/3] aarch64: Add support for moving fpm system register
  2024-07-26 16:32 ` [PATCH v3 2/3] aarch64: Add support for moving fpm system register Claudio Bantaloukas
@ 2024-07-29 12:13   ` Richard Sandiford
  2024-07-30 13:27     ` Claudio Bantaloukas
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2024-07-29 12:13 UTC (permalink / raw)
  To: Claudio Bantaloukas; +Cc: gcc-patches

Claudio Bantaloukas <claudio.bantaloukas@arm.com> writes:
> Unlike most system registers, fpmr can be heavily written to in code that
> exercises the fp8 functionality. That is because every fp8 instrinsic call
> can potentially change the value of fpmr.
> Rather than just use a an unspec, we treat the fpmr system register like

Typo: s/a an/an/

> all other registers and use a move operation to read and write to it.
>
> We introduce a new class of moveable system registers that, currently,
> only accepts fpmr and a new constraint, Umv, that allows us to
> selectively use mrs and msr instructions when expanding rtl for them.
> Given that there is code that depends on "real" registers coming before
> "fake" ones, we introduce a new constant FPM_REGNUM that uses an
> existing value and renumber registers below that.
> This requires us to update the bitmaps that describe which registers
> belong to each register class.
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64.cc (aarch64_hard_regno_nregs): Add
> 	support for MOVEABLE_SYSREGS class.
> 	(aarch64_hard_regno_mode_ok): Allow reads and writes to fpmr.
> 	(aarch64_regno_regclass): Support MOVEABLE_SYSREGS class.
> 	(aarch64_class_max_nregs): Likewise.
> 	* config/aarch64/aarch64.h (FIXED_REGISTERS): add fpmr.
> 	(CALL_REALLY_USED_REGISTERS): Likewise.
> 	(REGISTER_NAMES): Likewise.
> 	(enum reg_class): Add MOVEABLE_SYSREGS class.
> 	(REG_CLASS_NAMES): Likewise.
> 	(REG_CLASS_CONTENTS): Update class bitmaps to deal with fpmr,
> 	the new MOVEABLE_REGS class and renumbering of registers.
> 	* config/aarch64/aarch64.md: (FPM_REGNUM): added new register
> 	number, reusing old value.
> 	(FFR_REGNUM): Renumber.
> 	(FFRT_REGNUM): Likewise.
> 	(LOWERING_REGNUM): Likewise.
> 	(TPIDR2_BLOCK_REGNUM): Likewise.
> 	(SME_STATE_REGNUM): Likewise.
> 	(TPIDR2_SETUP_REGNUM): Likewise.
> 	(ZA_FREE_REGNUM): Likewise.
> 	(ZA_SAVED_REGNUM): Likewise.
> 	(ZA_REGNUM): Likewise.
> 	(ZT0_REGNUM): Likewise.
> 	(*mov<mode>_aarch64): Add support for moveable sysregs.
> 	(*movsi_aarch64): Likewise.
> 	(*movdi_aarch64): Likewise.
> 	* config/aarch64/constraints.md (MOVEABLE_SYSREGS): New constraint.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/acle/fp8.c: New tests.
> [...]
> @@ -1405,6 +1409,8 @@ (define_insn "*mov<mode>_aarch64"
>       [w, r Z  ; neon_from_gp<q>, nosimd     ] fmov\t%s0, %w1
>       [w, w    ; neon_dup       , simd       ] dup\t%<Vetype>0, %1.<v>[0]
>       [w, w    ; neon_dup       , nosimd     ] fmov\t%s0, %s1
> +     [Umv, r  ; mrs            , *          ] msr\t%0, %x1
> +     [r, Umv  ; mrs            , *          ] mrs\t%x0, %1
>    }
>  )
>  
> @@ -1467,6 +1473,8 @@ (define_insn_and_split "*movsi_aarch64"
>       [r  , w  ; f_mrc    , fp  , 4] fmov\t%w0, %s1
>       [w  , w  ; fmov     , fp  , 4] fmov\t%s0, %s1
>       [w  , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);
> +     [Umv, r  ; mrs      , *   , 8] msr\t%0, %x1
> +     [r, Umv  ; mrs      , *   , 8] mrs\t%x0, %1

The lengths should be 4 rather than 8.

>    }
>    "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
>      && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
> @@ -1505,6 +1513,8 @@ (define_insn_and_split "*movdi_aarch64"
>       [w, w  ; fmov     , fp  , 4] fmov\t%d0, %d1
>       [w, Dd ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], DImode);
>       [w, Dx ; neon_move, simd, 8] #
> +     [Umv, r; mrs      , *   , 8] msr\t%0, %1
> +     [r, Umv; mrs      , *   , 8] mrs\t%0, %1

Similarly here.

>    }
>    "CONST_INT_P (operands[1])
>     && REG_P (operands[0])
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> index 459442be155..1a5c3d7e8fd 100644
> --- a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
> @@ -1,6 +1,7 @@
>  /* Test the fp8 ACLE intrinsics family.  */
>  /* { dg-do compile } */
>  /* { dg-options "-O1 -march=armv8-a" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
>  
>  #include <arm_acle.h>
>  
> @@ -17,4 +18,107 @@
>  #error "__ARM_FEATURE_FP8 feature macro defined."
>  #endif
>  
> +/*
> +**test_write_fpmr_sysreg_asm_64:
> +**	msr	fpmr, x0
> +**	ret
> +*/
> +void
> +test_write_fpmr_sysreg_asm_64 (uint64_t val)
> +{
> +  register uint64_t fpmr asm ("fpmr") = val;
> +  asm volatile ("" ::"Umv"(fpmr));
> +}
> +
> +/*
> +**test_write_fpmr_sysreg_asm_32:
> +**	uxtw	x0, w0
> +**	msr	fpmr, x0
> +**	ret
> +*/
> +void
> +test_write_fpmr_sysreg_asm_32 (uint32_t val)
> +{
> +  register uint64_t fpmr asm ("fpmr") = val;

By using uint64_t rather than uint32_t, these tests are testing movdi
rather than the smaller move patterns.  I think it should be uint32_t
instead.  We should then have just an MSR, without an extension.

Similarly for the 16-bit and 8-bit cases.

LGTM with those changes, but please give others a day or so to comment.

Thanks,
Richard


> +  asm volatile ("" ::"Umv"(fpmr));
> +}
> +
> +/*
> +**test_write_fpmr_sysreg_asm_16:
> +**	and	x0, x0, 65535
> +**	msr	fpmr, x0
> +**	ret
> +*/
> +void
> +test_write_fpmr_sysreg_asm_16 (uint16_t val)
> +{
> +  register uint64_t fpmr asm ("fpmr") = val;
> +  asm volatile ("" ::"Umv"(fpmr));
> +}
> +
> +/*
> +**test_write_fpmr_sysreg_asm_8:
> +**	and	x0, x0, 255
> +**	msr	fpmr, x0
> +**	ret
> +*/
> +void
> +test_write_fpmr_sysreg_asm_8 (uint8_t val)
> +{
> +  register uint64_t fpmr asm ("fpmr") = val;
> +  asm volatile ("" ::"Umv"(fpmr));
> +}
> +
> +/*
> +**test_read_fpmr_sysreg_asm_64:
> +**	mrs	x0, fpmr
> +**	ret
> +*/
> +uint64_t
> +test_read_fpmr_sysreg_asm_64 ()
> +{
> +  register uint64_t fpmr asm ("fpmr");
> +  asm volatile ("" : "=Umv"(fpmr) :);
> +  return fpmr;
> +}
> +
> +/*
> +**test_read_fpmr_sysreg_asm_32:
> +**	mrs	x0, fpmr
> +**	ret
> +*/
> +uint32_t
> +test_read_fpmr_sysreg_asm_32 ()
> +{
> +  register uint32_t fpmr asm ("fpmr");
> +  asm volatile ("" : "=Umv"(fpmr) :);
> +  return fpmr;
> +}
> +
> +/*
> +**test_read_fpmr_sysreg_asm_16:
> +**	mrs	x0, fpmr
> +**	ret
> +*/
> +uint16_t
> +test_read_fpmr_sysreg_asm_16 ()
> +{
> +  register uint16_t fpmr asm ("fpmr");
> +  asm volatile ("" : "=Umv"(fpmr) :);
> +  return fpmr;
> +}
> +
> +/*
> +**test_read_fpmr_sysreg_asm_8:
> +**	mrs	x0, fpmr
> +**	ret
> +*/
> +uint8_t
> +test_read_fpmr_sysreg_asm_8 ()
> +{
> +  register uint8_t fpmr asm ("fpmr");
> +  asm volatile ("" : "=Umv"(fpmr) :);
> +  return fpmr;
> +}
> +
>  #pragma GCC pop_options

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/3] aarch64: Add support for moving fpm system register
  2024-07-29 12:13   ` Richard Sandiford
@ 2024-07-30 13:27     ` Claudio Bantaloukas
  0 siblings, 0 replies; 9+ messages in thread
From: Claudio Bantaloukas @ 2024-07-30 13:27 UTC (permalink / raw)
  To: gcc-patches, Richard Sandiford



On 29/07/2024 13:13, Richard Sandiford wrote:
> Claudio Bantaloukas <claudio.bantaloukas@arm.com> writes:
>> Unlike most system registers, fpmr can be heavily written to in code that
>> exercises the fp8 functionality. That is because every fp8 instrinsic call
>> can potentially change the value of fpmr.
>> Rather than just use a an unspec, we treat the fpmr system register like
> 
> Typo: s/a an/an/
Thanks for the catch, will repost along with the requested changes below

Cheers,
Claudio

> 
>> all other registers and use a move operation to read and write to it.
>>
>> We introduce a new class of moveable system registers that, currently,
>> only accepts fpmr and a new constraint, Umv, that allows us to
>> selectively use mrs and msr instructions when expanding rtl for them.
>> Given that there is code that depends on "real" registers coming before
>> "fake" ones, we introduce a new constant FPM_REGNUM that uses an
>> existing value and renumber registers below that.
>> This requires us to update the bitmaps that describe which registers
>> belong to each register class.
>>
>> gcc/ChangeLog:
>>
>> 	* config/aarch64/aarch64.cc (aarch64_hard_regno_nregs): Add
>> 	support for MOVEABLE_SYSREGS class.
>> 	(aarch64_hard_regno_mode_ok): Allow reads and writes to fpmr.
>> 	(aarch64_regno_regclass): Support MOVEABLE_SYSREGS class.
>> 	(aarch64_class_max_nregs): Likewise.
>> 	* config/aarch64/aarch64.h (FIXED_REGISTERS): add fpmr.
>> 	(CALL_REALLY_USED_REGISTERS): Likewise.
>> 	(REGISTER_NAMES): Likewise.
>> 	(enum reg_class): Add MOVEABLE_SYSREGS class.
>> 	(REG_CLASS_NAMES): Likewise.
>> 	(REG_CLASS_CONTENTS): Update class bitmaps to deal with fpmr,
>> 	the new MOVEABLE_REGS class and renumbering of registers.
>> 	* config/aarch64/aarch64.md: (FPM_REGNUM): added new register
>> 	number, reusing old value.
>> 	(FFR_REGNUM): Renumber.
>> 	(FFRT_REGNUM): Likewise.
>> 	(LOWERING_REGNUM): Likewise.
>> 	(TPIDR2_BLOCK_REGNUM): Likewise.
>> 	(SME_STATE_REGNUM): Likewise.
>> 	(TPIDR2_SETUP_REGNUM): Likewise.
>> 	(ZA_FREE_REGNUM): Likewise.
>> 	(ZA_SAVED_REGNUM): Likewise.
>> 	(ZA_REGNUM): Likewise.
>> 	(ZT0_REGNUM): Likewise.
>> 	(*mov<mode>_aarch64): Add support for moveable sysregs.
>> 	(*movsi_aarch64): Likewise.
>> 	(*movdi_aarch64): Likewise.
>> 	* config/aarch64/constraints.md (MOVEABLE_SYSREGS): New constraint.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.target/aarch64/acle/fp8.c: New tests.
>> [...]
>> @@ -1405,6 +1409,8 @@ (define_insn "*mov<mode>_aarch64"
>>        [w, r Z  ; neon_from_gp<q>, nosimd     ] fmov\t%s0, %w1
>>        [w, w    ; neon_dup       , simd       ] dup\t%<Vetype>0, %1.<v>[0]
>>        [w, w    ; neon_dup       , nosimd     ] fmov\t%s0, %s1
>> +     [Umv, r  ; mrs            , *          ] msr\t%0, %x1
>> +     [r, Umv  ; mrs            , *          ] mrs\t%x0, %1
>>     }
>>   )
>>   
>> @@ -1467,6 +1473,8 @@ (define_insn_and_split "*movsi_aarch64"
>>        [r  , w  ; f_mrc    , fp  , 4] fmov\t%w0, %s1
>>        [w  , w  ; fmov     , fp  , 4] fmov\t%s0, %s1
>>        [w  , Ds ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], SImode);
>> +     [Umv, r  ; mrs      , *   , 8] msr\t%0, %x1
>> +     [r, Umv  ; mrs      , *   , 8] mrs\t%x0, %1
> 
> The lengths should be 4 rather than 8.
> 
>>     }
>>     "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)
>>       && REG_P (operands[0]) && GP_REGNUM_P (REGNO (operands[0]))"
>> @@ -1505,6 +1513,8 @@ (define_insn_and_split "*movdi_aarch64"
>>        [w, w  ; fmov     , fp  , 4] fmov\t%d0, %d1
>>        [w, Dd ; neon_move, simd, 4] << aarch64_output_scalar_simd_mov_immediate (operands[1], DImode);
>>        [w, Dx ; neon_move, simd, 8] #
>> +     [Umv, r; mrs      , *   , 8] msr\t%0, %1
>> +     [r, Umv; mrs      , *   , 8] mrs\t%0, %1
> 
> Similarly here.
> 
>>     }
>>     "CONST_INT_P (operands[1])
>>      && REG_P (operands[0])
>> [...]
>> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> index 459442be155..1a5c3d7e8fd 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> @@ -1,6 +1,7 @@
>>   /* Test the fp8 ACLE intrinsics family.  */
>>   /* { dg-do compile } */
>>   /* { dg-options "-O1 -march=armv8-a" } */
>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>>   
>>   #include <arm_acle.h>
>>   
>> @@ -17,4 +18,107 @@
>>   #error "__ARM_FEATURE_FP8 feature macro defined."
>>   #endif
>>   
>> +/*
>> +**test_write_fpmr_sysreg_asm_64:
>> +**	msr	fpmr, x0
>> +**	ret
>> +*/
>> +void
>> +test_write_fpmr_sysreg_asm_64 (uint64_t val)
>> +{
>> +  register uint64_t fpmr asm ("fpmr") = val;
>> +  asm volatile ("" ::"Umv"(fpmr));
>> +}
>> +
>> +/*
>> +**test_write_fpmr_sysreg_asm_32:
>> +**	uxtw	x0, w0
>> +**	msr	fpmr, x0
>> +**	ret
>> +*/
>> +void
>> +test_write_fpmr_sysreg_asm_32 (uint32_t val)
>> +{
>> +  register uint64_t fpmr asm ("fpmr") = val;
> 
> By using uint64_t rather than uint32_t, these tests are testing movdi
> rather than the smaller move patterns.  I think it should be uint32_t
> instead.  We should then have just an MSR, without an extension.
> 
> Similarly for the 16-bit and 8-bit cases.
> 
> LGTM with those changes, but please give others a day or so to comment.
> 
> Thanks,
> Richard
> 
> 
>> +  asm volatile ("" ::"Umv"(fpmr));
>> +}
>> +
>> +/*
>> +**test_write_fpmr_sysreg_asm_16:
>> +**	and	x0, x0, 65535
>> +**	msr	fpmr, x0
>> +**	ret
>> +*/
>> +void
>> +test_write_fpmr_sysreg_asm_16 (uint16_t val)
>> +{
>> +  register uint64_t fpmr asm ("fpmr") = val;
>> +  asm volatile ("" ::"Umv"(fpmr));
>> +}
>> +
>> +/*
>> +**test_write_fpmr_sysreg_asm_8:
>> +**	and	x0, x0, 255
>> +**	msr	fpmr, x0
>> +**	ret
>> +*/
>> +void
>> +test_write_fpmr_sysreg_asm_8 (uint8_t val)
>> +{
>> +  register uint64_t fpmr asm ("fpmr") = val;
>> +  asm volatile ("" ::"Umv"(fpmr));
>> +}
>> +
>> +/*
>> +**test_read_fpmr_sysreg_asm_64:
>> +**	mrs	x0, fpmr
>> +**	ret
>> +*/
>> +uint64_t
>> +test_read_fpmr_sysreg_asm_64 ()
>> +{
>> +  register uint64_t fpmr asm ("fpmr");
>> +  asm volatile ("" : "=Umv"(fpmr) :);
>> +  return fpmr;
>> +}
>> +
>> +/*
>> +**test_read_fpmr_sysreg_asm_32:
>> +**	mrs	x0, fpmr
>> +**	ret
>> +*/
>> +uint32_t
>> +test_read_fpmr_sysreg_asm_32 ()
>> +{
>> +  register uint32_t fpmr asm ("fpmr");
>> +  asm volatile ("" : "=Umv"(fpmr) :);
>> +  return fpmr;
>> +}
>> +
>> +/*
>> +**test_read_fpmr_sysreg_asm_16:
>> +**	mrs	x0, fpmr
>> +**	ret
>> +*/
>> +uint16_t
>> +test_read_fpmr_sysreg_asm_16 ()
>> +{
>> +  register uint16_t fpmr asm ("fpmr");
>> +  asm volatile ("" : "=Umv"(fpmr) :);
>> +  return fpmr;
>> +}
>> +
>> +/*
>> +**test_read_fpmr_sysreg_asm_8:
>> +**	mrs	x0, fpmr
>> +**	ret
>> +*/
>> +uint8_t
>> +test_read_fpmr_sysreg_asm_8 ()
>> +{
>> +  register uint8_t fpmr asm ("fpmr");
>> +  asm volatile ("" : "=Umv"(fpmr) :);
>> +  return fpmr;
>> +}
>> +
>>   #pragma GCC pop_options

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] aarch64: Add march flags for +fp8 arch extensions
  2024-07-29  7:30   ` Kyrylo Tkachov
@ 2024-07-30 13:41     ` Claudio Bantaloukas
  0 siblings, 0 replies; 9+ messages in thread
From: Claudio Bantaloukas @ 2024-07-30 13:41 UTC (permalink / raw)
  To: Kyrylo Tkachov; +Cc: gcc-patches



On 29/07/2024 08:30, Kyrylo Tkachov wrote:
> Hi Claudio,
> 
>> On 26 Jul 2024, at 18:32, Claudio Bantaloukas <claudio.bantaloukas@arm.com> wrote:
>>
>> External email: Use caution opening links or attachments
>>
>>
>> This introduces the relevant flags to enable access to the fpmr register and fp8 intrinsics, which will be added subsequently.
>>
>> gcc/ChangeLog:
>>
>>         * config/aarch64/aarch64-option-extensions.def (fp8): New.
>>         * config/aarch64/aarch64.h (TARGET_FP8): Likewise.
>>         * doc/invoke.texi (AArch64 Options): Document new -march flags
>>         and extensions.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.target/aarch64/acle/fp8.c: New test.
> 
> Thanks, this looks ok to me now.
> One question about the command-line flag.
> FP8 defines instructions for Advanced SIMD, SVE and SME.
> Is the “+fp8” option in this patch intended to combine with the +sve and +sme options to indicate the presence of these ISA-specific subsets? That is, you’re not planning to introduce something like +sve-fp8, +sme-fp8?
> Kyrill

Hi Kyrill, thanks!
The plan is to have more specific feature flags like +fp8fma 
+ssve-fp8fma and +sme-lutv. +fp8 will only be used for conversion and 
scaling operations and my undestanding is that it will not combine as 
you propose.

See also the relevant binutils features in 
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gas/config/tc-aarch64.c;h=e94a0cff406aaaf1800979a27991ccbb7e92e917;hb=HEAD#l10731

Cheers,
Claudio

> 
>> ---
>> .../aarch64/aarch64-option-extensions.def     |  2 ++
>> gcc/config/aarch64/aarch64.h                  |  3 +++
>> gcc/doc/invoke.texi                           |  2 ++
>> gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 20 +++++++++++++++++++
>> 4 files changed, 27 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>>
>> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
>> index 42ec0eec31e..6998627f377 100644
>> --- a/gcc/config/aarch64/aarch64-option-extensions.def
>> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
>> @@ -232,6 +232,8 @@ AARCH64_OPT_EXTENSION("the", THE, (), (), (), "the")
>>
>> AARCH64_OPT_EXTENSION("gcs", GCS, (), (), (), "gcs")
>>
>> +AARCH64_OPT_EXTENSION("fp8", FP8, (SIMD), (), (), "fp8")
>> +
>> #undef AARCH64_OPT_FMV_EXTENSION
>> #undef AARCH64_OPT_EXTENSION
>> #undef AARCH64_FMV_FEATURE
>> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>> index b7e330438d9..2e75c6b81e2 100644
>> --- a/gcc/config/aarch64/aarch64.h
>> +++ b/gcc/config/aarch64/aarch64.h
>> @@ -463,6 +463,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE ATTRIBUTE_UNUSED
>> && (aarch64_tune_params.extra_tuning_flags \
>>      & AARCH64_EXTRA_TUNE_AVOID_PRED_RMW))
>>
>> +/* fp8 instructions are enabled through +fp8.  */
>> +#define TARGET_FP8 AARCH64_HAVE_ISA (FP8)
>> +
>> /* Standard register usage.  */
>>
>> /* 31 64-bit general purpose registers R0-R30:
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 9fb0925ed29..7cbcd8ad1b4 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -21848,6 +21848,8 @@ Enable support for Armv9.4-a Guarded Control Stack extension.
>> Enable support for Armv8.9-a/9.4-a translation hardening extension.
>> @item rcpc3
>> Enable the RCpc3 (Release Consistency) extension.
>> +@item fp8
>> +Enable the fp8 (8-bit floating point) extension.
>>
>> @end table
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/fp8.c b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> new file mode 100644
>> index 00000000000..459442be155
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/acle/fp8.c
>> @@ -0,0 +1,20 @@
>> +/* Test the fp8 ACLE intrinsics family.  */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O1 -march=armv8-a" } */
>> +
>> +#include <arm_acle.h>
>> +
>> +#ifdef __ARM_FEATURE_FP8
>> +#error "__ARM_FEATURE_FP8 feature macro defined."
>> +#endif
>> +
>> +#pragma GCC push_options
>> +#pragma GCC target("arch=armv9.4-a+fp8")
>> +
>> +/* We do not define __ARM_FEATURE_FP8 until all
>> +   relevant features have been added. */
>> +#ifdef __ARM_FEATURE_FP8
>> +#error "__ARM_FEATURE_FP8 feature macro defined."
>> +#endif
>> +
>> +#pragma GCC pop_options
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-07-30 13:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-26 16:32 [PATCH v3 0/3] aarch64: Add initial support for +fp8 arch extensions Claudio Bantaloukas
2024-07-26 16:32 ` [PATCH v3 1/3] aarch64: Add march flags " Claudio Bantaloukas
2024-07-29  7:30   ` Kyrylo Tkachov
2024-07-30 13:41     ` Claudio Bantaloukas
2024-07-26 16:32 ` [PATCH v3 2/3] aarch64: Add support for moving fpm system register Claudio Bantaloukas
2024-07-29 12:13   ` Richard Sandiford
2024-07-30 13:27     ` Claudio Bantaloukas
2024-07-26 16:32 ` [PATCH v3 3/3] aarch64: Add fpm register helper functions Claudio Bantaloukas
2024-07-29  7:34   ` Kyrylo Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).