public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 0/5] target_version and aarch64 function multiversioning
@ 2023-11-17  2:49 Andrew Carlotti
  2023-11-17  2:51 ` [PATCH v2[1/5] aarch64: Add cpu feature detection to libgcc Andrew Carlotti
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Andrew Carlotti @ 2023-11-17  2:49 UTC (permalink / raw)
  To: gcc-patches
  Cc: ebotcazou, poulhies, ibuclaw, jason, nathan, rguenther,
	richard.sandiford, richard.earnshaw

This series adds support for function multiversioning on aarch64.

Patch 1/5 is a repost of my copy of Pavel's aarch64 cpu feature detection code
to libgcc. This is slightly refactored in a later patch, but I've preserved
this patch as-is to make the attribution clearer.

Patches 2/5 and 3/5 are minor cleanups in the c-family and Ada attribute
exclusion handling, to support further tweaks to attribute exclusion handling
for c-family, Ada and D in patch 4.

Patch 4/5 adds support for the target_version attribute to the middle end and
C++ frontend, but should otherwise have no functional changes.

Patch 5/5 uses this support to implement function multiversioning in aarch64.

I plan to improve the existing documentation and tests, including covering the
new functionality, in subsequent commits (perhaps after fixing some of the
current ABI issues).

I'm happy with the state of patches 2-4. Patches 1 and 5 have various
outstanding issues, most of which require fixes to the ACLE as well.  It might
be best to push these patches in something like their current form, and then
push incremental fixes once we've agreed on the relevant specification changes.

The series passes regression testing on both x86 and aarch64 for C and C++. I
haven't got an Ada or D compiler on my build machine, so I haven't tested these
languages; however, I tested using the same code and making equivalent changes
in the C++ frontend, to verify their (minimal) impact upon attribute processing
functionality.

Thanks,
Andrew


* Re: [PATCH v2[1/5] aarch64: Add cpu feature detection to libgcc
  2023-11-17  2:49 [PATCH v2 0/5] target_version and aarch64 function multiversioning Andrew Carlotti
@ 2023-11-17  2:51 ` Andrew Carlotti
  2023-11-20 15:46   ` Richard Sandiford
  2023-11-17  2:53 ` [PATCH v2 2/5] c-family: Simplify attribute exclusion handling Andrew Carlotti
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Andrew Carlotti @ 2023-11-17  2:51 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford, richard.earnshaw

This is added to enable function multiversioning, but can also be used
directly.  The interface is chosen to match that used in LLVM's
compiler-rt, to facilitate cross-compiler compatibility.

The content of the patch is derived almost entirely from Pavel's prior
contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
changes to align more closely with GCC coding style, and to exclude any code
from other LLVM contributors, and am adding this to GCC with Pavel's approval.
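
As a rough illustration of the data layout this patch relies on (one bit per
feature in a 64-bit mask, plus field extraction from ID register values), here
is a minimal sketch.  The demo_* names and the three feature indices are
hypothetical and chosen only for this example; the real list is the CPUFeatures
enum in cpuinfo.c, and the real code uses macros rather than functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature indices, for illustration only.  Bit 63 mirrors the
   "initialization complete" flag that follows FEAT_EXT = 62 in the patch.  */
enum demo_feature
{
  DEMO_FEAT_FP,
  DEMO_FEAT_SIMD,
  DEMO_FEAT_SVE,
  DEMO_FEAT_INIT = 63
};

/* Return MASK with the bit for feature F set.  */
static inline uint64_t
demo_set_feature (uint64_t mask, enum demo_feature f)
{
  return mask | (UINT64_C (1) << f);
}

/* Return 1 if the bit for feature F is set in MASK, else 0.  */
static inline int
demo_has_feature (uint64_t mask, enum demo_feature f)
{
  return (int) ((mask >> f) & 1);
}

/* Extract the WIDTH-bit field that starts at bit START of VAL, as is done
   for the 4-bit fields of the AArch64 ID registers.  */
static inline uint64_t
demo_extract_bits (uint64_t val, unsigned start, unsigned width)
{
  assert (width > 0 && width < 64 && start + width <= 64);
  return (val >> start) & ((UINT64_C (1) << width) - 1);
}
```

A caller would build the mask once with demo_set_feature and later test it with
demo_has_feature; a 4-bit ID register field at bit 32 would be read as
demo_extract_bits (value, 32, 4).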

libgcc/ChangeLog:

	* config/aarch64/t-aarch64: Include cpuinfo.c.
	* config/aarch64/cpuinfo.c: New file.
	(__init_cpu_features_constructor): New.
	(__init_cpu_features_resolver): New.
	(__init_cpu_features): New.

Co-authored-by: Pavel Iliin <Pavel.Iliin@arm.com>


diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
new file mode 100644
index 0000000000000000000000000000000000000000..0888ca4ed058430f524b99cb0e204bd996fa0e55
--- /dev/null
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -0,0 +1,502 @@
+/* CPU feature detection for AArch64 architecture.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+  
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if defined(__has_include)
+#if __has_include(<sys/auxv.h>)
+#include <sys/auxv.h>
+
+#if __has_include(<sys/ifunc.h>)
+#include <sys/ifunc.h>
+#else
+typedef struct __ifunc_arg_t {
+  unsigned long _size;
+  unsigned long _hwcap;
+  unsigned long _hwcap2;
+} __ifunc_arg_t;
+#endif
+
+#if __has_include(<asm/hwcap.h>)
+#include <asm/hwcap.h>
+
+/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
+enum CPUFeatures {
+  FEAT_RNG,
+  FEAT_FLAGM,
+  FEAT_FLAGM2,
+  FEAT_FP16FML,
+  FEAT_DOTPROD,
+  FEAT_SM4,
+  FEAT_RDM,
+  FEAT_LSE,
+  FEAT_FP,
+  FEAT_SIMD,
+  FEAT_CRC,
+  FEAT_SHA1,
+  FEAT_SHA2,
+  FEAT_SHA3,
+  FEAT_AES,
+  FEAT_PMULL,
+  FEAT_FP16,
+  FEAT_DIT,
+  FEAT_DPB,
+  FEAT_DPB2,
+  FEAT_JSCVT,
+  FEAT_FCMA,
+  FEAT_RCPC,
+  FEAT_RCPC2,
+  FEAT_FRINTTS,
+  FEAT_DGH,
+  FEAT_I8MM,
+  FEAT_BF16,
+  FEAT_EBF16,
+  FEAT_RPRES,
+  FEAT_SVE,
+  FEAT_SVE_BF16,
+  FEAT_SVE_EBF16,
+  FEAT_SVE_I8MM,
+  FEAT_SVE_F32MM,
+  FEAT_SVE_F64MM,
+  FEAT_SVE2,
+  FEAT_SVE_AES,
+  FEAT_SVE_PMULL128,
+  FEAT_SVE_BITPERM,
+  FEAT_SVE_SHA3,
+  FEAT_SVE_SM4,
+  FEAT_SME,
+  FEAT_MEMTAG,
+  FEAT_MEMTAG2,
+  FEAT_MEMTAG3,
+  FEAT_SB,
+  FEAT_PREDRES,
+  FEAT_SSBS,
+  FEAT_SSBS2,
+  FEAT_BTI,
+  FEAT_LS64,
+  FEAT_LS64_V,
+  FEAT_LS64_ACCDATA,
+  FEAT_WFXT,
+  FEAT_SME_F64,
+  FEAT_SME_I64,
+  FEAT_SME2,
+  FEAT_RCPC3,
+  FEAT_MAX,
+  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
+		    in __aarch64_cpu_features.  */
+  FEAT_INIT      /* Used as flag of features initialization completion.  */
+};
+
+/* Architecture features used in Function Multi Versioning.  */
+struct {
+  unsigned long long features;
+  /* As the feature set grows, new fields could be added.  */
+} __aarch64_cpu_features __attribute__((visibility("hidden"), nocommon));
+
+#ifndef _IFUNC_ARG_HWCAP
+#define _IFUNC_ARG_HWCAP (1ULL << 62)
+#endif
+#ifndef AT_HWCAP
+#define AT_HWCAP 16
+#endif
+#ifndef HWCAP_CPUID
+#define HWCAP_CPUID (1 << 11)
+#endif
+#ifndef HWCAP_FP
+#define HWCAP_FP (1 << 0)
+#endif
+#ifndef HWCAP_ASIMD
+#define HWCAP_ASIMD (1 << 1)
+#endif
+#ifndef HWCAP_AES
+#define HWCAP_AES (1 << 3)
+#endif
+#ifndef HWCAP_PMULL
+#define HWCAP_PMULL (1 << 4)
+#endif
+#ifndef HWCAP_SHA1
+#define HWCAP_SHA1 (1 << 5)
+#endif
+#ifndef HWCAP_SHA2
+#define HWCAP_SHA2 (1 << 6)
+#endif
+#ifndef HWCAP_ATOMICS
+#define HWCAP_ATOMICS (1 << 8)
+#endif
+#ifndef HWCAP_FPHP
+#define HWCAP_FPHP (1 << 9)
+#endif
+#ifndef HWCAP_ASIMDHP
+#define HWCAP_ASIMDHP (1 << 10)
+#endif
+#ifndef HWCAP_ASIMDRDM
+#define HWCAP_ASIMDRDM (1 << 12)
+#endif
+#ifndef HWCAP_JSCVT
+#define HWCAP_JSCVT (1 << 13)
+#endif
+#ifndef HWCAP_FCMA
+#define HWCAP_FCMA (1 << 14)
+#endif
+#ifndef HWCAP_LRCPC
+#define HWCAP_LRCPC (1 << 15)
+#endif
+#ifndef HWCAP_DCPOP
+#define HWCAP_DCPOP (1 << 16)
+#endif
+#ifndef HWCAP_SHA3
+#define HWCAP_SHA3 (1 << 17)
+#endif
+#ifndef HWCAP_SM3
+#define HWCAP_SM3 (1 << 18)
+#endif
+#ifndef HWCAP_SM4
+#define HWCAP_SM4 (1 << 19)
+#endif
+#ifndef HWCAP_ASIMDDP
+#define HWCAP_ASIMDDP (1 << 20)
+#endif
+#ifndef HWCAP_SHA512
+#define HWCAP_SHA512 (1 << 21)
+#endif
+#ifndef HWCAP_SVE
+#define HWCAP_SVE (1 << 22)
+#endif
+#ifndef HWCAP_ASIMDFHM
+#define HWCAP_ASIMDFHM (1 << 23)
+#endif
+#ifndef HWCAP_DIT
+#define HWCAP_DIT (1 << 24)
+#endif
+#ifndef HWCAP_ILRCPC
+#define HWCAP_ILRCPC (1 << 26)
+#endif
+#ifndef HWCAP_FLAGM
+#define HWCAP_FLAGM (1 << 27)
+#endif
+#ifndef HWCAP_SSBS
+#define HWCAP_SSBS (1 << 28)
+#endif
+#ifndef HWCAP_SB
+#define HWCAP_SB (1 << 29)
+#endif
+
+#ifndef HWCAP2_DCPODP
+#define HWCAP2_DCPODP (1 << 0)
+#endif
+#ifndef HWCAP2_SVE2
+#define HWCAP2_SVE2 (1 << 1)
+#endif
+#ifndef HWCAP2_SVEAES
+#define HWCAP2_SVEAES (1 << 2)
+#endif
+#ifndef HWCAP2_SVEPMULL
+#define HWCAP2_SVEPMULL (1 << 3)
+#endif
+#ifndef HWCAP2_SVEBITPERM
+#define HWCAP2_SVEBITPERM (1 << 4)
+#endif
+#ifndef HWCAP2_SVESHA3
+#define HWCAP2_SVESHA3 (1 << 5)
+#endif
+#ifndef HWCAP2_SVESM4
+#define HWCAP2_SVESM4 (1 << 6)
+#endif
+#ifndef HWCAP2_FLAGM2
+#define HWCAP2_FLAGM2 (1 << 7)
+#endif
+#ifndef HWCAP2_FRINT
+#define HWCAP2_FRINT (1 << 8)
+#endif
+#ifndef HWCAP2_SVEI8MM
+#define HWCAP2_SVEI8MM (1 << 9)
+#endif
+#ifndef HWCAP2_SVEF32MM
+#define HWCAP2_SVEF32MM (1 << 10)
+#endif
+#ifndef HWCAP2_SVEF64MM
+#define HWCAP2_SVEF64MM (1 << 11)
+#endif
+#ifndef HWCAP2_SVEBF16
+#define HWCAP2_SVEBF16 (1 << 12)
+#endif
+#ifndef HWCAP2_I8MM
+#define HWCAP2_I8MM (1 << 13)
+#endif
+#ifndef HWCAP2_BF16
+#define HWCAP2_BF16 (1 << 14)
+#endif
+#ifndef HWCAP2_DGH
+#define HWCAP2_DGH (1 << 15)
+#endif
+#ifndef HWCAP2_RNG
+#define HWCAP2_RNG (1 << 16)
+#endif
+#ifndef HWCAP2_BTI
+#define HWCAP2_BTI (1 << 17)
+#endif
+#ifndef HWCAP2_MTE
+#define HWCAP2_MTE (1 << 18)
+#endif
+#ifndef HWCAP2_RPRES
+#define HWCAP2_RPRES (1 << 21)
+#endif
+#ifndef HWCAP2_MTE3
+#define HWCAP2_MTE3 (1 << 22)
+#endif
+#ifndef HWCAP2_SME
+#define HWCAP2_SME (1 << 23)
+#endif
+#ifndef HWCAP2_SME_I16I64
+#define HWCAP2_SME_I16I64 (1 << 24)
+#endif
+#ifndef HWCAP2_SME_F64F64
+#define HWCAP2_SME_F64F64 (1 << 25)
+#endif
+#ifndef HWCAP2_WFXT
+#define HWCAP2_WFXT (1UL << 31)
+#endif
+#ifndef HWCAP2_EBF16
+#define HWCAP2_EBF16 (1UL << 32)
+#endif
+#ifndef HWCAP2_SVE_EBF16
+#define HWCAP2_SVE_EBF16 (1UL << 33)
+#endif
+
+static void
+__init_cpu_features_constructor(unsigned long hwcap,
+				const __ifunc_arg_t *arg) {
+#define setCPUFeature(F) __aarch64_cpu_features.features |= 1ULL << F
+#define getCPUFeature(id, ftr) __asm__("mrs %0, " #id : "=r"(ftr))
+#define extractBits(val, start, number) \
+  (val & ((1ULL << number) - 1ULL) << start) >> start
+  unsigned long hwcap2 = 0;
+  if (hwcap & _IFUNC_ARG_HWCAP)
+    hwcap2 = arg->_hwcap2;
+  if (hwcap & HWCAP_CRC32)
+    setCPUFeature(FEAT_CRC);
+  if (hwcap & HWCAP_PMULL)
+    setCPUFeature(FEAT_PMULL);
+  if (hwcap & HWCAP_FLAGM)
+    setCPUFeature(FEAT_FLAGM);
+  if (hwcap2 & HWCAP2_FLAGM2) {
+    setCPUFeature(FEAT_FLAGM);
+    setCPUFeature(FEAT_FLAGM2);
+  }
+  if (hwcap & HWCAP_SM3 && hwcap & HWCAP_SM4)
+    setCPUFeature(FEAT_SM4);
+  if (hwcap & HWCAP_ASIMDDP)
+    setCPUFeature(FEAT_DOTPROD);
+  if (hwcap & HWCAP_ASIMDFHM)
+    setCPUFeature(FEAT_FP16FML);
+  if (hwcap & HWCAP_FPHP) {
+    setCPUFeature(FEAT_FP16);
+    setCPUFeature(FEAT_FP);
+  }
+  if (hwcap & HWCAP_DIT)
+    setCPUFeature(FEAT_DIT);
+  if (hwcap & HWCAP_ASIMDRDM)
+    setCPUFeature(FEAT_RDM);
+  if (hwcap & HWCAP_ILRCPC)
+    setCPUFeature(FEAT_RCPC2);
+  if (hwcap & HWCAP_AES)
+    setCPUFeature(FEAT_AES);
+  if (hwcap & HWCAP_SHA1)
+    setCPUFeature(FEAT_SHA1);
+  if (hwcap & HWCAP_SHA2)
+    setCPUFeature(FEAT_SHA2);
+  if (hwcap & HWCAP_JSCVT)
+    setCPUFeature(FEAT_JSCVT);
+  if (hwcap & HWCAP_FCMA)
+    setCPUFeature(FEAT_FCMA);
+  if (hwcap & HWCAP_SB)
+    setCPUFeature(FEAT_SB);
+  if (hwcap & HWCAP_SSBS)
+    setCPUFeature(FEAT_SSBS2);
+  if (hwcap2 & HWCAP2_MTE) {
+    setCPUFeature(FEAT_MEMTAG);
+    setCPUFeature(FEAT_MEMTAG2);
+  }
+  if (hwcap2 & HWCAP2_MTE3) {
+    setCPUFeature(FEAT_MEMTAG);
+    setCPUFeature(FEAT_MEMTAG2);
+    setCPUFeature(FEAT_MEMTAG3);
+  }
+  if (hwcap2 & HWCAP2_SVEAES)
+    setCPUFeature(FEAT_SVE_AES);
+  if (hwcap2 & HWCAP2_SVEPMULL) {
+    setCPUFeature(FEAT_SVE_AES);
+    setCPUFeature(FEAT_SVE_PMULL128);
+  }
+  if (hwcap2 & HWCAP2_SVEBITPERM)
+    setCPUFeature(FEAT_SVE_BITPERM);
+  if (hwcap2 & HWCAP2_SVESHA3)
+    setCPUFeature(FEAT_SVE_SHA3);
+  if (hwcap2 & HWCAP2_SVESM4)
+    setCPUFeature(FEAT_SVE_SM4);
+  if (hwcap2 & HWCAP2_DCPODP)
+    setCPUFeature(FEAT_DPB2);
+  if (hwcap & HWCAP_ATOMICS)
+    setCPUFeature(FEAT_LSE);
+  if (hwcap2 & HWCAP2_RNG)
+    setCPUFeature(FEAT_RNG);
+  if (hwcap2 & HWCAP2_I8MM)
+    setCPUFeature(FEAT_I8MM);
+  if (hwcap2 & HWCAP2_EBF16)
+    setCPUFeature(FEAT_EBF16);
+  if (hwcap2 & HWCAP2_SVE_EBF16)
+    setCPUFeature(FEAT_SVE_EBF16);
+  if (hwcap2 & HWCAP2_DGH)
+    setCPUFeature(FEAT_DGH);
+  if (hwcap2 & HWCAP2_FRINT)
+    setCPUFeature(FEAT_FRINTTS);
+  if (hwcap2 & HWCAP2_SVEI8MM)
+    setCPUFeature(FEAT_SVE_I8MM);
+  if (hwcap2 & HWCAP2_SVEF32MM)
+    setCPUFeature(FEAT_SVE_F32MM);
+  if (hwcap2 & HWCAP2_SVEF64MM)
+    setCPUFeature(FEAT_SVE_F64MM);
+  if (hwcap2 & HWCAP2_BTI)
+    setCPUFeature(FEAT_BTI);
+  if (hwcap2 & HWCAP2_RPRES)
+    setCPUFeature(FEAT_RPRES);
+  if (hwcap2 & HWCAP2_WFXT)
+    setCPUFeature(FEAT_WFXT);
+  if (hwcap2 & HWCAP2_SME)
+    setCPUFeature(FEAT_SME);
+  if (hwcap2 & HWCAP2_SME_I16I64)
+    setCPUFeature(FEAT_SME_I64);
+  if (hwcap2 & HWCAP2_SME_F64F64)
+    setCPUFeature(FEAT_SME_F64);
+  if (hwcap & HWCAP_CPUID) {
+    unsigned long ftr;
+    getCPUFeature(ID_AA64PFR1_EL1, ftr);
+    /* ID_AA64PFR1_EL1.MTE >= 0b0001  */
+    if (extractBits(ftr, 8, 4) >= 0x1)
+      setCPUFeature(FEAT_MEMTAG);
+    /* ID_AA64PFR1_EL1.SSBS == 0b0001  */
+    if (extractBits(ftr, 4, 4) == 0x1)
+      setCPUFeature(FEAT_SSBS);
+    /* ID_AA64PFR1_EL1.SME == 0b0010  */
+    if (extractBits(ftr, 24, 4) == 0x2)
+      setCPUFeature(FEAT_SME2);
+    getCPUFeature(ID_AA64PFR0_EL1, ftr);
+    /* ID_AA64PFR0_EL1.FP != 0b1111  */
+    if (extractBits(ftr, 16, 4) != 0xF) {
+      setCPUFeature(FEAT_FP);
+      /* ID_AA64PFR0_EL1.AdvSIMD has the same value as ID_AA64PFR0_EL1.FP  */
+      setCPUFeature(FEAT_SIMD);
+    }
+    /* ID_AA64PFR0_EL1.SVE != 0b0000  */
+    if (extractBits(ftr, 32, 4) != 0x0) {
+      /* Read ID_AA64ZFR0_EL1 by encoding; the name needs SVE enabled.  */
+      getCPUFeature(S3_0_C0_C4_4, ftr);
+      /* ID_AA64ZFR0_EL1.SVEver == 0b0000  */
+      if (extractBits(ftr, 0, 4) == 0x0)
+	setCPUFeature(FEAT_SVE);
+      /* ID_AA64ZFR0_EL1.SVEver == 0b0001  */
+      if (extractBits(ftr, 0, 4) == 0x1)
+	setCPUFeature(FEAT_SVE2);
+      /* ID_AA64ZFR0_EL1.BF16 != 0b0000  */
+      if (extractBits(ftr, 20, 4) != 0x0)
+	setCPUFeature(FEAT_SVE_BF16);
+    }
+    getCPUFeature(ID_AA64ISAR0_EL1, ftr);
+    /* ID_AA64ISAR0_EL1.SHA3 != 0b0000  */
+    if (extractBits(ftr, 32, 4) != 0x0)
+      setCPUFeature(FEAT_SHA3);
+    getCPUFeature(ID_AA64ISAR1_EL1, ftr);
+    /* ID_AA64ISAR1_EL1.DPB >= 0b0001  */
+    if (extractBits(ftr, 0, 4) >= 0x1)
+      setCPUFeature(FEAT_DPB);
+    /* ID_AA64ISAR1_EL1.LRCPC != 0b0000  */
+    if (extractBits(ftr, 20, 4) != 0x0)
+      setCPUFeature(FEAT_RCPC);
+    /* ID_AA64ISAR1_EL1.LRCPC == 0b0011  */
+    if (extractBits(ftr, 20, 4) == 0x3)
+      setCPUFeature(FEAT_RCPC3);
+    /* ID_AA64ISAR1_EL1.SPECRES == 0b0001  */
+    if (extractBits(ftr, 40, 4) == 0x2)
+      setCPUFeature(FEAT_PREDRES);
+    /* ID_AA64ISAR1_EL1.BF16 != 0b0000  */
+    if (extractBits(ftr, 44, 4) != 0x0)
+      setCPUFeature(FEAT_BF16);
+    /* ID_AA64ISAR1_EL1.LS64 >= 0b0001  */
+    if (extractBits(ftr, 60, 4) >= 0x1)
+      setCPUFeature(FEAT_LS64);
+    /* ID_AA64ISAR1_EL1.LS64 >= 0b0010  */
+    if (extractBits(ftr, 60, 4) >= 0x2)
+      setCPUFeature(FEAT_LS64_V);
+    /* ID_AA64ISAR1_EL1.LS64 >= 0b0011  */
+    if (extractBits(ftr, 60, 4) >= 0x3)
+      setCPUFeature(FEAT_LS64_ACCDATA);
+  } else {
+    /* Set some features in case of no CPUID support.  */
+    if (hwcap & (HWCAP_FP | HWCAP_FPHP)) {
+      setCPUFeature(FEAT_FP);
+      /* FP and AdvSIMD fields have the same value.  */
+      setCPUFeature(FEAT_SIMD);
+    }
+    if (hwcap & HWCAP_DCPOP || hwcap2 & HWCAP2_DCPODP)
+      setCPUFeature(FEAT_DPB);
+    if (hwcap & HWCAP_LRCPC || hwcap & HWCAP_ILRCPC)
+      setCPUFeature(FEAT_RCPC);
+    if (hwcap2 & HWCAP2_BF16 || hwcap2 & HWCAP2_EBF16)
+      setCPUFeature(FEAT_BF16);
+    if (hwcap2 & HWCAP2_SVEBF16)
+      setCPUFeature(FEAT_SVE_BF16);
+    if (hwcap2 & HWCAP2_SVE2 && hwcap & HWCAP_SVE)
+      setCPUFeature(FEAT_SVE2);
+    if (hwcap & HWCAP_SHA3)
+      setCPUFeature(FEAT_SHA3);
+  }
+  setCPUFeature(FEAT_INIT);
+}
+
+void
+__init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
+  if (__aarch64_cpu_features.features)
+    return;
+  __init_cpu_features_constructor(hwcap, arg);
+}
+
+void __attribute__ ((constructor))
+__init_cpu_features(void) {
+  unsigned long hwcap;
+  unsigned long hwcap2;
+  /* CPU features already initialized.  */
+  if (__aarch64_cpu_features.features)
+    return;
+  hwcap = getauxval(AT_HWCAP);
+  hwcap2 = getauxval(AT_HWCAP2);
+  __ifunc_arg_t arg;
+  arg._size = sizeof(__ifunc_arg_t);
+  arg._hwcap = hwcap;
+  arg._hwcap2 = hwcap2;
+  __init_cpu_features_constructor(hwcap | _IFUNC_ARG_HWCAP, &arg);
+#undef extractBits
+#undef getCPUFeature
+#undef setCPUFeature
+}
+#endif /* __has_include(<asm/hwcap.h>)  */
+#endif /* __has_include(<sys/auxv.h>)  */
+#endif /* defined(__has_include)  */
diff --git a/libgcc/config/aarch64/t-aarch64 b/libgcc/config/aarch64/t-aarch64
index a40b6241c86ecc4007b5cfd28aa989ee894aa410..8bc1a4ca0c2eb75c17e62a25aa45a875bfd472f8 100644
--- a/libgcc/config/aarch64/t-aarch64
+++ b/libgcc/config/aarch64/t-aarch64
@@ -19,3 +19,4 @@
 # <http://www.gnu.org/licenses/>.
 
 LIB2ADD += $(srcdir)/config/aarch64/sync-cache.c
+LIB2ADD += $(srcdir)/config/aarch64/cpuinfo.c


* [PATCH v2 2/5] c-family: Simplify attribute exclusion handling
  2023-11-17  2:49 [PATCH v2 0/5] target_version and aarch64 function multiversioning Andrew Carlotti
  2023-11-17  2:51 ` [PATCH v2[1/5] aarch64: Add cpu feature detection to libgcc Andrew Carlotti
@ 2023-11-17  2:53 ` Andrew Carlotti
  2023-11-19 21:45   ` Jeff Law
  2023-11-17  2:54 ` [PATCH v2 3/5] ada: Improve " Andrew Carlotti
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Andrew Carlotti @ 2023-11-17  2:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: jason, nathan, rguenther, richard.sandiford, richard.earnshaw

This patch changes the handling of mutual exclusions involving the
target and target_clones attributes to use the generic attribute
exclusion lists.  Additionally, the duplicate handling for the
always_inline and noinline attribute exclusion is removed.

The only change in functionality is the choice of warning message
displayed - due to either a change in the wording for mutual exclusion
warnings, or a change in the order in which different checks occur.
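
To make the idea of a generic exclusion list concrete, here is a simplified
sketch of a table-driven conflict check.  It is not GCC's implementation: the
demo_* types and demo_find_conflict are hypothetical, and the real
attribute_spec::exclusions entries also carry three scope flags that are
omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for one exclusion list entry.  */
struct demo_exclusion
{
  const char *name;
};

/* Simplified stand-in for one attribute table entry.  */
struct demo_attr_spec
{
  const char *name;
  const struct demo_exclusion *exclude; /* NULL-name terminated, or NULL.  */
};

static const struct demo_exclusion demo_target_excl[] =
{
  { "target_clones" },
  { NULL },
};

static const struct demo_exclusion demo_target_clones_excl[] =
{
  { "always_inline" },
  { "target" },
  { NULL },
};

static const struct demo_attr_spec demo_table[] =
{
  { "target", demo_target_excl },
  { "target_clones", demo_target_clones_excl },
  { "optimize", NULL },
  { NULL, NULL },
};

/* Return the name of the first attribute among the N entries of PRESENT
   that NEW_ATTR excludes, or NULL if there is no conflict.  */
static const char *
demo_find_conflict (const char *new_attr, const char *const *present,
		    size_t n)
{
  assert (new_attr != NULL);
  for (const struct demo_attr_spec *s = demo_table; s->name; s++)
    {
      if (strcmp (s->name, new_attr) != 0 || !s->exclude)
	continue;
      for (const struct demo_exclusion *e = s->exclude; e->name; e++)
	for (size_t i = 0; i < n; i++)
	  if (strcmp (e->name, present[i]) == 0)
	    return e->name;
    }
  return NULL;
}
```

With the conflict check living in one place, each attribute handler no longer
needs its own lookup_attribute test, and every conflict can be reported with
the same wording.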

Ok for master?

gcc/c-family/ChangeLog:

	* c-attribs.cc (attr_always_inline_exclusions): New.
	(attr_target_exclusions): Ditto.
	(attr_target_clones_exclusions): Ditto.
	(c_common_attribute_table): Add new exclusion lists.
	(handle_noinline_attribute): Remove custom exclusion handling.
	(handle_always_inline_attribute): Ditto.
	(handle_target_attribute): Ditto.
	(handle_target_clones_attribute): Ditto.

gcc/testsuite/ChangeLog:

	* g++.target/i386/mvc2.C: Update expected warning messages.
	* g++.target/i386/mvc3.C: Ditto.


diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 461732f60f7c4031cc6692000fbdddb9f726a035..b3b41ef123a0f171f57acb1b7f7fdde716428c00 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -214,6 +214,13 @@ static const struct attribute_spec::exclusions attr_inline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_always_inline_exclusions[] =
+{
+  ATTR_EXCL ("noinline", true, true, true),
+  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
 static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
 {
   ATTR_EXCL ("always_inline", true, true, true),
@@ -221,6 +228,19 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_target_exclusions[] =
+{
+  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
+static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
+{
+  ATTR_EXCL ("always_inline", true, true, true),
+  ATTR_EXCL ("target", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
 extern const struct attribute_spec::exclusions attr_noreturn_exclusions[] =
 {
   ATTR_EXCL ("alloc_align", true, true, true),
@@ -332,7 +352,7 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_leaf_attribute, NULL },
   { "always_inline",          0, 0, true,  false, false, false,
 			      handle_always_inline_attribute,
-	                      attr_inline_exclusions },
+			      attr_always_inline_exclusions },
   { "gnu_inline",             0, 0, true,  false, false, false,
 			      handle_gnu_inline_attribute,
 	                      attr_inline_exclusions },
@@ -483,9 +503,11 @@ const struct attribute_spec c_common_attribute_table[] =
   { "error",		      1, 1, true,  false, false, false,
 			      handle_error_attribute, NULL },
   { "target",                 1, -1, true, false, false, false,
-			      handle_target_attribute, NULL },
+			      handle_target_attribute,
+			      attr_target_exclusions },
   { "target_clones",          1, -1, true, false, false, false,
-			      handle_target_clones_attribute, NULL },
+			      handle_target_clones_attribute,
+			      attr_target_clones_exclusions },
   { "optimize",               1, -1, true, false, false, false,
 			      handle_optimize_attribute, NULL },
   /* For internal use only.  The leading '*' both prevents its usage in
@@ -1397,16 +1419,7 @@ handle_noinline_attribute (tree *node, tree name,
 			   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
-    {
-      if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-	{
-	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-		   "with attribute %qs", name, "always_inline");
-	  *no_add_attrs = true;
-	}
-      else
-	DECL_UNINLINABLE (*node) = 1;
-    }
+    DECL_UNINLINABLE (*node) = 1;
   else
     {
       warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -1487,22 +1500,9 @@ handle_always_inline_attribute (tree *node, tree name,
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
     {
-      if (lookup_attribute ("noinline", DECL_ATTRIBUTES (*node)))
-	{
-	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-		   "with %qs attribute", name, "noinline");
-	  *no_add_attrs = true;
-	}
-      else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
-	{
-	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-		   "with %qs attribute", name, "target_clones");
-	  *no_add_attrs = true;
-	}
-      else
-	/* Set the attribute and mark it for disregarding inline
-	   limits.  */
-	DECL_DISREGARD_INLINE_LIMITS (*node) = 1;
+      /* Set the attribute and mark it for disregarding inline
+	 limits.  */
+      DECL_DISREGARD_INLINE_LIMITS (*node) = 1;
     }
   else
     {
@@ -5650,12 +5650,6 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
-    {
-      warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-		   "with %qs attribute", name, "target_clones");
-      *no_add_attrs = true;
-    }
   else if (! targetm.target_option.valid_attribute_p (*node, name, args,
 						      flags))
     *no_add_attrs = true;
@@ -5696,19 +5690,7 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 	    }
 	}
 
-      if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-	{
-	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-		   "with %qs attribute", name, "always_inline");
-	  *no_add_attrs = true;
-	}
-      else if (lookup_attribute ("target", DECL_ATTRIBUTES (*node)))
-	{
-	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-		   "with %qs attribute", name, "target");
-	  *no_add_attrs = true;
-	}
-      else if (get_target_clone_attr_len (args) == -1)
+      if (get_target_clone_attr_len (args) == -1)
 	{
 	  warning (OPT_Wattributes,
 		   "single %<target_clones%> attribute is ignored");
diff --git a/gcc/testsuite/g++.target/i386/mvc2.C b/gcc/testsuite/g++.target/i386/mvc2.C
index 7c1fb6518d04f404123086660c32853dcd9f65ba..04ee0573d607f6de2e7ea382e891f62884c18ea7 100644
--- a/gcc/testsuite/g++.target/i386/mvc2.C
+++ b/gcc/testsuite/g++.target/i386/mvc2.C
@@ -3,7 +3,7 @@
 
 __attribute__((target_clones("avx","arch=slm","default")))
 __attribute__((target("avx")))
-int foo (); /* { dg-warning "'target' attribute ignored due to conflict with 'target_clones' attribute" } */
+int foo (); /* { dg-warning "ignoring attribute 'target' because it conflicts with attribute 'target_clones'" } */
 
 __attribute__((target_clones("avx","arch=slm","default"),always_inline))
-int bar (); /* { dg-warning "'always_inline' attribute ignored due to conflict with 'target_clones' attribute" } */
+int bar (); /* { dg-warning "ignoring attribute 'always_inline' because it conflicts with attribute 'target_clones'" } */
diff --git a/gcc/testsuite/g++.target/i386/mvc3.C b/gcc/testsuite/g++.target/i386/mvc3.C
index 5d634fd7ea68b905a0e93ca1c25f6907bc9d2858..5ad1f88fd2d7da74fafcafcff24b77cb2d12a5a0 100644
--- a/gcc/testsuite/g++.target/i386/mvc3.C
+++ b/gcc/testsuite/g++.target/i386/mvc3.C
@@ -3,7 +3,7 @@
 
 __attribute__((target("avx")))
 __attribute__((target_clones("avx","arch=slm","default")))
-int foo (); /* { dg-warning "'target_clones' attribute ignored due to conflict with 'target' attribute" } */
+int foo (); /* { dg-warning "ignoring attribute 'target_clones' because it conflicts with attribute 'target'" } */
 
 __attribute__((always_inline,target_clones("avx","arch=slm","default")))
-int bar (); /* { dg-warning "'target_clones' attribute ignored due to conflict with 'always_inline' attribute" } */
+int bar (); /* { dg-warning "ignoring attribute 'target_clones' because it conflicts with attribute 'always_inline'" } */


* [PATCH v2 3/5] ada: Improve attribute exclusion handling
  2023-11-17  2:49 [PATCH v2 0/5] target_version and aarch64 function multiversioning Andrew Carlotti
  2023-11-17  2:51 ` [PATCH v2[1/5] aarch64: Add cpu feature detection to libgcc Andrew Carlotti
  2023-11-17  2:53 ` [PATCH v2 2/5] c-family: Simplify attribute exclusion handling Andrew Carlotti
@ 2023-11-17  2:54 ` Andrew Carlotti
  2023-11-17 10:45   ` Marc Poulhiès
  2023-11-17  2:55 ` [PATCH v2 4/5] Add support for target_version attribute Andrew Carlotti
  2023-11-17  2:56 ` [PATCH v2 5/5] aarch64: Add function multiversioning support Andrew Carlotti
  4 siblings, 1 reply; 16+ messages in thread
From: Andrew Carlotti @ 2023-11-17  2:54 UTC (permalink / raw)
  To: gcc-patches
  Cc: ebotcazou, poulhies, rguenther, richard.sandiford, richard.earnshaw

Change the handling of some attribute mutual exclusions to use the
generic attribute exclusion lists, and fix some asymmetric exclusions by
adding the exclusions for always_inline after noinline or target_clones.

Aside from the new always_inline exclusions, the only change in
functionality is the choice of warning message displayed.  All warnings
about attribute mutual exclusions now use the same message.
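
The asymmetry mentioned above can be pictured with a small sketch that checks
whether every "A excludes B" rule has a matching "B excludes A" rule.  The
demo_* names are hypothetical and the flat rule list is a simplification for
illustration; it is not how the Ada frontend stores its exclusions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One directed rule: ATTR is rejected if EXCLUDES is already present.  */
struct demo_rule
{
  const char *attr;
  const char *excludes;
};

/* Return 1 if RULES (N entries) contains the pair (ATTR, EXCLUDES).  */
static int
demo_has_rule (const struct demo_rule *rules, size_t n,
	       const char *attr, const char *excludes)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (rules[i].attr, attr) == 0
	&& strcmp (rules[i].excludes, excludes) == 0)
      return 1;
  return 0;
}

/* Return the index of the first rule whose reverse is missing, or -1 if
   the rule set is symmetric.  */
static int
demo_first_asymmetric (const struct demo_rule *rules, size_t n)
{
  assert (rules != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (!demo_has_rule (rules, n, rules[i].excludes, rules[i].attr))
      return (int) i;
  return -1;
}
```

Fed the checks that the old handlers performed (noinline against
always_inline, target against target_clones, target_clones against
always_inline and target), this reports the first rule as unmatched; adding
the two always_inline rules makes the set symmetric.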

---

I haven't managed to test the Ada frontend, but this patch (and the following
one) contains only minimal changes to functionality, which I have tested by
copying the code to the C++ frontend and verifying the behaviour of equivalent
changes there.  Is this ok to push without further testing?  If not, then could
someone test this series for me?

gcc/ada/ChangeLog:

	* gcc-interface/utils.cc (attr_noinline_exclusions): New.
	(attr_always_inline_exclusions): Ditto.
	(attr_target_exclusions): Ditto.
	(attr_target_clones_exclusions): Ditto.
	(gnat_internal_attribute_table): Add new exclusion lists.
	(handle_noinline_attribute): Remove custom exclusion handling.
	(handle_target_attribute): Ditto.
	(handle_target_clones_attribute): Ditto.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 8b2c7f99ef3060603658e438b71a3bfa3ef7f2ac..e33a63948cebdeafc3abcdd539a35141969ad978 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -130,6 +130,32 @@ static const struct attribute_spec::exclusions attr_stack_protect_exclusions[] =
   { NULL, false, false, false },
 };
 
+static const struct attribute_spec::exclusions attr_always_inline_exclusions[] =
+{
+  { "noinline", true, true, true },
+  { "target_clones", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
+{
+  { "always_inline", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_target_exclusions[] =
+{
+  { "target_clones", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
+{
+  { "always_inline", true, true, true },
+  { "target", true, true, true },
+  { NULL, false, false, false },
+};
+
 /* Fake handler for attributes we don't properly support, typically because
    they'd require dragging a lot of the common-c front-end circuitry.  */
 static tree fake_attribute_handler (tree *, tree, tree, int, bool *);
@@ -165,7 +191,7 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "strub",	    0, 1, false, true, false, true,
     handle_strub_attribute, NULL },
   { "noinline",     0, 0,  true,  false, false, false,
-    handle_noinline_attribute, NULL },
+    handle_noinline_attribute, attr_noinline_exclusions },
   { "noclone",      0, 0,  true,  false, false, false,
     handle_noclone_attribute, NULL },
   { "no_icf",       0, 0,  true,  false, false, false,
@@ -175,7 +201,7 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "leaf",         0, 0,  true,  false, false, false,
     handle_leaf_attribute, NULL },
   { "always_inline",0, 0,  true,  false, false, false,
-    handle_always_inline_attribute, NULL },
+    handle_always_inline_attribute, attr_always_inline_exclusions },
   { "malloc",       0, 0,  true,  false, false, false,
     handle_malloc_attribute, NULL },
   { "type generic", 0, 0,  false, true,  true,  false,
@@ -192,9 +218,9 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "simd",         0, 1,  true,  false, false, false,
     handle_simd_attribute, NULL },
   { "target",       1, -1, true,  false, false, false,
-    handle_target_attribute, NULL },
+    handle_target_attribute, attr_target_exclusions },
   { "target_clones",1, -1, true,  false, false, false,
-    handle_target_clones_attribute, NULL },
+    handle_target_clones_attribute, attr_target_clones_exclusions },
 
   { "vector_size",  1, 1,  false, true,  false, false,
     handle_vector_size_attribute, NULL },
@@ -6742,16 +6768,7 @@ handle_noinline_attribute (tree *node, tree name,
 			   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
-    {
-      if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-	{
-	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-		   "with attribute %qs", name, "always_inline");
-	  *no_add_attrs = true;
-	}
-      else
-	DECL_UNINLINABLE (*node) = 1;
-    }
+    DECL_UNINLINABLE (*node) = 1;
   else
     {
       warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -7050,12 +7067,6 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))
-    {
-      warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-		   "with %qs attribute", name, "target_clones");
-      *no_add_attrs = true;
-    }
   else if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
     *no_add_attrs = true;
 
@@ -7083,23 +7094,8 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 {
   /* Ensure we have a function type.  */
   if (TREE_CODE (*node) == FUNCTION_DECL)
-    {
-      if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-	{
-	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-		   "with %qs attribute", name, "always_inline");
-	  *no_add_attrs = true;
-	}
-      else if (lookup_attribute ("target", DECL_ATTRIBUTES (*node)))
-	{
-	  warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-		   "with %qs attribute", name, "target");
-	  *no_add_attrs = true;
-	}
-      else
-	/* Do not inline functions with multiple clone targets.  */
-	DECL_UNINLINABLE (*node) = 1;
-    }
+    /* Do not inline functions with multiple clone targets.  */
+    DECL_UNINLINABLE (*node) = 1;
   else
     {
       warning (OPT_Wattributes, "%qE attribute ignored", name);


* [PATCH v2 4/5] Add support for target_version attribute
  2023-11-17  2:49 [PATCH v2 0/5] target_version and aarch64 function multiversioning Andrew Carlotti
                   ` (2 preceding siblings ...)
  2023-11-17  2:54 ` [PATCH v2 3/5] ada: Improve " Andrew Carlotti
@ 2023-11-17  2:55 ` Andrew Carlotti
  2023-11-29 17:53   ` Richard Sandiford
  2023-11-17  2:56 ` [PATCH v2 5/5] aarch64: Add function multiversioning support Andrew Carlotti
  4 siblings, 1 reply; 16+ messages in thread
From: Andrew Carlotti @ 2023-11-17  2:55 UTC (permalink / raw)
  To: gcc-patches
  Cc: ebotcazou, poulhies, ibuclaw, jason, nathan, rguenther,
	richard.sandiford, richard.earnshaw

This patch adds support for the "target_version" attribute to the middle
end and the C++ frontend, which will be used to implement function
multiversioning in the aarch64 backend.

On targets that don't use the "target" attribute for multiversioning,
there is no conflict between the "target" and "target_clones"
attributes.  This patch therefore makes the mutual exclusion in
C-family, D and Ada conditional upon the value of the
expanded_clones_attribute target hook.
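
For illustration only (this is not code from the patch): a minimal,
self-contained sketch of how an exclusion entry can be made conditional on
the hook value.  The struct and function names below are hypothetical
stand-ins for attribute_spec::exclusions and the CLONES_USES_TARGET macro,
and the hook value is passed as a parameter rather than read from targetm.

```cpp
#include <cassert>
#include <cstring>

// Simplified, hypothetical stand-in for attribute_spec::exclusions.
struct exclusion {
  const char *name;  // conflicting attribute name, or nullptr to end the table
  bool applies;      // whether the exclusion is active
};

// Mirrors the CLONES_USES_TARGET test, but takes the hook value as a
// parameter instead of reading targetm.target_option.
static bool clones_uses_target(const char *expanded_clones_attribute) {
  return std::strcmp(expanded_clones_attribute, "target") == 0;
}

// Returns true if "target" and "target_clones" are mutually exclusive
// for the given hook value.
static bool target_excludes_target_clones(const char *expanded_clones_attribute) {
  const exclusion target_exclusions[] = {
    { "target_clones", clones_uses_target(expanded_clones_attribute) },
    { nullptr, false },
  };
  for (const exclusion *e = target_exclusions; e->name != nullptr; ++e)
    if (std::strcmp(e->name, "target_clones") == 0)
      return e->applies;
  return false;
}

// Small usage check: the exclusion is active only when clones expand to
// "target" attributes.
static void self_check() {
  assert(target_excludes_target_clones("target"));
  assert(!target_excludes_target_clones("target_version"));
}
```

With the default hook value "target" the exclusion stays active, which
matches the existing behaviour; a target that sets the hook to another
attribute name, such as "target_version", would no longer see a conflict.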

The "target_version" attribute is only added to C++ in this patch,
because this is currently the only frontend which supports
multiversioning using the "target" attribute.  Support for the
"target_version" attribute will be extended to C at a later date.

Targets that currently use the "target" attribute for function
multiversioning (i.e. i386 and rs6000) are not affected by this patch.

Ok for master?

gcc/ChangeLog:

	* attribs.cc (decl_attributes): Pass attribute name to target.
	(is_function_default_version): Update comment to specify
	incompatibility with target_version attributes.
	* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
	Call valid_version_attribute_p for target_version attributes.
	* target.def (valid_version_attribute_p): New hook.
	(expanded_clones_attribute): New hook.
	* doc/tm.texi.in: Add new hooks.
	* doc/tm.texi: Regenerate.
	* multiple_target.cc (create_dispatcher_calls): Remove redundant
	is_function_default_version check.
	(expand_target_clones): Use target hook for attribute name.
	* targhooks.cc (default_target_option_valid_version_attribute_p):
	New.
	* targhooks.h (default_target_option_valid_version_attribute_p):
	New.
	* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
	target_version attributes.

gcc/c-family/ChangeLog:

	* c-attribs.cc (CLONES_USES_TARGET): New macro.
	(attr_target_exclusions): Use new macro.
	(attr_target_clones_exclusions): Ditto, and add target_version.
	(attr_target_version_exclusions): New.
	(c_common_attribute_table): Add target_version.
	(handle_target_version_attribute): New.

gcc/ada/ChangeLog:

	* gcc-interface/utils.cc (CLONES_USES_TARGET): New macro.
	(attr_target_exclusions): Use new macro.
	(attr_target_clones_exclusions): Ditto.

gcc/d/ChangeLog:

	* d-attribs.cc (CLONES_USES_TARGET): New macro.
	(attr_target_exclusions): Use new macro.
	(attr_target_clones_exclusions): Ditto.

gcc/cp/ChangeLog:

	* decl2.cc (check_classfn): Update comment to include
	target_version attributes.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index e33a63948cebdeafc3abcdd539a35141969ad978..8850943cb3326568b4679a73405f50487aa1b7c6 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -143,16 +143,21 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
   { NULL, false, false, false },
 };
 
+#define CLONES_USES_TARGET \
+  (strcmp (targetm.target_option.expanded_clones_attribute, \
+	   "target") == 0)
+
 static const struct attribute_spec::exclusions attr_target_exclusions[] =
 {
-  { "target_clones", true, true, true },
+  { "target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
+    CLONES_USES_TARGET },
   { NULL, false, false, false },
 };
 
 static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
 {
   { "always_inline", true, true, true },
-  { "target", true, true, true },
+  { "target", CLONES_USES_TARGET, CLONES_USES_TARGET, CLONES_USES_TARGET },
   { NULL, false, false, false },
 };
 
diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index f9fd258598914ce2112ecaaeaad6c63cd69a44e2..27533023ef5c481ba085c2f0c605dfb992987b3e 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -657,7 +657,8 @@ decl_attributes (tree *node, tree attributes, int flags,
      options to the attribute((target(...))) list.  */
   if (TREE_CODE (*node) == FUNCTION_DECL
       && current_target_pragma
-      && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
+      && targetm.target_option.valid_attribute_p (*node,
+						  get_identifier("target"),
 						  current_target_pragma, 0))
     {
       tree cur_attr = lookup_attribute ("target", attributes);
@@ -1241,8 +1242,9 @@ make_dispatcher_decl (const tree decl)
   return func_decl;  
 }
 
-/* Returns true if decl is multi-versioned and DECL is the default function,
-   that is it is not tagged with target specific optimization.  */
+/* Returns true if DECL is multi-versioned using the target attribute, and this
+   is the default version.  This function can only be used for targets that do
+   not support the "target_version" attribute.  */
 
 bool
 is_function_default_version (const tree decl)
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index b3b41ef123a0f171f57acb1b7f7fdde716428c00..8e33b7c3f4a9e7dcaa299eeff0eea92240f7ef0a 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -149,6 +149,7 @@ static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
 static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_assume_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_attribute (tree *, tree, tree, int, bool *);
+static tree handle_target_version_attribute (tree *, tree, tree, int, bool *);
 static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
 static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
 static tree ignore_attribute (tree *, tree, tree, int, bool *);
@@ -228,16 +229,29 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+#define CLONES_USES_TARGET \
+  (strcmp (targetm.target_option.expanded_clones_attribute, \
+	   "target") == 0)
+
 static const struct attribute_spec::exclusions attr_target_exclusions[] =
 {
-  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL ("target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
+	     CLONES_USES_TARGET),
   ATTR_EXCL (NULL, false, false, false),
 };
 
 static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
 {
   ATTR_EXCL ("always_inline", true, true, true),
-  ATTR_EXCL ("target", true, true, true),
+  ATTR_EXCL ("target", CLONES_USES_TARGET, CLONES_USES_TARGET,
+	     CLONES_USES_TARGET),
+  ATTR_EXCL ("target_version", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
+static const struct attribute_spec::exclusions attr_target_version_exclusions[] =
+{
+  ATTR_EXCL ("target_clones", true, true, true),
   ATTR_EXCL (NULL, false, false, false),
 };
 
@@ -505,6 +519,9 @@ const struct attribute_spec c_common_attribute_table[] =
   { "target",                 1, -1, true, false, false, false,
 			      handle_target_attribute,
 			      attr_target_exclusions },
+  { "target_version",         1, 1, true, false, false, false,
+			      handle_target_version_attribute,
+			      attr_target_version_exclusions },
   { "target_clones",          1, -1, true, false, false, false,
 			      handle_target_clones_attribute,
 			      attr_target_clones_exclusions },
@@ -5670,6 +5687,25 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
   return NULL_TREE;
 }
 
+/* Handle a "target_version" attribute.  */
+
+static tree
+handle_target_version_attribute (tree *node, tree name, tree args, int flags,
+				  bool *no_add_attrs)
+{
+  /* Ensure we have a function type.  */
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+    }
+  else if (!targetm.target_option.valid_version_attribute_p (*node, name, args,
+							     flags))
+    *no_add_attrs = true;
+
+  return NULL_TREE;
+}
+
 /* Handle a "target_clones" attribute.  */
 
 static tree
diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
index 29d28ef895a73a223695cbb86aafbc845bbe7688..8af6b23d8c0306920e0fdcb3559ef047a16689f4 100644
--- a/gcc/cgraphclones.cc
+++ b/gcc/cgraphclones.cc
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "tree-cfg.h"
 #include "tree-inline.h"
+#include "attribs.h"
 #include "dumpfile.h"
 #include "gimple-pretty-print.h"
 #include "alloc-pool.h"
@@ -1048,7 +1049,17 @@ cgraph_node::create_version_clone_with_body
       location_t saved_loc = input_location;
       tree v = TREE_VALUE (target_attributes);
       input_location = DECL_SOURCE_LOCATION (new_decl);
-      bool r = targetm.target_option.valid_attribute_p (new_decl, NULL, v, 1);
+      bool r;
+      tree name_id = get_attribute_name (target_attributes);
+      const char* name_str = IDENTIFIER_POINTER (name_id);
+      if (strcmp (name_str, "target") == 0)
+	r = targetm.target_option.valid_attribute_p (new_decl, name_id, v, 1);
+      else if (strcmp (name_str, "target_version") == 0)
+	r = targetm.target_option.valid_version_attribute_p (new_decl, name_id,
+							     v, 1);
+      else
+	gcc_assert(false);
+
       input_location = saved_loc;
       if (!r)
 	return NULL;
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 9e666e5eecee07ae7c742c3a2b27e85899945c4e..e607aa14d284d545d122e04b0eae1247fd301882 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -832,8 +832,8 @@ check_classfn (tree ctype, tree function, tree template_parms)
       tree c2 = get_constraints (fndecl);
 
       /* While finding a match, same types and params are not enough
-	 if the function is versioned.  Also check version ("target")
-	 attributes.  */
+	 if the function is versioned.  Also check for different target
+	 specific attributes.  */
       if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
 		       TREE_TYPE (TREE_TYPE (fndecl)))
 	  && compparms (p1, p2)
diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
index c0dc0e24ded871c136e54e5527e901d16cfa5ceb..7fe68565e70dd1124aac63601416dad68600a34e 100644
--- a/gcc/d/d-attribs.cc
+++ b/gcc/d/d-attribs.cc
@@ -126,16 +126,22 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+#define CLONES_USES_TARGET \
+  (strcmp (targetm.target_option.expanded_clones_attribute, \
+	   "target") == 0)
+
 static const struct attribute_spec::exclusions attr_target_exclusions[] =
 {
-  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL ("target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
+	     CLONES_USES_TARGET),
   ATTR_EXCL (NULL, false, false, false),
 };
 
 static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
 {
   ATTR_EXCL ("always_inline", true, true, true),
-  ATTR_EXCL ("target", true, true, true),
+  ATTR_EXCL ("target", CLONES_USES_TARGET, CLONES_USES_TARGET,
+	     CLONES_USES_TARGET),
   ATTR_EXCL (NULL, false, false, false),
 };
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index d83ca73b1aff90d3c181436afedc162b977a4158..6f6b133803f4574fcf0112b1385eec861112ddd5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10644,6 +10644,23 @@ the function declaration to hold a pointer to a target-specific
 @code{struct cl_target_option} structure.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P (tree @var{fndecl}, tree @var{name}, tree @var{args}, int @var{flags})
+This hook is called to parse @code{attribute(target_version("..."))},
+which allows setting target-specific options on individual function versions.
+These function-specific options may differ
+from the options specified on the command line.  The hook should return
+@code{true} if the options are valid.
+
+The hook should set the @code{DECL_FUNCTION_SPECIFIC_TARGET} field in
+the function declaration to hold a pointer to a target-specific
+@code{struct cl_target_option} structure.
+@end deftypefn
+
+@deftypevr {Target Hook} {const char *} TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE
+Contains the name of the attribute used for the version description string
+when expanding clones for a function with the target_clones attribute.
+@end deftypevr
+
 @deftypefn {Target Hook} void TARGET_OPTION_SAVE (struct cl_target_option *@var{ptr}, struct gcc_options *@var{opts}, struct gcc_options *@var{opts_set})
 This hook is called to save any additional target-specific information
 in the @code{struct cl_target_option} structure for function-specific
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 3d3ae12cc2ff62025b1138430de501a33961fd90..149c88f627be20a9a35ead2eaebdb704e51927fa 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7028,6 +7028,10 @@ on this implementation detail.
 
 @hook TARGET_OPTION_VALID_ATTRIBUTE_P
 
+@hook TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P
+
+@hook TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE
+
 @hook TARGET_OPTION_SAVE
 
 @hook TARGET_OPTION_RESTORE
diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
index a2ed048d7dd28ec470953fcd8a0dc86817e4b7dc..3db57c2b13d612a37240d9dcf58ad21b2286633c 100644
--- a/gcc/multiple_target.cc
+++ b/gcc/multiple_target.cc
@@ -66,10 +66,6 @@ create_dispatcher_calls (struct cgraph_node *node)
 {
   ipa_ref *ref;
 
-  if (!DECL_FUNCTION_VERSIONED (node->decl)
-      || !is_function_default_version (node->decl))
-    return;
-
   if (!targetm.has_ifunc_p ())
     {
       error_at (DECL_SOURCE_LOCATION (node->decl),
@@ -377,6 +373,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
       return false;
     }
 
+  const char *new_attr_name = targetm.target_option.expanded_clones_attribute;
   cgraph_function_version_info *decl1_v = NULL;
   cgraph_function_version_info *decl2_v = NULL;
   cgraph_function_version_info *before = NULL;
@@ -392,7 +389,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
       char *attr = attrs[i];
 
       /* Create new target clone.  */
-      tree attributes = make_attribute ("target", attr,
+      tree attributes = make_attribute (new_attr_name, attr,
 					DECL_ATTRIBUTES (node->decl));
 
       char *suffix = XNEWVEC (char, strlen (attr) + 1);
@@ -430,7 +427,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
   XDELETEVEC (attr_str);
 
   /* Setting new attribute to initial function.  */
-  tree attributes = make_attribute ("target", "default",
+  tree attributes = make_attribute (new_attr_name, "default",
 				    DECL_ATTRIBUTES (node->decl));
   DECL_ATTRIBUTES (node->decl) = attributes;
   node->local = false;
diff --git a/gcc/target.def b/gcc/target.def
index 0996da0f71a85f8217a41ceb08de8b21087e4ed9..1d2e0d8bf03a8b949ec636e6a78a111308d3dd71 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6533,6 +6533,31 @@ the function declaration to hold a pointer to a target-specific\n\
  bool, (tree fndecl, tree name, tree args, int flags),
  default_target_option_valid_attribute_p)
 
+/* Function to validate the attribute((target_version(...))) strings.  If
+   the option is validated, the hook should also fill in
+   DECL_FUNCTION_SPECIFIC_TARGET in the function decl node.  */
+DEFHOOK
+(valid_version_attribute_p,
+ "This hook is called to parse @code{attribute(target_version(\"...\"))},\n\
+which allows setting target-specific options on individual function versions.\n\
+These function-specific options may differ\n\
+from the options specified on the command line.  The hook should return\n\
+@code{true} if the options are valid.\n\
+\n\
+The hook should set the @code{DECL_FUNCTION_SPECIFIC_TARGET} field in\n\
+the function declaration to hold a pointer to a target-specific\n\
+@code{struct cl_target_option} structure.",
+ bool, (tree fndecl, tree name, tree args, int flags),
+ default_target_option_valid_version_attribute_p)
+
+/* Attribute to be used when expanding clones for functions with
+   target_clones attribute.  */
+DEFHOOKPOD
+(expanded_clones_attribute,
+ "Contains the name of the attribute used for the version description string\n\
+when expanding clones for a function with the target_clones attribute.",
+ const char *, "target")
+
 /* Function to save any extra target state in the target options structure.  */
 DEFHOOK
 (save,
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 189549cb1c742c37c17623141989b492a7c2b2f8..ff2957fd9fd8389e23992281b35e8e5467072f7d 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -192,6 +192,7 @@ extern bool default_hard_regno_scratch_ok (unsigned int);
 extern bool default_mode_dependent_address_p (const_rtx, addr_space_t);
 extern bool default_new_address_profitable_p (rtx, rtx_insn *, rtx);
 extern bool default_target_option_valid_attribute_p (tree, tree, tree, int);
+extern bool default_target_option_valid_version_attribute_p (tree, tree, tree, int);
 extern bool default_target_option_pragma_parse (tree, tree);
 extern bool default_target_can_inline_p (tree, tree);
 extern bool default_update_ipa_fn_target_info (unsigned int &, const gimple *);
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 4f5b240f8d65eeeaf73418c9f1e2c2684b257cfa..b693352d7eae555912477b6e431dd9c016105007 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1789,7 +1789,19 @@ default_target_option_valid_attribute_p (tree ARG_UNUSED (fndecl),
 					 int ARG_UNUSED (flags))
 {
   warning (OPT_Wattributes,
-	   "target attribute is not supported on this machine");
+	   "%<target%> attribute is not supported on this machine");
+
+  return false;
+}
+
+bool
+default_target_option_valid_version_attribute_p (tree ARG_UNUSED (fndecl),
+						 tree ARG_UNUSED (name),
+						 tree ARG_UNUSED (args),
+						 int ARG_UNUSED (flags))
+{
+  warning (OPT_Wattributes,
+	   "%<target_version%> attribute is not supported on this machine");
 
   return false;
 }
diff --git a/gcc/tree.h b/gcc/tree.h
index 086b55f0375435d53a1604b6659da4f19fce3d17..d7841af19b20b0dc0ae28b433d5150e9c4763eff 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3500,8 +3500,8 @@ extern vec<tree, va_gc> **decl_debug_args_insert (tree);
    (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
 
 /* In FUNCTION_DECL, this is set if this function has other versions generated
-   using "target" attributes.  The default version is the one which does not
-   have any "target" attribute set. */
+   to support different architecture feature sets, e.g. using "target" or
+   "target_version" attributes.  */
 #define DECL_FUNCTION_VERSIONED(NODE)\
    (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
 


* [PATCH v2 5/5] aarch64: Add function multiversioning support
  2023-11-17  2:49 [PATCH v2 0/5] target_version and aarch64 function multiversioning Andrew Carlotti
                   ` (3 preceding siblings ...)
  2023-11-17  2:55 ` [PATCH v2 4/5] Add support for target_version attribute Andrew Carlotti
@ 2023-11-17  2:56 ` Andrew Carlotti
  2023-11-24 16:22   ` Richard Sandiford
  4 siblings, 1 reply; 16+ messages in thread
From: Andrew Carlotti @ 2023-11-17  2:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, richard.sandiford, richard.earnshaw

This adds initial support for function multiversioning on aarch64 using
the target_version and target_clones attributes.  This loosely follows
the Beta specification in the ACLE [1], although with some differences
that still need to be resolved (possibly as follow-up patches).
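
For illustration only (this is not code from the patch): a rough sketch of
what a priority-ordered resolver does at run time.  The feature bits, the
version table and the function names are all hypothetical; the real
implementation builds the dispatcher in GIMPLE and uses the values from
enum CPUFeatures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature bits; the real values come from enum CPUFeatures.
enum : std::uint64_t {
  HYP_FEAT_DOTPROD = 1ULL << 0,
  HYP_FEAT_SVE     = 1ULL << 1,
  HYP_FEAT_SVE2    = 1ULL << 2,
};

using impl_fn = int (*)();

struct version {
  std::uint64_t required;  // features this version needs; 0 for "default"
  impl_fn fn;
};

static int impl_default() { return 0; }
static int impl_dotprod() { return 1; }
static int impl_sve2()    { return 2; }

// Versions are listed from highest to lowest priority, default last.
// The first version whose required features are all present wins.
static impl_fn resolve(const std::vector<version> &versions,
                       std::uint64_t cpu_features) {
  for (const version &v : versions)
    if ((v.required & cpu_features) == v.required)
      return v.fn;
  return nullptr;  // not reached when a default version is present
}

static const std::vector<version> foo_versions = {
  { HYP_FEAT_SVE | HYP_FEAT_SVE2, impl_sve2 },
  { HYP_FEAT_DOTPROD, impl_dotprod },
  { 0, impl_default },
};

// Small usage check covering the fallback and the highest-priority match.
static void self_check() {
  assert(resolve(foo_versions, 0)() == 0);
  assert(resolve(foo_versions, HYP_FEAT_DOTPROD)() == 1);
  assert(resolve(foo_versions,
                 HYP_FEAT_SVE | HYP_FEAT_SVE2 | HYP_FEAT_DOTPROD)() == 2);
}
```

A CPU with only part of a version's required feature set (for example SVE
without SVE2 in this sketch) falls through to the next candidate, ending at
the default version.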

Existing function multiversioning implementations are broken in various
ways when used across translation units.  This includes placing
resolvers in the wrong translation units, and using symbol mangling that
causes callers to unintentionally bypass the resolver in some circumstances.
Fixing these issues for aarch64 will require modifications to our ACLE
specification.  It will also require further adjustments to existing
middle end code, to facilitate different mangling and resolver
placement while preserving existing target behaviours.

The list of function multiversioning features specified in the ACLE is
also inconsistent with the list of features supported in target option
extensions.  I intend to resolve some or all of these inconsistencies at
a later stage.

The target_version attribute is currently only supported in C++, since
this is the only frontend with existing support for multiversioning
using the target attribute.  On the other hand, this patch happens to
enable multiversioning with the target_clones attribute in Ada and D, as
well as the entire C family, using their existing frontend support.

This patch also does not support the following aspects of the Beta
specification:

- The target_clones attribute should allow an implicit unlisted
  "default" version.
- There should be an option to disable function multiversioning at
  compile time.
- Unrecognised target names in a target_clones attribute should be
  ignored (with an optional warning).  The current patch raises an
  error instead.
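
For illustration only (this is not code from the patch): a minimal sketch
of strict parsing for a '+'-separated version string, where an unknown or
repeated feature name is reported as an error rather than ignored.  The
feature table, the result enum and the function name are hypothetical, and
the handling of "default" is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class parse_result { ok, unknown_feature, duplicate_feature };

// Hypothetical feature table; real names and bits are defined elsewhere.
struct hyp_feature { const char *name; std::uint64_t bit; };

static const hyp_feature hyp_features[] = {
  { "dotprod", 1ULL << 0 },
  { "sve",     1ULL << 1 },
  { "sve2",    1ULL << 2 },
};

// Parse a '+'-separated list such as "sve2+dotprod" into *mask.
// Assumption: "default" maps to an empty mask.
static parse_result parse_fmv_features(const std::string &str,
                                       std::uint64_t *mask) {
  *mask = 0;
  if (str == "default")
    return parse_result::ok;
  std::size_t pos = 0;
  while (pos <= str.size()) {
    std::size_t end = str.find('+', pos);
    if (end == std::string::npos)
      end = str.size();
    const std::string name = str.substr(pos, end - pos);
    std::uint64_t bit = 0;
    for (const hyp_feature &f : hyp_features)
      if (name == f.name)
        bit = f.bit;
    if (bit == 0)
      return parse_result::unknown_feature;  // strict: error, not ignored
    if (*mask & bit)
      return parse_result::duplicate_feature;
    *mask |= bit;
    pos = end + 1;
  }
  return parse_result::ok;
}

// Small usage check for the accepted and rejected cases.
static void self_check() {
  std::uint64_t mask = 0;
  assert(parse_fmv_features("sve2+dotprod", &mask) == parse_result::ok);
  assert(mask == ((1ULL << 2) | (1ULL << 0)));
  assert(parse_fmv_features("sve+foo", &mask) == parse_result::unknown_feature);
  assert(parse_fmv_features("sve+sve", &mask) == parse_result::duplicate_feature);
}
```

Following the Beta specification more closely would mean skipping the
unknown name (perhaps with a warning) instead of returning an error.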

[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning

---

I believe the support present in this patch correctly handles function
multiversioning within a single translation unit for all features in the ACLE
specification with option extension support.

Is it ok to push this patch in its current state? I'd then continue working on
incremental improvements to the supported feature extensions and the ABI issues
in follow-up patches, along with corresponding changes and improvements to
the ACLE specification.


gcc/ChangeLog:

	* config/aarch64/aarch64-feature-deps.h (fmv_deps_<FEAT_NAME>):
	Define aarch64_feature_flags mask for each FMV feature.
	* config/aarch64/aarch64-option-extensions.def: Use new macros
	to define FMV feature extensions.
	* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
	Check for target_version attribute after processing target
	attribute.
	(aarch64_fmv_feature_data): New.
	(aarch64_parse_fmv_features): New.
	(aarch64_process_target_version_attr): New.
	(aarch64_option_valid_version_attribute_p): New.
	(get_feature_mask_for_version): New.
	(compare_feature_masks): New.
	(aarch64_compare_version_priority): New.
	(build_ifunc_arg_type): New.
	(make_resolver_func): New.
	(add_condition_to_bb): New.
	(compare_feature_version_info): New.
	(dispatch_function_versions): New.
	(aarch64_generate_version_dispatcher_body): New.
	(aarch64_get_function_versions_dispatcher): New.
	(aarch64_common_function_versions): New.
	(aarch64_mangle_decl_assembler_name): New.
	(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
	(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
	(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
	(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
	(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
	* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
	  new value to report duplicate FMV feature.
	* common/config/aarch64/cpuinfo.h: New file.

libgcc/ChangeLog:

	* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
	  copy in gcc/common

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
	* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
	* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
	* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
	* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
	* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
	* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
	* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
	* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
	* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
	* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
	* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.


diff --git a/gcc/common/config/aarch64/cpuinfo.h b/gcc/common/config/aarch64/cpuinfo.h
new file mode 100644
index 0000000000000000000000000000000000000000..1690b6eee48e960d0ae675f8e8b05e6f182b56a3
--- /dev/null
+++ b/gcc/common/config/aarch64/cpuinfo.h
@@ -0,0 +1,94 @@
+/* CPU feature detection for AArch64 architecture.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This enum is used in libgcc feature detection, and in the function
+   multiversioning implementation in aarch64.cc.  The enum should use the same
+   values as the corresponding enum in LLVM's compiler-rt, to facilitate
+   compatibility between compilers.  */
+
+enum CPUFeatures {
+  FEAT_RNG,
+  FEAT_FLAGM,
+  FEAT_FLAGM2,
+  FEAT_FP16FML,
+  FEAT_DOTPROD,
+  FEAT_SM4,
+  FEAT_RDM,
+  FEAT_LSE,
+  FEAT_FP,
+  FEAT_SIMD,
+  FEAT_CRC,
+  FEAT_SHA1,
+  FEAT_SHA2,
+  FEAT_SHA3,
+  FEAT_AES,
+  FEAT_PMULL,
+  FEAT_FP16,
+  FEAT_DIT,
+  FEAT_DPB,
+  FEAT_DPB2,
+  FEAT_JSCVT,
+  FEAT_FCMA,
+  FEAT_RCPC,
+  FEAT_RCPC2,
+  FEAT_FRINTTS,
+  FEAT_DGH,
+  FEAT_I8MM,
+  FEAT_BF16,
+  FEAT_EBF16,
+  FEAT_RPRES,
+  FEAT_SVE,
+  FEAT_SVE_BF16,
+  FEAT_SVE_EBF16,
+  FEAT_SVE_I8MM,
+  FEAT_SVE_F32MM,
+  FEAT_SVE_F64MM,
+  FEAT_SVE2,
+  FEAT_SVE_AES,
+  FEAT_SVE_PMULL128,
+  FEAT_SVE_BITPERM,
+  FEAT_SVE_SHA3,
+  FEAT_SVE_SM4,
+  FEAT_SME,
+  FEAT_MEMTAG,
+  FEAT_MEMTAG2,
+  FEAT_MEMTAG3,
+  FEAT_SB,
+  FEAT_PREDRES,
+  FEAT_SSBS,
+  FEAT_SSBS2,
+  FEAT_BTI,
+  FEAT_LS64,
+  FEAT_LS64_V,
+  FEAT_LS64_ACCDATA,
+  FEAT_WFXT,
+  FEAT_SME_F64,
+  FEAT_SME_I64,
+  FEAT_SME2,
+  FEAT_RCPC3,
+  FEAT_MAX,
+  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
+		    in __aarch64_cpu_features.  */
+  FEAT_INIT      /* Used as flag of features initialization completion.  */
+};
diff --git a/gcc/config/aarch64/aarch64-feature-deps.h b/gcc/config/aarch64/aarch64-feature-deps.h
index 7b85a8860de57f6727644c03296cef192ad0990c..8f20582e1efdd4817138480bee8cdb27fa7f3dfe 100644
--- a/gcc/config/aarch64/aarch64-feature-deps.h
+++ b/gcc/config/aarch64/aarch64-feature-deps.h
@@ -115,6 +115,13 @@ get_flags_off (aarch64_feature_flags mask)
   constexpr auto cpu_##CORE_IDENT = ARCH_IDENT ().enable | get_enable FEATURES;
 #include "config/aarch64/aarch64-cores.def"
 
+/* Define fmv_deps_<NAME> variables for each FMV feature, giving the transitive
+   closure of all the features that the FMV feature enables.  */
+#define AARCH64_FMV_FEATURE(A, FEAT_NAME, OPT_FLAGS) \
+  constexpr auto fmv_deps_##FEAT_NAME = get_enable OPT_FLAGS;
+#include "config/aarch64/aarch64-option-extensions.def"
+
+
 }
 }
 
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 825f3bf775899e2e5cffb1867b82766d632c8708..07df403491494d6dfe19095872ab32b9d60e9690 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -17,17 +17,22 @@
    along with GCC; see the file COPYING3.  If not see
    <http://www.gnu.org/licenses/>.  */
 
-/* This is a list of ISA extentsions in AArch64.
+/* This is a list of ISA extensions in AArch64.
 
-   Before using #include to read this file, define a macro:
+   Before using #include to read this file, define one of the following
+   macros:
 
       AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON,
 			    EXPLICIT_OFF, FEATURE_STRING)
 
+      AARCH64_FMV_FEATURE(NAME, FEAT_NAME, OPT_FLAGS)
+
    - NAME is the name of the extension, represented as a string constant.
 
    - IDENT is the canonical internal name for this flag.
 
+   - FEAT_NAME is the unprefixed name used in the CPUFeatures enum.
+
    - REQUIRES is a list of features that must be enabled whenever this
      feature is enabled.  The relationship is implicitly transitive:
      if A appears in B's REQUIRES and B appears in C's REQUIRES then
@@ -58,45 +63,96 @@
      that are required.  Their order is not important.  An empty string means
      do not detect this feature during auto detection.
 
-   The list of features must follow topological order wrt REQUIRES
-   and EXPLICIT_ON.  For example, if A is in B's REQUIRES list, A must
-   come before B.  This is enforced by aarch64-feature-deps.h.
+   - OPT_FLAGS is a list of feature IDENTS that should be enabled (along with
+     their transitive dependencies) when the specified FMV feature is present.
+
+   Where a feature is present as both an extension and a function
+   multiversioning feature, and IDENT matches the FEAT_NAME suffix, then these
+   can be listed here simultaneously using the macro:
+
+      AARCH64_OPT_FMV_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON,
+				EXPLICIT_OFF, FEATURE_STRING)
+
+   The list of feature extensions must follow topological order wrt REQUIRES
+   and EXPLICIT_ON.  For example, if A is in B's REQUIRES list, A must come
+   before B.  This is enforced by aarch64-feature-deps.h.
+
+   The list of multiversioning features must be ordered by increasing priority,
+   as defined in https://github.com/ARM-software/acle/blob/main/main/acle.md
 
    NOTE: Any changes to the AARCH64_OPT_EXTENSION macro need to be mirrored in
    config.gcc.  */
 
+#ifndef AARCH64_OPT_EXTENSION
+#define AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON, \
+			      EXPLICIT_OFF, FEATURE_STRING)
+#endif
+
+#ifndef AARCH64_FMV_FEATURE
+#define AARCH64_FMV_FEATURE(NAME, FEAT_NAME, OPT_FLAGS)
+#endif
+
+#define AARCH64_OPT_FMV_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON,   \
+				  EXPLICIT_OFF, FEATURE_STRING)		\
+AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON, EXPLICIT_OFF,	\
+		      FEATURE_STRING)					\
+AARCH64_FMV_FEATURE(NAME, IDENT, (IDENT))
+
+
 AARCH64_OPT_EXTENSION("fp", FP, (), (), (), "fp")
 
 AARCH64_OPT_EXTENSION("simd", SIMD, (FP), (), (), "asimd")
 
-AARCH64_OPT_EXTENSION("crc", CRC, (), (), (), "crc32")
+AARCH64_OPT_FMV_EXTENSION("rng", RNG, (), (), (), "rng")
 
-AARCH64_OPT_EXTENSION("lse", LSE, (), (), (), "atomics")
+AARCH64_OPT_FMV_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
 
-/* +nofp16 disables an implicit F16FML, even though an implicit F16FML
-   does not imply F16.  See F16FML for more details.  */
-AARCH64_OPT_EXTENSION("fp16", F16, (FP), (), (F16FML), "fphp asimdhp")
+AARCH64_FMV_FEATURE("flagm2", FLAGM2, (FLAGM))
+
+AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
+
+AARCH64_OPT_FMV_EXTENSION("dotprod", DOTPROD, (SIMD), (), (), "asimddp")
 
-AARCH64_OPT_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
+AARCH64_OPT_FMV_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
 
 /* An explicit +rdma implies +simd, but +rdma+nosimd still enables scalar
    RDMA instructions.  */
 AARCH64_OPT_EXTENSION("rdma", RDMA, (), (SIMD), (), "asimdrdm")
 
-AARCH64_OPT_EXTENSION("dotprod", DOTPROD, (SIMD), (), (), "asimddp")
+AARCH64_FMV_FEATURE("rdm", RDM, (RDMA))
+
+AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics")
+
+AARCH64_FMV_FEATURE("fp", FP, (FP))
+
+AARCH64_FMV_FEATURE("simd", SIMD, (SIMD))
+
+AARCH64_OPT_FMV_EXTENSION("crc", CRC, (), (), (), "crc32")
 
-AARCH64_OPT_EXTENSION("aes", AES, (SIMD), (), (), "aes")
+AARCH64_FMV_FEATURE("sha1", SHA1, ())
 
-AARCH64_OPT_EXTENSION("sha2", SHA2, (SIMD), (), (), "sha1 sha2")
+AARCH64_OPT_FMV_EXTENSION("sha2", SHA2, (SIMD), (), (), "sha1 sha2")
+
+AARCH64_FMV_FEATURE("sha3", SHA3, (SHA3))
+
+AARCH64_OPT_FMV_EXTENSION("aes", AES, (SIMD), (), (), "aes")
+
+AARCH64_FMV_FEATURE("pmull", PMULL, ())
 
 /* +nocrypto disables AES, SHA2 and SM4, and anything that depends on them
    (such as SHA3 and the SVE2 crypto extensions).  */
 AARCH64_OPT_EXTENSION("crypto", CRYPTO, (AES, SHA2), (), (AES, SHA2, SM4),
 		      "aes pmull sha1 sha2")
 
+/* Listing sha3 after crypto means we pass "+aes+sha3" to the assembler
+   instead of "+sha3+crypto".  */
 AARCH64_OPT_EXTENSION("sha3", SHA3, (SHA2), (), (), "sha3 sha512")
 
-AARCH64_OPT_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
+/* +nofp16 disables an implicit F16FML, even though an implicit F16FML
+   does not imply F16.  See F16FML for more details.  */
+AARCH64_OPT_EXTENSION("fp16", F16, (FP), (), (F16FML), "fphp asimdhp")
+
+AARCH64_FMV_FEATURE("fp16", FP16, (F16))
 
 /* An explicit +fp16fml implies +fp16, but a dependence on it does not.
    Thus -march=armv8.4-a implies F16FML but not F16.  -march=armv8.4-a+fp16
@@ -104,51 +160,117 @@ AARCH64_OPT_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
    -march=armv8.4-a+nofp16+fp16 enables F16 but not F16FML.  */
 AARCH64_OPT_EXTENSION("fp16fml", F16FML, (), (F16), (), "asimdfhm")
 
-AARCH64_OPT_EXTENSION("sve", SVE, (SIMD, F16), (), (), "sve")
+AARCH64_FMV_FEATURE("dit", DIT, ())
 
-AARCH64_OPT_EXTENSION("profile", PROFILE, (), (), (), "")
+AARCH64_FMV_FEATURE("dpb", DPB, ())
 
-AARCH64_OPT_EXTENSION("rng", RNG, (), (), (), "rng")
+AARCH64_FMV_FEATURE("dpb2", DPB2, ())
 
-AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
+AARCH64_FMV_FEATURE("jscvt", JSCVT, ())
 
-AARCH64_OPT_EXTENSION("sb", SB, (), (), (), "sb")
+AARCH64_FMV_FEATURE("fcma", FCMA, (SIMD))
 
-AARCH64_OPT_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
+AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
 
-AARCH64_OPT_EXTENSION("predres", PREDRES, (), (), (), "")
+AARCH64_FMV_FEATURE("rcpc2", RCPC2, (RCPC))
 
-AARCH64_OPT_EXTENSION("sve2", SVE2, (SVE), (), (), "sve2")
+AARCH64_FMV_FEATURE("rcpc3", RCPC3, (RCPC))
 
-AARCH64_OPT_EXTENSION("sve2-sm4", SVE2_SM4, (SVE2, SM4), (), (), "svesm4")
+AARCH64_FMV_FEATURE("frintts", FRINTTS, ())
+
+AARCH64_FMV_FEATURE("dgh", DGH, ())
+
+AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
+
+/* An explicit +bf16 implies +simd, but +bf16+nosimd still enables scalar BF16
+   instructions.  */
+AARCH64_OPT_FMV_EXTENSION("bf16", BF16, (FP), (SIMD), (), "bf16")
+
+AARCH64_FMV_FEATURE("ebf16", EBF16, (BF16))
+
+AARCH64_FMV_FEATURE("rpres", RPRES, ())
+
+AARCH64_OPT_FMV_EXTENSION("sve", SVE, (SIMD, F16), (), (), "sve")
+
+AARCH64_FMV_FEATURE("sve-bf16", SVE_BF16, (SVE, BF16))
+
+AARCH64_FMV_FEATURE("sve-ebf16", SVE_EBF16, (SVE, BF16))
+
+AARCH64_FMV_FEATURE("sve-i8mm", SVE_I8MM, (SVE, I8MM))
+
+AARCH64_OPT_EXTENSION("f32mm", F32MM, (SVE), (), (), "f32mm")
+
+AARCH64_FMV_FEATURE("f32mm", SVE_F32MM, (F32MM))
+
+AARCH64_OPT_EXTENSION("f64mm", F64MM, (SVE), (), (), "f64mm")
+
+AARCH64_FMV_FEATURE("f64mm", SVE_F64MM, (F64MM))
+
+AARCH64_OPT_FMV_EXTENSION("sve2", SVE2, (SVE), (), (), "sve2")
 
 AARCH64_OPT_EXTENSION("sve2-aes", SVE2_AES, (SVE2, AES), (), (), "sveaes")
 
-AARCH64_OPT_EXTENSION("sve2-sha3", SVE2_SHA3, (SVE2, SHA3), (), (), "svesha3")
+AARCH64_FMV_FEATURE("sve2-aes", SVE_AES, (SVE2, AES))
+
+AARCH64_FMV_FEATURE("sve2-pmull128", SVE_PMULL128, (SVE2))
 
 AARCH64_OPT_EXTENSION("sve2-bitperm", SVE2_BITPERM, (SVE2), (), (),
 		      "svebitperm")
 
-AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
+AARCH64_FMV_FEATURE("sve2-bitperm", SVE_BITPERM, (SVE2_BITPERM))
 
-AARCH64_OPT_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
+AARCH64_OPT_EXTENSION("sve2-sha3", SVE2_SHA3, (SVE2, SHA3), (), (), "svesha3")
 
-AARCH64_OPT_EXTENSION("f32mm", F32MM, (SVE), (), (), "f32mm")
+AARCH64_FMV_FEATURE("sve2-sha3", SVE_SHA3, (SVE2_SHA3))
 
-AARCH64_OPT_EXTENSION("f64mm", F64MM, (SVE), (), (), "f64mm")
+AARCH64_OPT_EXTENSION("sve2-sm4", SVE2_SM4, (SVE2, SM4), (), (), "svesm4")
 
-/* An explicit +bf16 implies +simd, but +bf16+nosimd still enables scalar BF16
-   instructions.  */
-AARCH64_OPT_EXTENSION("bf16", BF16, (FP), (SIMD), (), "bf16")
+AARCH64_FMV_FEATURE("sve2-sm4", SVE_SM4, (SVE2_SM4))
+
+AARCH64_FMV_FEATURE("sme", SME, ())
 
-AARCH64_OPT_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
+AARCH64_OPT_FMV_EXTENSION("memtag", MEMTAG, (), (), (), "")
+
+AARCH64_FMV_FEATURE("memtag2", MEMTAG2, (MEMTAG))
+
+AARCH64_FMV_FEATURE("memtag3", MEMTAG3, (MEMTAG))
+
+AARCH64_OPT_FMV_EXTENSION("sb", SB, (), (), (), "sb")
+
+AARCH64_OPT_FMV_EXTENSION("predres", PREDRES, (), (), (), "")
+
+AARCH64_OPT_FMV_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
+
+AARCH64_FMV_FEATURE("ssbs2", SSBS2, (SSBS))
+
+AARCH64_FMV_FEATURE("bti", BTI, ())
+
+AARCH64_OPT_EXTENSION("profile", PROFILE, (), (), (), "")
+
+AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("pauth", PAUTH, (), (), (), "paca pacg")
 
 AARCH64_OPT_EXTENSION("ls64", LS64, (), (), (), "")
 
+AARCH64_FMV_FEATURE("ls64", LS64, ())
+
+AARCH64_FMV_FEATURE("ls64_v", LS64_V, ())
+
+AARCH64_FMV_FEATURE("ls64_accdata", LS64_ACCDATA, (LS64))
+
+AARCH64_FMV_FEATURE("wfxt", WFXT, ())
+
+AARCH64_FMV_FEATURE("sme-f64f64", SME_F64, ())
+
+AARCH64_FMV_FEATURE("sme-i64i64", SME_I64, ())
+
+AARCH64_FMV_FEATURE("sme2", SME2, ())
+
 AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
+#undef AARCH64_OPT_FMV_EXTENSION
 #undef AARCH64_OPT_EXTENSION
+#undef AARCH64_FMV_FEATURE
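
For readers less familiar with the pattern, the .def file in the diff above is
an X-macro list: a consumer defines one or more of the entry macros and then
includes the file, and entries for macros it leaves undefined expand to
nothing.  Below is a minimal, self-contained sketch of that idea.  All DEMO_*
and demo_* names are hypothetical, and the three-entry list only stands in for
the real file; this is an illustration under those assumptions, not the code
in this series.

```cpp
// Hypothetical sketch of the X-macro pattern.  Every DEMO_*/demo_* name is
// illustrative and not part of GCC.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for a .def file: one entry per feature, in increasing priority.
#define DEMO_FEATURES(F) \
  F ("rng", RNG)         \
  F ("dotprod", DOTPROD) \
  F ("sve", SVE)

// First expansion: an enum that assigns each feature a bit position.
#define DEMO_ENUM_ENTRY(NAME, IDENT) DEMO_FEAT_##IDENT,
enum demo_feature { DEMO_FEATURES (DEMO_ENUM_ENTRY) DEMO_FEAT_MAX };
#undef DEMO_ENUM_ENTRY

struct demo_datum
{
  const char *name;
  std::uint64_t mask;
};

// Second expansion: a name-to-mask table in the same (priority) order.
#define DEMO_TABLE_ENTRY(NAME, IDENT) {NAME, 1ULL << DEMO_FEAT_##IDENT},
static const demo_datum demo_data[] = { DEMO_FEATURES (DEMO_TABLE_ENTRY) };
#undef DEMO_TABLE_ENTRY

// Return the mask for NAME, or 0 if NAME is not in the table.
inline std::uint64_t
demo_lookup (const char *name)
{
  assert (name != nullptr);
  for (const demo_datum &d : demo_data)
    if (std::strcmp (d.name, name) == 0)
      return d.mask;
  return 0;
}

// Return the name of the highest-priority feature set in MASK, or nullptr.
// Scanning from the end works because the list is in increasing priority.
inline const char *
demo_highest_priority (std::uint64_t mask)
{
  for (std::size_t i = sizeof demo_data / sizeof demo_data[0]; i-- > 0;)
    if (mask & demo_data[i].mask)
      return demo_data[i].name;
  return nullptr;
}
```

In this sketch the same list drives both the enum and the table, so the two
cannot drift apart, and the order of the list doubles as the priority order.
A mask with the "rng" and "sve" bits set resolves to "sve" because "sve" is
listed last.
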
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 800a8b0e11005416fb4e4b1222717629b16f3745..8721c0a923c53af2c2413ed90ccb05fa698c1f85 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -84,6 +84,7 @@
 #include "aarch64-feature-deps.h"
 #include "config/arm/aarch-common.h"
 #include "config/arm/aarch-common-protos.h"
+#include "common/config/aarch64/cpuinfo.h"
 #include "ssa.h"
 
 /* This file should be included last.  */
@@ -19525,6 +19526,8 @@ aarch64_process_target_attr (tree args)
   return true;
 }
 
+static bool aarch64_process_target_version_attr (tree args);
+
 /* Implement TARGET_OPTION_VALID_ATTRIBUTE_P.  This is used to
    process attribute ((target ("..."))).  */
 
@@ -19580,6 +19583,19 @@ aarch64_option_valid_attribute_p (tree fndecl, tree, tree args, int)
 			      TREE_TARGET_OPTION (target_option_current_node));
 
   ret = aarch64_process_target_attr (args);
+  if (ret)
+    {
+      tree version_attr = lookup_attribute ("target_version",
+					    DECL_ATTRIBUTES (fndecl));
+      if (version_attr != NULL_TREE)
+	{
+	  /* Reapply any target_version attribute after target attribute.
+	     This should be equivalent to applying the target_version once
+	     after processing all target attributes.  */
+	  tree version_args = TREE_VALUE (version_attr);
+	  ret = aarch64_process_target_version_attr (version_args);
+	}
+    }
 
   /* Set up any additional state.  */
   if (ret)
@@ -19610,6 +19626,821 @@ aarch64_option_valid_attribute_p (tree fndecl, tree, tree args, int)
   return ret;
 }
 
+typedef unsigned long long aarch64_fmv_feature_mask;
+
+typedef struct
+{
+  const char *name;
+  aarch64_fmv_feature_mask feature_mask;
+  aarch64_feature_flags opt_flags;
+} aarch64_fmv_feature_datum;
+
+#define AARCH64_FMV_FEATURE(NAME, FEAT_NAME, C) \
+  {NAME, 1ULL << FEAT_##FEAT_NAME, ::feature_deps::fmv_deps_##FEAT_NAME},
+
+/* FMV features are listed in priority order, to make it easier to sort target
+   strings.  */
+static aarch64_fmv_feature_datum aarch64_fmv_feature_data[] = {
+#include "config/aarch64/aarch64-option-extensions.def"
+};
+
+
+/* Parse a non-default fmv feature string, as found in a target_version or
+   target_clones attribute.  */
+
+static enum aarch_parse_opt_result
+aarch64_parse_fmv_features (const char *str, aarch64_feature_flags *isa_flags,
+			    aarch64_fmv_feature_mask *feature_mask,
+			    std::string *invalid_extension)
+{
+  if (strcmp (str, "default") == 0)
+    return AARCH_PARSE_OK;
+
+  while (str != NULL && *str != 0)
+    {
+      const char *ext;
+      size_t len;
+
+      ext = strchr (str, '+');
+
+      if (ext != NULL)
+	len = ext - str;
+      else
+	len = strlen (str);
+
+      if (len == 0)
+	return AARCH_PARSE_MISSING_ARG;
+
+      static const int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
+      int i;
+      for (i = 0; i < num_features; i++)
+	{
+	  if (strlen (aarch64_fmv_feature_data[i].name) == len
+	      && strncmp (aarch64_fmv_feature_data[i].name, str, len) == 0)
+	    {
+	      if (isa_flags)
+		*isa_flags |= aarch64_fmv_feature_data[i].opt_flags;
+	      if (feature_mask)
+		{
+		  auto old_feature_mask = *feature_mask;
+		  *feature_mask |= aarch64_fmv_feature_data[i].feature_mask;
+		  if (*feature_mask == old_feature_mask)
+		    {
+		      /* Duplicate feature.  */
+		      if (invalid_extension)
+			*invalid_extension = std::string (str, len);
+		      return AARCH_PARSE_DUPLICATE_FEATURE;
+		    }
+		}
+	      break;
+	    }
+	}
+
+      if (i == num_features)
+	{
+	  /* Feature not found in list.  */
+	  if (invalid_extension)
+	    *invalid_extension = std::string (str, len);
+	  return AARCH_PARSE_INVALID_FEATURE;
+	}
+
+      str = ext;
+    }
+
+  return AARCH_PARSE_OK;
+}
+
+/* Parse the tree in ARGS that contains the target_version attribute
+   information and update the global target options space.  */
+
+static bool
+aarch64_process_target_version_attr (tree args)
+{
+  if (TREE_CODE (args) == TREE_LIST)
+    {
+      if (TREE_CHAIN (args))
+	{
+	  error ("attribute %<target_version%> has multiple values");
+	  return false;
+	}
+      args = TREE_VALUE (args);
+    }
+
+  if (!args || TREE_CODE (args) != STRING_CST)
+    {
+      error ("attribute %<target_version%> argument not a string");
+      return false;
+    }
+
+  const char *str = TREE_STRING_POINTER (args);
+
+  enum aarch_parse_opt_result parse_res;
+  auto isa_flags = aarch64_asm_isa_flags;
+
+
+  std::string invalid_extension;
+  parse_res = aarch64_parse_fmv_features (str, &isa_flags, NULL,
+					  &invalid_extension);
+
+  if (parse_res == AARCH_PARSE_OK)
+    {
+      aarch64_set_asm_isa_flags (isa_flags);
+      return true;
+    }
+
+  switch (parse_res)
+    {
+      case AARCH_PARSE_MISSING_ARG:
+	error ("missing value in %<target_version%> attribute");
+	break;
+
+      case AARCH_PARSE_INVALID_FEATURE:
+	error ("invalid feature modifier %qs of value %qs in "
+	       "%<target_version%> attribute", invalid_extension.c_str (),
+	       str);
+	break;
+
+      case AARCH_PARSE_DUPLICATE_FEATURE:
+	error ("duplicate feature modifier %qs of value %qs in "
+	       "%<target_version%> attribute", invalid_extension.c_str (),
+	       str);
+	break;
+
+      default:
+	gcc_unreachable ();
+    }
+
+  return false;
+}
+
+/* Implement TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P.  This is used to
+   process attribute ((target_version ("..."))).  */
+
+static bool
+aarch64_option_valid_version_attribute_p (tree fndecl, tree, tree args, int)
+{
+  struct cl_target_option cur_target;
+  bool ret;
+  tree new_target;
+  tree existing_target = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
+
+  /* Save the current target options to restore at the end.  */
+  cl_target_option_save (&cur_target, &global_options, &global_options_set);
+
+  /* If fndecl already has some target attributes applied to it, unpack
+     them so that we add this attribute on top of them, rather than
+     overwriting them.  */
+  if (existing_target)
+    {
+      struct cl_target_option *existing_options
+	= TREE_TARGET_OPTION (existing_target);
+
+      if (existing_options)
+	cl_target_option_restore (&global_options, &global_options_set,
+				  existing_options);
+    }
+  else
+    cl_target_option_restore (&global_options, &global_options_set,
+			      TREE_TARGET_OPTION (target_option_current_node));
+
+  ret = aarch64_process_target_version_attr (args);
+
+  /* Set up any additional state.  */
+  if (ret)
+    {
+      aarch64_override_options_internal (&global_options);
+      new_target = build_target_option_node (&global_options,
+					     &global_options_set);
+    }
+  else
+    new_target = NULL;
+
+  if (fndecl && ret)
+    {
+      DECL_FUNCTION_SPECIFIC_TARGET (fndecl) = new_target;
+    }
+
+  cl_target_option_restore (&global_options, &global_options_set, &cur_target);
+
+  return ret;
+}
+
+/* This parses the attribute arguments to target_version in DECL and returns
+   the feature mask required to select those targets.  No adjustments are made
+   to add or remove redundant feature requirements.  */
+
+static aarch64_fmv_feature_mask
+get_feature_mask_for_version (tree decl)
+{
+  tree version_attr = lookup_attribute ("target_version",
+					DECL_ATTRIBUTES (decl));
+  if (version_attr == NULL)
+    return 0;
+
+  const char *version_string = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE
+						    (version_attr)));
+  enum aarch_parse_opt_result parse_res;
+  aarch64_fmv_feature_mask feature_mask = 0ULL;
+
+  parse_res = aarch64_parse_fmv_features (version_string, NULL, &feature_mask,
+					  NULL);
+
+  /* We should have detected any errors before getting here.  */
+  gcc_assert (parse_res == AARCH_PARSE_OK);
+
+  return feature_mask;
+}
+
+/* Compare priorities of two feature masks.  Return:
+     1: mask1 is higher priority
+    -1: mask2 is higher priority
+     0: masks are equal.  */
+
+static int
+compare_feature_masks (aarch64_fmv_feature_mask mask1,
+		       aarch64_fmv_feature_mask mask2)
+{
+  int pop1 = popcount_hwi (mask1);
+  int pop2 = popcount_hwi (mask2);
+  if (pop1 > pop2)
+    return 1;
+  if (pop2 > pop1)
+    return -1;
+
+  auto diff_mask = mask1 ^ mask2;
+  if (diff_mask == 0ULL)
+    return 0;
+  for (int i = FEAT_MAX - 1; i > 0; i--)
+    {
+      auto bit_mask = aarch64_fmv_feature_data[i].feature_mask;
+      if (diff_mask & bit_mask)
+	return (mask1 & bit_mask) ? 1 : -1;
+    }
+  gcc_unreachable ();
+}
+
+int
+aarch64_compare_version_priority (tree decl1, tree decl2)
+{
+  auto mask1 = get_feature_mask_for_version (decl1);
+  auto mask2 = get_feature_mask_for_version (decl2);
+
+  return compare_feature_masks (mask1, mask2);
+}
+
+/* Build the struct __ifunc_arg_t type:
+
+   struct __ifunc_arg_t
+   {
+     unsigned long _size; // Size of the struct, so it can grow.
+     unsigned long _hwcap;
+     unsigned long _hwcap2;
+   }
+ */
+
+static tree
+build_ifunc_arg_type ()
+{
+  tree ifunc_arg_type = lang_hooks.types.make_type (RECORD_TYPE);
+  tree field1 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			    get_identifier ("_size"),
+			    long_unsigned_type_node);
+  tree field2 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			    get_identifier ("_hwcap"),
+			    long_unsigned_type_node);
+  tree field3 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			    get_identifier ("_hwcap2"),
+			    long_unsigned_type_node);
+
+  DECL_FIELD_CONTEXT (field1) = ifunc_arg_type;
+  DECL_FIELD_CONTEXT (field2) = ifunc_arg_type;
+  DECL_FIELD_CONTEXT (field3) = ifunc_arg_type;
+
+  TYPE_FIELDS (ifunc_arg_type) = field1;
+  DECL_CHAIN (field1) = field2;
+  DECL_CHAIN (field2) = field3;
+
+  layout_type (ifunc_arg_type);
+
+  tree const_type = build_qualified_type (ifunc_arg_type, TYPE_QUAL_CONST);
+  tree pointer_type = build_pointer_type (const_type);
+
+  return pointer_type;
+}
+
+/* Make the resolver function decl to dispatch the versions of
+   a multi-versioned function, DEFAULT_DECL.  IFUNC_ALIAS_DECL is the
+   ifunc alias that will point to the created resolver.  Create an
+   empty basic block in the resolver and store the pointer in
+   EMPTY_BB.  Return the decl of the resolver function.  */
+
+static tree
+make_resolver_func (const tree default_decl,
+		    const tree ifunc_alias_decl,
+		    basic_block *empty_bb)
+{
+  tree decl, type, t;
+
+  /* Create resolver function name based on default_decl.  */
+  tree decl_name = clone_function_name (default_decl, "resolver");
+  const char *resolver_name = IDENTIFIER_POINTER (decl_name);
+
+  /* The resolver function should have signature
+     (void *) resolver (uint64_t, const __ifunc_arg_t *) */
+  type = build_function_type_list (ptr_type_node,
+				   uint64_type_node,
+				   build_ifunc_arg_type (),
+				   NULL_TREE);
+
+  decl = build_fn_decl (resolver_name, type);
+  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
+
+  DECL_NAME (decl) = decl_name;
+  TREE_USED (decl) = 1;
+  DECL_ARTIFICIAL (decl) = 1;
+  DECL_IGNORED_P (decl) = 1;
+  TREE_PUBLIC (decl) = 0;
+  DECL_UNINLINABLE (decl) = 1;
+
+  /* Resolver is not external, body is generated.  */
+  DECL_EXTERNAL (decl) = 0;
+  DECL_EXTERNAL (ifunc_alias_decl) = 0;
+
+  DECL_CONTEXT (decl) = NULL_TREE;
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  DECL_STATIC_CONSTRUCTOR (decl) = 0;
+
+  if (DECL_COMDAT_GROUP (default_decl)
+      || TREE_PUBLIC (default_decl))
+    {
+      /* In this case, each translation unit with a call to this
+	 versioned function will put out a resolver.  Ensure it
+	 is comdat to keep just one copy.  */
+      DECL_COMDAT (decl) = 1;
+      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+    }
+  else
+    TREE_PUBLIC (ifunc_alias_decl) = 0;
+
+  /* Build result decl and add to function_decl.  */
+  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
+  DECL_CONTEXT (t) = decl;
+  DECL_ARTIFICIAL (t) = 1;
+  DECL_IGNORED_P (t) = 1;
+  DECL_RESULT (decl) = t;
+
+  /* Build parameter decls and add to function_decl.  */
+  tree arg1 = build_decl (UNKNOWN_LOCATION, PARM_DECL,
+			  get_identifier ("hwcap"),
+			  uint64_type_node);
+  tree arg2 = build_decl (UNKNOWN_LOCATION, PARM_DECL,
+			  get_identifier ("arg"),
+			  build_ifunc_arg_type ());
+  DECL_CONTEXT (arg1) = decl;
+  DECL_CONTEXT (arg2) = decl;
+  DECL_ARTIFICIAL (arg1) = 1;
+  DECL_ARTIFICIAL (arg2) = 1;
+  DECL_IGNORED_P (arg1) = 1;
+  DECL_IGNORED_P (arg2) = 1;
+  DECL_ARG_TYPE (arg1) = uint64_type_node;
+  DECL_ARG_TYPE (arg2) = build_ifunc_arg_type ();
+  DECL_ARGUMENTS (decl) = arg1;
+  TREE_CHAIN (arg1) = arg2;
+
+  gimplify_function_tree (decl);
+  push_cfun (DECL_STRUCT_FUNCTION (decl));
+  *empty_bb = init_lowered_empty_function (decl, false,
+					   profile_count::uninitialized ());
+
+  cgraph_node::add_new_function (decl, true);
+  symtab->call_cgraph_insertion_hooks (cgraph_node::get_create (decl));
+
+  pop_cfun ();
+
+  gcc_assert (ifunc_alias_decl != NULL);
+  /* Mark ifunc_alias_decl as "ifunc" with resolver as resolver_name.  */
+  DECL_ATTRIBUTES (ifunc_alias_decl)
+    = make_attribute ("ifunc", resolver_name,
+		      DECL_ATTRIBUTES (ifunc_alias_decl));
+
+  /* Create the alias for dispatch to resolver here.  */
+  cgraph_node::create_same_body_alias (ifunc_alias_decl, decl);
+  return decl;
+}
+
+/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
+   to return a pointer to VERSION_DECL if none of the feature bits specified
+   in FEATURE_MASK are set in MASK_VAR.  This function will be called during
+   version dispatch to decide which function version to execute.  It returns
+   the basic block at the end, to which more conditions can be added.  */
+static basic_block
+add_condition_to_bb (tree function_decl, tree version_decl,
+		     aarch64_fmv_feature_mask feature_mask,
+		     tree mask_var, basic_block new_bb)
+{
+  gimple *return_stmt;
+  tree convert_expr, result_var;
+  gimple *convert_stmt;
+  gimple *if_else_stmt;
+
+  basic_block bb1, bb2, bb3;
+  edge e12, e23;
+
+  gimple_seq gseq;
+
+  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
+
+  gcc_assert (new_bb != NULL);
+  gseq = bb_seq (new_bb);
+
+
+  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
+			 build_fold_addr_expr (version_decl));
+  result_var = create_tmp_var (ptr_type_node);
+  convert_stmt = gimple_build_assign (result_var, convert_expr);
+  return_stmt = gimple_build_return (result_var);
+
+
+  if (feature_mask == 0ULL)
+    {
+      /* Default version.  */
+      gimple_seq_add_stmt (&gseq, convert_stmt);
+      gimple_seq_add_stmt (&gseq, return_stmt);
+      set_bb_seq (new_bb, gseq);
+      gimple_set_bb (convert_stmt, new_bb);
+      gimple_set_bb (return_stmt, new_bb);
+      pop_cfun ();
+      return new_bb;
+    }
+
+  tree and_expr_var = create_tmp_var (long_long_unsigned_type_node);
+  tree and_expr = build2 (BIT_AND_EXPR,
+			  long_long_unsigned_type_node,
+			  mask_var,
+			  build_int_cst (long_long_unsigned_type_node,
+					 feature_mask));
+  gimple *and_stmt = gimple_build_assign (and_expr_var, and_expr);
+  gimple_set_block (and_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (and_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, and_stmt);
+
+  tree zero_llu = build_int_cst (long_long_unsigned_type_node, 0);
+  if_else_stmt = gimple_build_cond (EQ_EXPR, and_expr_var, zero_llu,
+				    NULL_TREE, NULL_TREE);
+  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
+  gimple_set_bb (if_else_stmt, new_bb);
+  gimple_seq_add_stmt (&gseq, if_else_stmt);
+
+  gimple_seq_add_stmt (&gseq, convert_stmt);
+  gimple_seq_add_stmt (&gseq, return_stmt);
+  set_bb_seq (new_bb, gseq);
+
+  bb1 = new_bb;
+  e12 = split_block (bb1, if_else_stmt);
+  bb2 = e12->dest;
+  e12->flags &= ~EDGE_FALLTHRU;
+  e12->flags |= EDGE_TRUE_VALUE;
+
+  e23 = split_block (bb2, return_stmt);
+
+  gimple_set_bb (convert_stmt, bb2);
+  gimple_set_bb (return_stmt, bb2);
+
+  bb3 = e23->dest;
+  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
+
+  remove_edge (e23);
+  make_edge (bb2, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
+
+  pop_cfun ();
+
+  return bb3;
+}
+
+/* Used when sorting the decls into dispatch order.  */
+static int compare_feature_version_info (const void *p1, const void *p2)
+{
+  struct _function_version_info
+    {
+      tree version_decl;
+      aarch64_fmv_feature_mask feature_mask;
+    };
+  const _function_version_info v1 = *(const _function_version_info *)p1;
+  const _function_version_info v2 = *(const _function_version_info *)p2;
+  return - compare_feature_masks (v1.feature_mask, v2.feature_mask);
+}
+
+static int
+dispatch_function_versions (tree dispatch_decl,
+			    void *fndecls_p,
+			    basic_block *empty_bb)
+{
+  gimple *ifunc_cpu_init_stmt;
+  gimple_seq gseq;
+  vec<tree> *fndecls;
+  unsigned int num_versions = 0;
+  unsigned int actual_versions = 0;
+  unsigned int i;
+
+  struct _function_version_info
+    {
+      tree version_decl;
+      aarch64_fmv_feature_mask feature_mask;
+    } *function_version_info;
+
+  gcc_assert (dispatch_decl != NULL
+	      && fndecls_p != NULL
+	      && empty_bb != NULL);
+
+  /* fndecls_p is actually a vector.  */
+  fndecls = static_cast<vec<tree> *> (fndecls_p);
+
+  /* At least one more version other than the default.  */
+  num_versions = fndecls->length ();
+  gcc_assert (num_versions >= 2);
+
+  function_version_info = (struct _function_version_info *)
+    XNEWVEC (struct _function_version_info, (num_versions));
+
+  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
+
+  gseq = bb_seq (*empty_bb);
+  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
+     constructors, so explicitly call __init_cpu_features_resolver here.  */
+  tree init_fn_type = build_function_type_list (void_type_node,
+						long_unsigned_type_node,
+						build_ifunc_arg_type (),
+						NULL);
+  tree init_fn_id = get_identifier ("__init_cpu_features_resolver");
+  tree init_fn_decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL,
+				  init_fn_id, init_fn_type);
+  tree arg1 = DECL_ARGUMENTS (dispatch_decl);
+  tree arg2 = TREE_CHAIN (arg1);
+  ifunc_cpu_init_stmt = gimple_build_call (init_fn_decl, 2, arg1, arg2);
+  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
+  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
+
+  /* Build the struct type for __aarch64_cpu_features.  */
+  tree global_type = lang_hooks.types.make_type (RECORD_TYPE);
+  tree field1 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
+			    get_identifier ("features"),
+			    long_long_unsigned_type_node);
+  DECL_FIELD_CONTEXT (field1) = global_type;
+  TYPE_FIELDS (global_type) = field1;
+  layout_type (global_type);
+
+  tree global_var = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+				get_identifier ("__aarch64_cpu_features"),
+				global_type);
+  DECL_EXTERNAL (global_var) = 1;
+  tree mask_var = create_tmp_var (long_long_unsigned_type_node);
+
+  tree component_expr = build3 (COMPONENT_REF, long_long_unsigned_type_node,
+				global_var, field1, NULL_TREE);
+  gimple *component_stmt = gimple_build_assign (mask_var, component_expr);
+  gimple_set_block (component_stmt, DECL_INITIAL (dispatch_decl));
+  gimple_set_bb (component_stmt, *empty_bb);
+  gimple_seq_add_stmt (&gseq, component_stmt);
+
+  tree not_expr = build1 (BIT_NOT_EXPR, long_long_unsigned_type_node, mask_var);
+  gimple *not_stmt = gimple_build_assign (mask_var, not_expr);
+  gimple_set_block (not_stmt, DECL_INITIAL (dispatch_decl));
+  gimple_set_bb (not_stmt, *empty_bb);
+  gimple_seq_add_stmt (&gseq, not_stmt);
+
+  set_bb_seq (*empty_bb, gseq);
+
+  pop_cfun ();
+
+  for (tree version_decl : *fndecls)
+    {
+      aarch64_fmv_feature_mask feature_mask;
+      /* Get attribute string, parse it and find the right features.  */
+      feature_mask = get_feature_mask_for_version (version_decl);
+      function_version_info [actual_versions].version_decl = version_decl;
+      function_version_info [actual_versions].feature_mask = feature_mask;
+      actual_versions++;
+    }
+
+  /* Sort the versions according to descending order of dispatch priority.  */
+  qsort (function_version_info, actual_versions,
+	 sizeof (struct _function_version_info), compare_feature_version_info);
+
+  for (i = 0; i < actual_versions; ++i)
+    *empty_bb = add_condition_to_bb (dispatch_decl,
+				     function_version_info[i].version_decl,
+				     function_version_info[i].feature_mask,
+				     mask_var,
+				     *empty_bb);
+
+  free (function_version_info);
+  return 0;
+}
+
+
+tree
+aarch64_generate_version_dispatcher_body (void *node_p)
+{
+  tree resolver_decl;
+  basic_block empty_bb;
+  tree default_ver_decl;
+  struct cgraph_node *versn;
+  struct cgraph_node *node;
+
+  struct cgraph_function_version_info *node_version_info = NULL;
+  struct cgraph_function_version_info *versn_info = NULL;
+
+  node = (cgraph_node *)node_p;
+
+  node_version_info = node->function_version ();
+  gcc_assert (node->dispatcher_function
+	      && node_version_info != NULL);
+
+  if (node_version_info->dispatcher_resolver)
+    return node_version_info->dispatcher_resolver;
+
+  /* The first version in the chain corresponds to the default version.  */
+  default_ver_decl = node_version_info->next->this_node->decl;
+
+  /* node is going to be an alias, so remove the finalized bit.  */
+  node->definition = false;
+
+  resolver_decl = make_resolver_func (default_ver_decl,
+				      node->decl, &empty_bb);
+
+  node_version_info->dispatcher_resolver = resolver_decl;
+
+  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
+
+  auto_vec<tree, 2> fn_ver_vec;
+
+  for (versn_info = node_version_info->next; versn_info;
+       versn_info = versn_info->next)
+    {
+      versn = versn_info->this_node;
+      /* Check for virtual functions here again, as by this time it should
+	 have been determined if this function needs a vtable index or
+	 not.  This happens for methods in derived classes that override
+	 virtual methods in base classes but are not explicitly marked as
+	 virtual.  */
+      if (DECL_VINDEX (versn->decl))
+	sorry ("virtual function multiversioning not supported");
+
+      fn_ver_vec.safe_push (versn->decl);
+    }
+
+  dispatch_function_versions (resolver_decl, &fn_ver_vec, &empty_bb);
+  cgraph_edge::rebuild_edges ();
+  pop_cfun ();
+  return resolver_decl;
+}
+
+/* Make a dispatcher declaration for the multi-versioned function DECL.
+   Calls to DECL function will be replaced with calls to the dispatcher
+   by the front-end.  Returns the decl of the dispatcher function.  */
+
+tree
+aarch64_get_function_versions_dispatcher (void *decl)
+{
+  tree fn = (tree) decl;
+  struct cgraph_node *node = NULL;
+  struct cgraph_node *default_node = NULL;
+  struct cgraph_function_version_info *node_v = NULL;
+  struct cgraph_function_version_info *first_v = NULL;
+
+  tree dispatch_decl = NULL;
+
+  struct cgraph_function_version_info *default_version_info = NULL;
+
+  gcc_assert (fn != NULL && DECL_FUNCTION_VERSIONED (fn));
+
+  node = cgraph_node::get (fn);
+  gcc_assert (node != NULL);
+
+  node_v = node->function_version ();
+  gcc_assert (node_v != NULL);
+
+  if (node_v->dispatcher_resolver != NULL)
+    return node_v->dispatcher_resolver;
+
+  /* Find the default version and make it the first node.  */
+  first_v = node_v;
+  /* Go to the beginning of the chain.  */
+  while (first_v->prev != NULL)
+    first_v = first_v->prev;
+  default_version_info = first_v;
+  while (default_version_info != NULL)
+    {
+      if (get_feature_mask_for_version
+	    (default_version_info->this_node->decl) == 0ULL)
+	break;
+      default_version_info = default_version_info->next;
+    }
+
+  /* If there is no default node, just return NULL.  */
+  if (default_version_info == NULL)
+    return NULL;
+
+  /* Make default info the first node.  */
+  if (first_v != default_version_info)
+    {
+      default_version_info->prev->next = default_version_info->next;
+      if (default_version_info->next)
+	default_version_info->next->prev = default_version_info->prev;
+      first_v->prev = default_version_info;
+      default_version_info->next = first_v;
+      default_version_info->prev = NULL;
+    }
+
+  default_node = default_version_info->this_node;
+
+  if (targetm.has_ifunc_p ())
+    {
+      struct cgraph_function_version_info *it_v = NULL;
+      struct cgraph_node *dispatcher_node = NULL;
+      struct cgraph_function_version_info *dispatcher_version_info = NULL;
+
+      /* Right now, the dispatching is done via ifunc.  */
+      dispatch_decl = make_dispatcher_decl (default_node->decl);
+      TREE_NOTHROW (dispatch_decl) = TREE_NOTHROW (fn);
+
+      dispatcher_node = cgraph_node::get_create (dispatch_decl);
+      gcc_assert (dispatcher_node != NULL);
+      dispatcher_node->dispatcher_function = 1;
+      dispatcher_version_info
+	= dispatcher_node->insert_new_function_version ();
+      dispatcher_version_info->next = default_version_info;
+      dispatcher_node->definition = 1;
+
+      /* Set the dispatcher for all the versions.  */
+      it_v = default_version_info;
+      while (it_v != NULL)
+	{
+	  it_v->dispatcher_resolver = dispatch_decl;
+	  it_v = it_v->next;
+	}
+    }
+  else
+    {
+      error_at (DECL_SOURCE_LOCATION (default_node->decl),
+		"multiversioning needs %<ifunc%> which is not supported "
+		"on this target");
+    }
+
+  return dispatch_decl;
+}
+
+bool
+aarch64_common_function_versions (tree fn1, tree fn2)
+{
+  if (TREE_CODE (fn1) != FUNCTION_DECL
+      || TREE_CODE (fn2) != FUNCTION_DECL)
+    return false;
+
+  return (aarch64_compare_version_priority (fn1, fn2) != 0);
+}
+
+
+tree
+aarch64_mangle_decl_assembler_name (tree decl, tree id)
+{
+  /* For function version, add the target suffix to the assembler name.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_FUNCTION_VERSIONED (decl))
+    {
+      aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version (decl);
+
+      /* No suffix for the default version.  */
+      if (feature_mask == 0ULL)
+	return id;
+
+      char suffix[2048];
+      int pos = 0;
+      const char *base = IDENTIFIER_POINTER (id);
+
+      for (int i = 1; i < FEAT_MAX; i++)
+	{
+	  if (feature_mask & aarch64_fmv_feature_data[i].feature_mask)
+	    {
+	      suffix[pos] = 'M';
+	      strcpy (&suffix[pos+1], aarch64_fmv_feature_data[i].name);
+	      pos += strlen(aarch64_fmv_feature_data[i].name) + 1;
+	    }
+	}
+      suffix[pos] = '\0';
+
+      char *ret = XNEWVEC (char, strlen (base) + strlen (suffix) + 3);
+      sprintf (ret, "%s._%s", base, suffix);
+
+      if (DECL_ASSEMBLER_NAME_SET_P (decl))
+	SET_DECL_RTL (decl, NULL);
+
+      id = get_identifier (ret);
+    }
+  return id;
+}
+
+
 /* Helper for aarch64_can_inline_p.  In the case where CALLER and CALLEE are
    tri-bool options (yes, no, don't care) and the default value is
    DEF, determine whether to reject inlining.  */
@@ -28457,6 +29288,13 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_OPTION_VALID_ATTRIBUTE_P
 #define TARGET_OPTION_VALID_ATTRIBUTE_P aarch64_option_valid_attribute_p
 
+#undef TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P
+#define TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P \
+  aarch64_option_valid_version_attribute_p
+
+#undef TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE
+#define TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE "target_version"
+
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION aarch64_set_current_function
 
@@ -28787,6 +29625,24 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_CONST_ANCHOR
 #define TARGET_CONST_ANCHOR 0x1000000
 
+#undef TARGET_OPTION_FUNCTION_VERSIONS
+#define TARGET_OPTION_FUNCTION_VERSIONS aarch64_common_function_versions
+
+#undef TARGET_COMPARE_VERSION_PRIORITY
+#define TARGET_COMPARE_VERSION_PRIORITY aarch64_compare_version_priority
+
+#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
+#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
+  aarch64_generate_version_dispatcher_body
+
+#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
+#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
+  aarch64_get_function_versions_dispatcher
+
+#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
+#define TARGET_MANGLE_DECL_ASSEMBLER_NAME aarch64_mangle_decl_assembler_name
+
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
diff --git a/gcc/config/arm/aarch-common.h b/gcc/config/arm/aarch-common.h
index c6a67f0d05cc75d85d019e1cc910c37173884c03..70f01fd3da6919dd98cfe92bfc4c54b7d2cba72c 100644
--- a/gcc/config/arm/aarch-common.h
+++ b/gcc/config/arm/aarch-common.h
@@ -23,7 +23,7 @@
 #define GCC_AARCH_COMMON_H
 
 /* Enum describing the various ways that the
-   aarch*_parse_{arch,tune,cpu,extension} functions can fail.
+   aarch*_parse_{arch,tune,cpu,extension,fmv_extension} functions can fail.
    This way their callers can choose what kind of error to give.  */
 
 enum aarch_parse_opt_result
@@ -31,7 +31,8 @@ enum aarch_parse_opt_result
   AARCH_PARSE_OK,			/* Parsing was successful.  */
   AARCH_PARSE_MISSING_ARG,		/* Missing argument.  */
   AARCH_PARSE_INVALID_FEATURE,		/* Invalid feature modifier.  */
-  AARCH_PARSE_INVALID_ARG		/* Invalid arch, tune, cpu arg.  */
+  AARCH_PARSE_INVALID_ARG,		/* Invalid arch, tune, cpu arg.  */
+  AARCH_PARSE_DUPLICATE_FEATURE		/* Duplicate feature modifier.  */
 };
 
 /* Function types -msign-return-address should sign.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
index 8499f87c39b173491a89626af56f4e193b1d12b5..8b7d7d2d8a00f6d5a6a35ffca28be7f1ff4cb9c7 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto} } } */
 
 /* Test a normal looking procinfo.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c
index 551669091c7010379a4c5247a27c517c4e67ef98..234a1ce1d7b4714e64c95c15488784d73c0552f2 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto} } } */
 
 /* Test one with mixed order of feature bits.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
index 2f963bb2312711691f6f1c5989a100b88671ad52..bd3ea96a785de507578729a621ec4ae7bad8a516 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\+sve2} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+sve2} } } */
 
 /* Test a normal looking procinfo.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
index c68a697aa3e97ef52fd7e90233c5bb4ac8dbddd9..33e6319b46dcebc717e8a415484093e980660fb5 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
@@ -7,6 +7,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\+sve2} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+sve2} } } */
 
 /* Test a normal looking procinfo.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
index b5f0a3005f50cbf01edbcb8aefcc3c34aa11207f..abae7a7d1453f79f879ff5e24f7c67e819db1dbb 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8.6-a\+crc\+fp16\+aes\+sha3\+rng} } } */
+/* { dg-final { scan-assembler {\.arch armv8.6-a\+rng\+crc\+aes\+sha3\+fp16} } } */
 
 /* Test one where the boundary of buffer size would overwrite the last
    character read when stitching the fgets-calls together.  With the
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c
index 980d3f79dfb03b0d8eb68f691bf2dedf80aed87d..a5b4b4d3442c6522a8cdadf4eebd3b5460e37213 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+profile\+memtag\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\+nopauth\n} } } */
+/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+memtag\+profile\+nopauth\n} } } */
 
 /* Test one that if the kernel doesn't report the availability of a mandatory
    feature that it has turned it off for whatever reason.  As such compilers
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c
index 117df2b0b6cd5751d9f5175b4343aad9825a6c43..e12aa543d02924f268729f96fe1f17181287f097 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+profile\+memtag\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\n} } } */
+/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+memtag\+profile\n} } } */
 
 /* Check whether features that don't have a midr name during detection are
    correctly ignored.  These features shouldn't affect the native detection.
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
index efbd02cbdc0638db85e776f1e79043709c11df21..920e1d65711cbcb77b07441597180c0159ccabf9 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+lse\+rcpc\+rdma\+dotprod\+fp16fml\+sb\+ssbs\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\+flagm\n} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+dotprod\+rdma\+lse\+crc\+fp16fml\+rcpc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n} } } */
 
 /* Check that an Armv8-A core doesn't fall apart on extensions without midr
    values.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
index d431d4938265d024891b464ac3d069607b21d8e7..416a29b514ab7599a7092e26e3716ec8a50cc895 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+lse\+rcpc\+rdma\+dotprod\+fp16fml\+sb\+ssbs\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\+flagm\+pauth\n} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+dotprod\+rdma\+lse\+crc\+fp16fml\+rcpc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n} } } */
 
 /* Check that an Armv8-A core doesn't fall apart on extensions without midr
    values and that it enables optional features.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c
index 7608e8845a662219488effcdb8277006dcf457a9..907249c5c1e6a440731533407df0ff7caadcbf74 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+fp16\+crypto} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+crypto\+fp16} } } */
 
-/* Test one where the feature bits for crypto and fp16 are given in
-   same order as declared in options file.  */
+/* Test one where the crypto and fp16 options are specified in different
+   order from what is in the options file.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c
index 72b14b4f6ed0d50a4fc8a35931fbd232b09d2b61..b68a07a7c16b7a3cc9a896cca152d78e5cf9ea2f 100644
--- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c
@@ -7,7 +7,7 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8-a\+fp16\+crypto} } } */
+/* { dg-final { scan-assembler {\.arch armv8-a\+crypto\+fp16} } } */
 
-/* Test one where the crypto and fp16 options are specified in different
-   order from what is in the options file.  */
+/* Test one where the feature bits for crypto and fp16 are given in
+   same order as declared in options file.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/options_set_17.c b/gcc/testsuite/gcc.target/aarch64/options_set_17.c
index c490e1f47a0a7a3adcbb7e96a3974d5651a023e8..4c53edd5cb92f83b3d34454c85062ff3f67b50ee 100644
--- a/gcc/testsuite/gcc.target/aarch64/options_set_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/options_set_17.c
@@ -6,6 +6,6 @@ int main ()
   return 0;
 }
 
-/* { dg-final { scan-assembler {\.arch armv8\.2-a\+crc\+dotprod} } } */
+/* { dg-final { scan-assembler {\.arch armv8\.2-a\+dotprod\+crc} } } */
 
  /* dotprod needs to be emitted pre armv8.4.  */
diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
index 0888ca4ed058430f524b99cb0e204bd996fa0e55..78664d5a4287be0369a4b02e1b8ab4a885869352 100644
--- a/libgcc/config/aarch64/cpuinfo.c
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -22,6 +22,8 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+#include "common/config/aarch64/cpuinfo.h"
+
 #if defined(__has_include)
 #if __has_include(<sys/auxv.h>)
 #include <sys/auxv.h>
@@ -39,73 +41,6 @@ typedef struct __ifunc_arg_t {
 #if __has_include(<asm/hwcap.h>)
 #include <asm/hwcap.h>
 
-/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
-enum CPUFeatures {
-  FEAT_RNG,
-  FEAT_FLAGM,
-  FEAT_FLAGM2,
-  FEAT_FP16FML,
-  FEAT_DOTPROD,
-  FEAT_SM4,
-  FEAT_RDM,
-  FEAT_LSE,
-  FEAT_FP,
-  FEAT_SIMD,
-  FEAT_CRC,
-  FEAT_SHA1,
-  FEAT_SHA2,
-  FEAT_SHA3,
-  FEAT_AES,
-  FEAT_PMULL,
-  FEAT_FP16,
-  FEAT_DIT,
-  FEAT_DPB,
-  FEAT_DPB2,
-  FEAT_JSCVT,
-  FEAT_FCMA,
-  FEAT_RCPC,
-  FEAT_RCPC2,
-  FEAT_FRINTTS,
-  FEAT_DGH,
-  FEAT_I8MM,
-  FEAT_BF16,
-  FEAT_EBF16,
-  FEAT_RPRES,
-  FEAT_SVE,
-  FEAT_SVE_BF16,
-  FEAT_SVE_EBF16,
-  FEAT_SVE_I8MM,
-  FEAT_SVE_F32MM,
-  FEAT_SVE_F64MM,
-  FEAT_SVE2,
-  FEAT_SVE_AES,
-  FEAT_SVE_PMULL128,
-  FEAT_SVE_BITPERM,
-  FEAT_SVE_SHA3,
-  FEAT_SVE_SM4,
-  FEAT_SME,
-  FEAT_MEMTAG,
-  FEAT_MEMTAG2,
-  FEAT_MEMTAG3,
-  FEAT_SB,
-  FEAT_PREDRES,
-  FEAT_SSBS,
-  FEAT_SSBS2,
-  FEAT_BTI,
-  FEAT_LS64,
-  FEAT_LS64_V,
-  FEAT_LS64_ACCDATA,
-  FEAT_WFXT,
-  FEAT_SME_F64,
-  FEAT_SME_I64,
-  FEAT_SME2,
-  FEAT_RCPC3,
-  FEAT_MAX,
-  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
-		    in __aarch64_cpu_features.  */
-  FEAT_INIT      /* Used as flag of features initialization completion.  */
-};
-
 /* Architecture features used in Function Multi Versioning.  */
 struct {
   unsigned long long features;

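For readers following the aarch64_mangle_decl_assembler_name hunk earlier in
this patch, the suffix scheme can be sketched outside of GCC as below.  This
is only an illustration under stated assumptions: the feature table, its mask
values and the helper name mangle_version_name are hypothetical stand-ins for
aarch64_fmv_feature_data and the target hook.  Only the "._" separator, the
"M" prefix per feature, and the unsuffixed default version are taken from the
patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical feature table; the real names and masks live in GCC.  */
struct fmv_feature
{
  const char *name;
  unsigned long long mask;
};

static const struct fmv_feature features[] = {
  { "dotprod", 1ULL << 0 },
  { "sve",     1ULL << 1 },
  { "sve2",    1ULL << 2 },
};

/* Return a malloc'd "<base>._M<feat>M<feat>..." string, or a plain copy of
   BASE for the default version (MASK == 0).  Returns NULL if allocation
   fails.  The caller frees the result.  */
static char *
mangle_version_name (const char *base, unsigned long long mask)
{
  assert (base != NULL);
  const size_t n = sizeof features / sizeof features[0];

  /* Size the result first: base, optional "._", then "M<name>" per bit.  */
  size_t len = strlen (base) + 1;
  if (mask != 0)
    len += 2;
  for (size_t i = 0; i < n; i++)
    if (mask & features[i].mask)
      len += 1 + strlen (features[i].name);

  char *ret = malloc (len);
  if (ret == NULL)
    return NULL;

  strcpy (ret, base);
  if (mask == 0)
    return ret;

  /* Append the features in table order, so the name is deterministic.  */
  strcat (ret, "._");
  for (size_t i = 0; i < n; i++)
    if (mask & features[i].mask)
      {
	strcat (ret, "M");
	strcat (ret, features[i].name);
      }
  return ret;
}
```

With this hypothetical table, a mask selecting dotprod and sve2 gives
"foo._MdotprodMsve2" for a base name of "foo", while a mask of 0 leaves "foo"
unchanged.  Sizing the buffer from the selected names avoids relying on a
fixed-size scratch array.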

* Re: [PATCH v2 3/5] ada: Improve attribute exclusion handling
  2023-11-17  2:54 ` [PATCH v2 3/5] ada: Improve " Andrew Carlotti
@ 2023-11-17 10:45   ` Marc Poulhiès
  2023-11-17 11:15     ` Andrew Carlotti
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Poulhiès @ 2023-11-17 10:45 UTC (permalink / raw)
  To: Andrew Carlotti
  Cc: gcc-patches, ebotcazou, rguenther, richard.sandiford, richard.earnshaw


Hello,

> I haven't managed to test the Ada frontend, but this patch (and the following

I don't have an aarch64 setup to test, but I may be able to help with the
issue preventing you from testing. Can you elaborate on what the problem is?

Marc


* Re: [PATCH v2 3/5] ada: Improve attribute exclusion handling
  2023-11-17 10:45   ` Marc Poulhiès
@ 2023-11-17 11:15     ` Andrew Carlotti
  2023-11-20  8:26       ` Marc Poulhiès
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Carlotti @ 2023-11-17 11:15 UTC (permalink / raw)
  To: Marc Poulhiès
  Cc: gcc-patches, ebotcazou, rguenther, richard.sandiford, richard.earnshaw

On Fri, Nov 17, 2023 at 11:45:16AM +0100, Marc Poulhiès wrote:
> 
> Hello,
> 
> > I haven't managed to test the Ada frontend, but this patch (and the following
> 
> I don't have an aarch64 setup to test, but I may be able to help with the
> issue preventing you from testing. Can you elaborate on what the problem is?
> 
> Marc

I only really got as far as trying to configure a build environment, which
failed with 'configure: error: GNAT is required to build ada'.  I have no prior
Ada experience, and I couldn't work out how to get any relevant test code to
compile on Compiler Explorer.  I therefore decided it wasn't worth me spending
more effort trying to test from Ada a small change to some code that is
effectively front-end independent, but just happens to be added to a limited
subset of front ends.

It's probably sufficient to simply test that the Ada changes can be built for
any target, since I'd be surprised if I've managed to copy this code from C++
in a way that breaks functionality without obviously breaking the build.



* Re: [PATCH v2 2/5] c-family: Simplify attribute exclusion handling
  2023-11-17  2:53 ` [PATCH v2 2/5] c-family: Simplify attribute exclusion handling Andrew Carlotti
@ 2023-11-19 21:45   ` Jeff Law
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff Law @ 2023-11-19 21:45 UTC (permalink / raw)
  To: Andrew Carlotti, gcc-patches
  Cc: jason, nathan, rguenther, richard.sandiford, richard.earnshaw



On 11/16/23 19:53, Andrew Carlotti wrote:
> This patch changes the handling of mutual exclusions involving the
> target and target_clones attributes to use the generic attribute
> exclusion lists.  Additionally, the duplicate handling for the
> always_inline and noinline attribute exclusion is removed.
> 
> The only change in functionality is the choice of warning message
> displayed - due to either a change in the wording for mutual exclusion
> warnings, or a change in the order in which different checks occur.
> 
> Ok for master?
> 
> gcc/c-family/ChangeLog:
> 
> 	* c-attribs.cc (attr_always_inline_exclusions): New.
> 	(attr_target_exclusions): Ditto.
> 	(attr_target_clones_exclusions): Ditto.
> 	(c_common_attribute_table): Add new exclusion lists.
> 	(handle_noinline_attribute): Remove custom exclusion handling.
> 	(handle_always_inline_attribute): Ditto.
> 	(handle_target_attribute): Ditto.
> 	(handle_target_clones_attribute): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* g++.target/i386/mvc2.C:
> 	* g++.target/i386/mvc3.C:
OK
jeff


* Re: [PATCH v2 3/5] ada: Improve attribute exclusion handling
  2023-11-17 11:15     ` Andrew Carlotti
@ 2023-11-20  8:26       ` Marc Poulhiès
  0 siblings, 0 replies; 16+ messages in thread
From: Marc Poulhiès @ 2023-11-20  8:26 UTC (permalink / raw)
  To: Andrew Carlotti
  Cc: gcc-patches, ebotcazou, rguenther, richard.sandiford, richard.earnshaw


Andrew Carlotti <andrew.carlotti@arm.com> writes:

> On Fri, Nov 17, 2023 at 11:45:16AM +0100, Marc Poulhiès wrote:
>>
>> Hello,
>>
>> > I haven't managed to test the Ada frontend, but this patch (and the following
>>
>> I don't have an aarch64 setup to test, but I may be able to help with the
>> issue preventing you from testing. Can you elaborate on what the problem is?
>>
>> Marc
>
> I only really got as far as trying to configure a build environment, which
> failed with 'configure: error: GNAT is required to build ada'.  I have no prior
> Ada experience, and I couldn't work out how to get any relevant test code to
> compile on Compiler Explorer.  I therefore decided it wasn't worth me spending
> more effort trying to test from Ada a small change to some code that is
> effectively front-end independent, but just happens to be added to a limited
> subset of front ends.
>
> It's probably sufficient to simply test that the Ada changes can be built for
> any target, since I'd be surprised if I've managed to copy this code from C++
> in a way that breaks functionality without obviously breaking the build.

Hello,

I've tested your changes. The compiler builds correctly and there are no
regressions (x86_64-linux); I've also run some extra tests.

> gcc/ada/ChangeLog:
> 	* gcc-interface/utils.cc (attr_noinline_exclusions): New.
> 	(attr_always_inline_exclusions): Ditto.
> 	(attr_target_exclusions): Ditto.
> 	(attr_target_clones_exclusions): Ditto.
> 	(gnat_internal_attribute_table): Add new exclusion lists.
> 	(handle_noinline_attribute): Remove custom exclusion handling.
> 	(handle_target_attribute): Ditto.
> 	(handle_target_clones_attribute): Ditto.

Ok.

Marc


* Re: [PATCH v2[1/5] aarch64: Add cpu feature detection to libgcc
  2023-11-17  2:51 ` [PATCH v2[1/5] aarch64: Add cpu feature detection to libgcc Andrew Carlotti
@ 2023-11-20 15:46   ` Richard Sandiford
  2023-12-04 10:31     ` Andrew Carlotti
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2023-11-20 15:46 UTC (permalink / raw)
  To: Andrew Carlotti; +Cc: gcc-patches, richard.earnshaw

Andrew Carlotti <andrew.carlotti@arm.com> writes:
> This is added to enable function multiversioning, but can also be used
> directly.  The interface is chosen to match that used in LLVM's
> compiler-rt, to facilitate cross-compiler compatibility.
>
> The content of the patch is derived almost entirely from Pavel's prior
> contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
> changes to align more closely with GCC coding style, and to exclude any code
> from other LLVM contributors, and am adding this to GCC with Pavel's approval.
>
> libgcc/ChangeLog:
>
> 	* config/aarch64/t-aarch64: Include cpuinfo.c
> 	* config/aarch64/cpuinfo.c: New file
> 	(__init_cpu_features_constructor) New.
> 	(__init_cpu_features_resolver) New.
> 	(__init_cpu_features) New.

OK on the basis that you mentioned in the covering note: we can deal
with fixes incrementally.  One question though...
>
> Co-authored-by: Pavel Iliin <Pavel.Iliin@arm.com>
>
>
> diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..0888ca4ed058430f524b99cb0e204bd996fa0e55
> --- /dev/null
> +++ b/libgcc/config/aarch64/cpuinfo.c
> @@ -0,0 +1,502 @@
> +/* CPU feature detection for AArch64 architecture.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   This file is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by the
> +   Free Software Foundation; either version 3, or (at your option) any
> +   later version.
> +
> +   This file is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +  
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#if defined(__has_include)

Is this protecting against a known condition?  libgcc has to be built
with the associated version of GCC, so it might be better to drop the
#if and get a noisy failure if something unexpected happens.  That can
be part of 5/5 though.
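
To illustrate the suggestion (a sketch only, not part of the patch): with the
outer "#if defined(__has_include)" dropped, a compiler lacking __has_include
would stop with a preprocessor error, rather than silently compiling the file
down to nothing.  The HAVE_AUXV macro and have_auxv_header function are
hypothetical names used just for this example.

```c
#include <assert.h>

/* No "#if defined(__has_include)" wrapper: if the operator were missing,
   the next line would be a hard preprocessor error instead of a silent
   skip of everything below.  */
#if __has_include(<sys/auxv.h>)
#include <sys/auxv.h>
#define HAVE_AUXV 1	/* Hypothetical macro, for illustration only.  */
#else
#define HAVE_AUXV 0
#endif

/* Report whether <sys/auxv.h> was found at compile time.  */
static int
have_auxv_header (void)
{
  assert (HAVE_AUXV == 0 || HAVE_AUXV == 1);
  return HAVE_AUXV;
}
```

The inner checks still allow a header to be legitimately absent on a given
target; only the "operator itself is missing" case becomes noisy.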

Thanks,
Richard

> +#if __has_include(<sys/auxv.h>)
> +#include <sys/auxv.h>
> +
> +#if __has_include(<sys/ifunc.h>)
> +#include <sys/ifunc.h>
> +#else
> +typedef struct __ifunc_arg_t {
> +  unsigned long _size;
> +  unsigned long _hwcap;
> +  unsigned long _hwcap2;
> +} __ifunc_arg_t;
> +#endif
> +
> +#if __has_include(<asm/hwcap.h>)
> +#include <asm/hwcap.h>
> +
> +/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
> +enum CPUFeatures {
> +  FEAT_RNG,
> +  FEAT_FLAGM,
> +  FEAT_FLAGM2,
> +  FEAT_FP16FML,
> +  FEAT_DOTPROD,
> +  FEAT_SM4,
> +  FEAT_RDM,
> +  FEAT_LSE,
> +  FEAT_FP,
> +  FEAT_SIMD,
> +  FEAT_CRC,
> +  FEAT_SHA1,
> +  FEAT_SHA2,
> +  FEAT_SHA3,
> +  FEAT_AES,
> +  FEAT_PMULL,
> +  FEAT_FP16,
> +  FEAT_DIT,
> +  FEAT_DPB,
> +  FEAT_DPB2,
> +  FEAT_JSCVT,
> +  FEAT_FCMA,
> +  FEAT_RCPC,
> +  FEAT_RCPC2,
> +  FEAT_FRINTTS,
> +  FEAT_DGH,
> +  FEAT_I8MM,
> +  FEAT_BF16,
> +  FEAT_EBF16,
> +  FEAT_RPRES,
> +  FEAT_SVE,
> +  FEAT_SVE_BF16,
> +  FEAT_SVE_EBF16,
> +  FEAT_SVE_I8MM,
> +  FEAT_SVE_F32MM,
> +  FEAT_SVE_F64MM,
> +  FEAT_SVE2,
> +  FEAT_SVE_AES,
> +  FEAT_SVE_PMULL128,
> +  FEAT_SVE_BITPERM,
> +  FEAT_SVE_SHA3,
> +  FEAT_SVE_SM4,
> +  FEAT_SME,
> +  FEAT_MEMTAG,
> +  FEAT_MEMTAG2,
> +  FEAT_MEMTAG3,
> +  FEAT_SB,
> +  FEAT_PREDRES,
> +  FEAT_SSBS,
> +  FEAT_SSBS2,
> +  FEAT_BTI,
> +  FEAT_LS64,
> +  FEAT_LS64_V,
> +  FEAT_LS64_ACCDATA,
> +  FEAT_WFXT,
> +  FEAT_SME_F64,
> +  FEAT_SME_I64,
> +  FEAT_SME2,
> +  FEAT_RCPC3,
> +  FEAT_MAX,
> +  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
> +		    in __aarch64_cpu_features.  */
> +  FEAT_INIT      /* Used as flag of features initialization completion.  */
> +};
> +
> +/* Architecture features used in Function Multi Versioning.  */
> +struct {
> +  unsigned long long features;
> +  /* As features grows new fields could be added.  */
> +} __aarch64_cpu_features __attribute__((visibility("hidden"), nocommon));
> +
> +#ifndef _IFUNC_ARG_HWCAP
> +#define _IFUNC_ARG_HWCAP (1ULL << 62)
> +#endif
> +#ifndef AT_HWCAP
> +#define AT_HWCAP 16
> +#endif
> +#ifndef HWCAP_CPUID
> +#define HWCAP_CPUID (1 << 11)
> +#endif
> +#ifndef HWCAP_FP
> +#define HWCAP_FP (1 << 0)
> +#endif
> +#ifndef HWCAP_ASIMD
> +#define HWCAP_ASIMD (1 << 1)
> +#endif
> +#ifndef HWCAP_AES
> +#define HWCAP_AES (1 << 3)
> +#endif
> +#ifndef HWCAP_PMULL
> +#define HWCAP_PMULL (1 << 4)
> +#endif
> +#ifndef HWCAP_SHA1
> +#define HWCAP_SHA1 (1 << 5)
> +#endif
> +#ifndef HWCAP_SHA2
> +#define HWCAP_SHA2 (1 << 6)
> +#endif
> +#ifndef HWCAP_ATOMICS
> +#define HWCAP_ATOMICS (1 << 8)
> +#endif
> +#ifndef HWCAP_FPHP
> +#define HWCAP_FPHP (1 << 9)
> +#endif
> +#ifndef HWCAP_ASIMDHP
> +#define HWCAP_ASIMDHP (1 << 10)
> +#endif
> +#ifndef HWCAP_ASIMDRDM
> +#define HWCAP_ASIMDRDM (1 << 12)
> +#endif
> +#ifndef HWCAP_JSCVT
> +#define HWCAP_JSCVT (1 << 13)
> +#endif
> +#ifndef HWCAP_FCMA
> +#define HWCAP_FCMA (1 << 14)
> +#endif
> +#ifndef HWCAP_LRCPC
> +#define HWCAP_LRCPC (1 << 15)
> +#endif
> +#ifndef HWCAP_DCPOP
> +#define HWCAP_DCPOP (1 << 16)
> +#endif
> +#ifndef HWCAP_SHA3
> +#define HWCAP_SHA3 (1 << 17)
> +#endif
> +#ifndef HWCAP_SM3
> +#define HWCAP_SM3 (1 << 18)
> +#endif
> +#ifndef HWCAP_SM4
> +#define HWCAP_SM4 (1 << 19)
> +#endif
> +#ifndef HWCAP_ASIMDDP
> +#define HWCAP_ASIMDDP (1 << 20)
> +#endif
> +#ifndef HWCAP_SHA512
> +#define HWCAP_SHA512 (1 << 21)
> +#endif
> +#ifndef HWCAP_SVE
> +#define HWCAP_SVE (1 << 22)
> +#endif
> +#ifndef HWCAP_ASIMDFHM
> +#define HWCAP_ASIMDFHM (1 << 23)
> +#endif
> +#ifndef HWCAP_DIT
> +#define HWCAP_DIT (1 << 24)
> +#endif
> +#ifndef HWCAP_ILRCPC
> +#define HWCAP_ILRCPC (1 << 26)
> +#endif
> +#ifndef HWCAP_FLAGM
> +#define HWCAP_FLAGM (1 << 27)
> +#endif
> +#ifndef HWCAP_SSBS
> +#define HWCAP_SSBS (1 << 28)
> +#endif
> +#ifndef HWCAP_SB
> +#define HWCAP_SB (1 << 29)
> +#endif
> +
> +#ifndef HWCAP2_DCPODP
> +#define HWCAP2_DCPODP (1 << 0)
> +#endif
> +#ifndef HWCAP2_SVE2
> +#define HWCAP2_SVE2 (1 << 1)
> +#endif
> +#ifndef HWCAP2_SVEAES
> +#define HWCAP2_SVEAES (1 << 2)
> +#endif
> +#ifndef HWCAP2_SVEPMULL
> +#define HWCAP2_SVEPMULL (1 << 3)
> +#endif
> +#ifndef HWCAP2_SVEBITPERM
> +#define HWCAP2_SVEBITPERM (1 << 4)
> +#endif
> +#ifndef HWCAP2_SVESHA3
> +#define HWCAP2_SVESHA3 (1 << 5)
> +#endif
> +#ifndef HWCAP2_SVESM4
> +#define HWCAP2_SVESM4 (1 << 6)
> +#endif
> +#ifndef HWCAP2_FLAGM2
> +#define HWCAP2_FLAGM2 (1 << 7)
> +#endif
> +#ifndef HWCAP2_FRINT
> +#define HWCAP2_FRINT (1 << 8)
> +#endif
> +#ifndef HWCAP2_SVEI8MM
> +#define HWCAP2_SVEI8MM (1 << 9)
> +#endif
> +#ifndef HWCAP2_SVEF32MM
> +#define HWCAP2_SVEF32MM (1 << 10)
> +#endif
> +#ifndef HWCAP2_SVEF64MM
> +#define HWCAP2_SVEF64MM (1 << 11)
> +#endif
> +#ifndef HWCAP2_SVEBF16
> +#define HWCAP2_SVEBF16 (1 << 12)
> +#endif
> +#ifndef HWCAP2_I8MM
> +#define HWCAP2_I8MM (1 << 13)
> +#endif
> +#ifndef HWCAP2_BF16
> +#define HWCAP2_BF16 (1 << 14)
> +#endif
> +#ifndef HWCAP2_DGH
> +#define HWCAP2_DGH (1 << 15)
> +#endif
> +#ifndef HWCAP2_RNG
> +#define HWCAP2_RNG (1 << 16)
> +#endif
> +#ifndef HWCAP2_BTI
> +#define HWCAP2_BTI (1 << 17)
> +#endif
> +#ifndef HWCAP2_MTE
> +#define HWCAP2_MTE (1 << 18)
> +#endif
> +#ifndef HWCAP2_RPRES
> +#define HWCAP2_RPRES (1 << 21)
> +#endif
> +#ifndef HWCAP2_MTE3
> +#define HWCAP2_MTE3 (1 << 22)
> +#endif
> +#ifndef HWCAP2_SME
> +#define HWCAP2_SME (1 << 23)
> +#endif
> +#ifndef HWCAP2_SME_I16I64
> +#define HWCAP2_SME_I16I64 (1 << 24)
> +#endif
> +#ifndef HWCAP2_SME_F64F64
> +#define HWCAP2_SME_F64F64 (1 << 25)
> +#endif
> +#ifndef HWCAP2_WFXT
> +#define HWCAP2_WFXT (1UL << 31)
> +#endif
> +#ifndef HWCAP2_EBF16
> +#define HWCAP2_EBF16 (1UL << 32)
> +#endif
> +#ifndef HWCAP2_SVE_EBF16
> +#define HWCAP2_SVE_EBF16 (1UL << 33)
> +#endif
> +
> +static void
> +__init_cpu_features_constructor(unsigned long hwcap,
> +				const __ifunc_arg_t *arg) {
> +#define setCPUFeature(F) __aarch64_cpu_features.features |= 1ULL << F
> +#define getCPUFeature(id, ftr) __asm__("mrs %0, " #id : "=r"(ftr))
> +#define extractBits(val, start, number) \
> +  (val & ((1ULL << number) - 1ULL) << start) >> start
> +  unsigned long hwcap2 = 0;
> +  if (hwcap & _IFUNC_ARG_HWCAP)
> +    hwcap2 = arg->_hwcap2;
> +  if (hwcap & HWCAP_CRC32)
> +    setCPUFeature(FEAT_CRC);
> +  if (hwcap & HWCAP_PMULL)
> +    setCPUFeature(FEAT_PMULL);
> +  if (hwcap & HWCAP_FLAGM)
> +    setCPUFeature(FEAT_FLAGM);
> +  if (hwcap2 & HWCAP2_FLAGM2) {
> +    setCPUFeature(FEAT_FLAGM);
> +    setCPUFeature(FEAT_FLAGM2);
> +  }
> +  if (hwcap & HWCAP_SM3 && hwcap & HWCAP_SM4)
> +    setCPUFeature(FEAT_SM4);
> +  if (hwcap & HWCAP_ASIMDDP)
> +    setCPUFeature(FEAT_DOTPROD);
> +  if (hwcap & HWCAP_ASIMDFHM)
> +    setCPUFeature(FEAT_FP16FML);
> +  if (hwcap & HWCAP_FPHP) {
> +    setCPUFeature(FEAT_FP16);
> +    setCPUFeature(FEAT_FP);
> +  }
> +  if (hwcap & HWCAP_DIT)
> +    setCPUFeature(FEAT_DIT);
> +  if (hwcap & HWCAP_ASIMDRDM)
> +    setCPUFeature(FEAT_RDM);
> +  if (hwcap & HWCAP_ILRCPC)
> +    setCPUFeature(FEAT_RCPC2);
> +  if (hwcap & HWCAP_AES)
> +    setCPUFeature(FEAT_AES);
> +  if (hwcap & HWCAP_SHA1)
> +    setCPUFeature(FEAT_SHA1);
> +  if (hwcap & HWCAP_SHA2)
> +    setCPUFeature(FEAT_SHA2);
> +  if (hwcap & HWCAP_JSCVT)
> +    setCPUFeature(FEAT_JSCVT);
> +  if (hwcap & HWCAP_FCMA)
> +    setCPUFeature(FEAT_FCMA);
> +  if (hwcap & HWCAP_SB)
> +    setCPUFeature(FEAT_SB);
> +  if (hwcap & HWCAP_SSBS)
> +    setCPUFeature(FEAT_SSBS2);
> +  if (hwcap2 & HWCAP2_MTE) {
> +    setCPUFeature(FEAT_MEMTAG);
> +    setCPUFeature(FEAT_MEMTAG2);
> +  }
> +  if (hwcap2 & HWCAP2_MTE3) {
> +    setCPUFeature(FEAT_MEMTAG);
> +    setCPUFeature(FEAT_MEMTAG2);
> +    setCPUFeature(FEAT_MEMTAG3);
> +  }
> +  if (hwcap2 & HWCAP2_SVEAES)
> +    setCPUFeature(FEAT_SVE_AES);
> +  if (hwcap2 & HWCAP2_SVEPMULL) {
> +    setCPUFeature(FEAT_SVE_AES);
> +    setCPUFeature(FEAT_SVE_PMULL128);
> +  }
> +  if (hwcap2 & HWCAP2_SVEBITPERM)
> +    setCPUFeature(FEAT_SVE_BITPERM);
> +  if (hwcap2 & HWCAP2_SVESHA3)
> +    setCPUFeature(FEAT_SVE_SHA3);
> +  if (hwcap2 & HWCAP2_SVESM4)
> +    setCPUFeature(FEAT_SVE_SM4);
> +  if (hwcap2 & HWCAP2_DCPODP)
> +    setCPUFeature(FEAT_DPB2);
> +  if (hwcap & HWCAP_ATOMICS)
> +    setCPUFeature(FEAT_LSE);
> +  if (hwcap2 & HWCAP2_RNG)
> +    setCPUFeature(FEAT_RNG);
> +  if (hwcap2 & HWCAP2_I8MM)
> +    setCPUFeature(FEAT_I8MM);
> +  if (hwcap2 & HWCAP2_EBF16)
> +    setCPUFeature(FEAT_EBF16);
> +  if (hwcap2 & HWCAP2_SVE_EBF16)
> +    setCPUFeature(FEAT_SVE_EBF16);
> +  if (hwcap2 & HWCAP2_DGH)
> +    setCPUFeature(FEAT_DGH);
> +  if (hwcap2 & HWCAP2_FRINT)
> +    setCPUFeature(FEAT_FRINTTS);
> +  if (hwcap2 & HWCAP2_SVEI8MM)
> +    setCPUFeature(FEAT_SVE_I8MM);
> +  if (hwcap2 & HWCAP2_SVEF32MM)
> +    setCPUFeature(FEAT_SVE_F32MM);
> +  if (hwcap2 & HWCAP2_SVEF64MM)
> +    setCPUFeature(FEAT_SVE_F64MM);
> +  if (hwcap2 & HWCAP2_BTI)
> +    setCPUFeature(FEAT_BTI);
> +  if (hwcap2 & HWCAP2_RPRES)
> +    setCPUFeature(FEAT_RPRES);
> +  if (hwcap2 & HWCAP2_WFXT)
> +    setCPUFeature(FEAT_WFXT);
> +  if (hwcap2 & HWCAP2_SME)
> +    setCPUFeature(FEAT_SME);
> +  if (hwcap2 & HWCAP2_SME_I16I64)
> +    setCPUFeature(FEAT_SME_I64);
> +  if (hwcap2 & HWCAP2_SME_F64F64)
> +    setCPUFeature(FEAT_SME_F64);
> +  if (hwcap & HWCAP_CPUID) {
> +    unsigned long ftr;
> +    getCPUFeature(ID_AA64PFR1_EL1, ftr);
> +    /* ID_AA64PFR1_EL1.MTE >= 0b0001  */
> +    if (extractBits(ftr, 8, 4) >= 0x1)
> +      setCPUFeature(FEAT_MEMTAG);
> +    /* ID_AA64PFR1_EL1.SSBS == 0b0001  */
> +    if (extractBits(ftr, 4, 4) == 0x1)
> +      setCPUFeature(FEAT_SSBS);
> +    /* ID_AA64PFR1_EL1.SME == 0b0010  */
> +    if (extractBits(ftr, 24, 4) == 0x2)
> +      setCPUFeature(FEAT_SME2);
> +    getCPUFeature(ID_AA64PFR0_EL1, ftr);
> +    /* ID_AA64PFR0_EL1.FP != 0b1111  */
> +    if (extractBits(ftr, 16, 4) != 0xF) {
> +      setCPUFeature(FEAT_FP);
> +      /* ID_AA64PFR0_EL1.AdvSIMD has the same value as ID_AA64PFR0_EL1.FP  */
> +      setCPUFeature(FEAT_SIMD);
> +    }
> +    /* ID_AA64PFR0_EL1.SVE != 0b0000  */
> +    if (extractBits(ftr, 32, 4) != 0x0) {
> +      /* get ID_AA64ZFR0_EL1, that name supported if sve enabled only  */
> +      getCPUFeature(S3_0_C0_C4_4, ftr);
> +      /* ID_AA64ZFR0_EL1.SVEver == 0b0000  */
> +      if (extractBits(ftr, 0, 4) == 0x0)
> +	setCPUFeature(FEAT_SVE);
> +      /* ID_AA64ZFR0_EL1.SVEver == 0b0001  */
> +      if (extractBits(ftr, 0, 4) == 0x1)
> +	setCPUFeature(FEAT_SVE2);
> +      /* ID_AA64ZFR0_EL1.BF16 != 0b0000  */
> +      if (extractBits(ftr, 20, 4) != 0x0)
> +	setCPUFeature(FEAT_SVE_BF16);
> +    }
> +    getCPUFeature(ID_AA64ISAR0_EL1, ftr);
> +    /* ID_AA64ISAR0_EL1.SHA3 != 0b0000  */
> +    if (extractBits(ftr, 32, 4) != 0x0)
> +      setCPUFeature(FEAT_SHA3);
> +    getCPUFeature(ID_AA64ISAR1_EL1, ftr);
> +    /* ID_AA64ISAR1_EL1.DPB >= 0b0001  */
> +    if (extractBits(ftr, 0, 4) >= 0x1)
> +      setCPUFeature(FEAT_DPB);
> +    /* ID_AA64ISAR1_EL1.LRCPC != 0b0000  */
> +    if (extractBits(ftr, 20, 4) != 0x0)
> +      setCPUFeature(FEAT_RCPC);
> +    /* ID_AA64ISAR1_EL1.LRCPC == 0b0011  */
> +    if (extractBits(ftr, 20, 4) == 0x3)
> +      setCPUFeature(FEAT_RCPC3);
> +    /* ID_AA64ISAR1_EL1.SPECRES == 0b0001  */
> +    if (extractBits(ftr, 40, 4) == 0x2)
> +      setCPUFeature(FEAT_PREDRES);
> +    /* ID_AA64ISAR1_EL1.BF16 != 0b0000  */
> +    if (extractBits(ftr, 44, 4) != 0x0)
> +      setCPUFeature(FEAT_BF16);
> +    /* ID_AA64ISAR1_EL1.LS64 >= 0b0001  */
> +    if (extractBits(ftr, 60, 4) >= 0x1)
> +      setCPUFeature(FEAT_LS64);
> +    /* ID_AA64ISAR1_EL1.LS64 >= 0b0010  */
> +    if (extractBits(ftr, 60, 4) >= 0x2)
> +      setCPUFeature(FEAT_LS64_V);
> +    /* ID_AA64ISAR1_EL1.LS64 >= 0b0011  */
> +    if (extractBits(ftr, 60, 4) >= 0x3)
> +      setCPUFeature(FEAT_LS64_ACCDATA);
> +  } else {
> +    /* Set some features in case of no CPUID support.  */
> +    if (hwcap & (HWCAP_FP | HWCAP_FPHP)) {
> +      setCPUFeature(FEAT_FP);
> +      /* FP and AdvSIMD fields have the same value.  */
> +      setCPUFeature(FEAT_SIMD);
> +    }
> +    if (hwcap & HWCAP_DCPOP || hwcap2 & HWCAP2_DCPODP)
> +      setCPUFeature(FEAT_DPB);
> +    if (hwcap & HWCAP_LRCPC || hwcap & HWCAP_ILRCPC)
> +      setCPUFeature(FEAT_RCPC);
> +    if (hwcap2 & HWCAP2_BF16 || hwcap2 & HWCAP2_EBF16)
> +      setCPUFeature(FEAT_BF16);
> +    if (hwcap2 & HWCAP2_SVEBF16)
> +      setCPUFeature(FEAT_SVE_BF16);
> +    if (hwcap2 & HWCAP2_SVE2 && hwcap & HWCAP_SVE)
> +      setCPUFeature(FEAT_SVE2);
> +    if (hwcap & HWCAP_SHA3)
> +      setCPUFeature(FEAT_SHA3);
> +  }
> +  setCPUFeature(FEAT_INIT);
> +}
> +
> +void
> +__init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
> +  if (__aarch64_cpu_features.features)
> +    return;
> +  __init_cpu_features_constructor(hwcap, arg);
> +}
> +
> +void __attribute__ ((constructor))
> +__init_cpu_features(void) {
> +  unsigned long hwcap;
> +  unsigned long hwcap2;
> +  /* CPU features already initialized.  */
> +  if (__aarch64_cpu_features.features)
> +    return;
> +  hwcap = getauxval(AT_HWCAP);
> +  hwcap2 = getauxval(AT_HWCAP2);
> +  __ifunc_arg_t arg;
> +  arg._size = sizeof(__ifunc_arg_t);
> +  arg._hwcap = hwcap;
> +  arg._hwcap2 = hwcap2;
> +  __init_cpu_features_constructor(hwcap | _IFUNC_ARG_HWCAP, &arg);
> +#undef extractBits
> +#undef getCPUFeature
> +#undef setCPUFeature
> +}
> +#endif /* __has_include(<asm/hwcap.h>)  */
> +#endif /* __has_include(<sys/auxv.h>)  */
> +#endif /* defined(__has_include)  */
> diff --git a/libgcc/config/aarch64/t-aarch64 b/libgcc/config/aarch64/t-aarch64
> index a40b6241c86ecc4007b5cfd28aa989ee894aa410..8bc1a4ca0c2eb75c17e62a25aa45a875bfd472f8 100644
> --- a/libgcc/config/aarch64/t-aarch64
> +++ b/libgcc/config/aarch64/t-aarch64
> @@ -19,3 +19,4 @@
>  # <http://www.gnu.org/licenses/>.
>  
>  LIB2ADD += $(srcdir)/config/aarch64/sync-cache.c
> +LIB2ADD += $(srcdir)/config/aarch64/cpuinfo.c

* Re: [PATCH v2 5/5] aarch64: Add function multiversioning support
  2023-11-17  2:56 ` [PATCH v2 5/5] aarch64: Add function multiversioning support Andrew Carlotti
@ 2023-11-24 16:22   ` Richard Sandiford
  2023-12-04 13:23     ` Andrew Carlotti
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2023-11-24 16:22 UTC (permalink / raw)
  To: Andrew Carlotti; +Cc: gcc-patches, rguenther, richard.earnshaw

Andrew Carlotti <andrew.carlotti@arm.com> writes:
> This adds initial support for function multiversioning on aarch64 using
> the target_version and target_clones attributes.  This loosely follows
> the Beta specification in the ACLE [1], although with some differences
> that still need to be resolved (possibly as follow-up patches).
>
> Existing function multiversioning implementations are broken in various
> ways when used across translation units.  This includes placing
> resolvers in the wrong translation units, and using symbol mangling that
> causes callers to unintentionally bypass the resolver in some circumstances.
> Fixing these issues for aarch64 will require modifications to our ACLE
> specification.  It will also require further adjustments to existing
> middle end code, to facilitate different mangling and resolver
> placement while preserving existing target behaviours.
>
> The list of function multiversioning features specified in the ACLE is
> also inconsistent with the list of features supported in target option
> extensions.  I intend to resolve some or all of these inconsistencies at
> a later stage.
>
> The target_version attribute is currently only supported in C++, since
> this is the only frontend with existing support for multiversioning
> using the target attribute.  On the other hand, this patch happens to
> enable multiversioning with the target_clones attribute in Ada and D, as
> well as the entire C family, using their existing frontend support.
>
> This patch also does not support the following aspects of the Beta
> specification:
>
> - The target_clones attribute should allow an implicit unlisted
>   "default" version.
> - There should be an option to disable function multiversioning at
>   compile time.
> - Unrecognised target names in a target_clones attribute should be
>   ignored (with an optional warning).  This current patch raises an
>   error instead.
>
> [1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
>
> ---
>
> I believe the support present in this patch correctly handles function
> multiversioning within a single translation unit for all features in the ACLE
> specification with option extension support.
>
> Is it ok to push this patch in its current state? I'd then continue working on
> incremental improvements to the supported feature extensions and the ABI issues
> in follow-up patches, along with corresponding changes and improvements to
> the ACLE specification.
>
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-feature-deps.h (fmv_deps_<FEAT_NAME>):
> 	Define aarch64_feature_flags mask foreach FMV feature.
> 	* config/aarch64/aarch64-option-extensions.def: Use new macros
> 	to define FMV feature extensions.
> 	* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
> 	Check for target_version attribute after processing target
> 	attribute.
> 	(aarch64_fmv_feature_data): New.
> 	(aarch64_parse_fmv_features): New.
> 	(aarch64_process_target_version_attr): New.
> 	(aarch64_option_valid_version_attribute_p): New.
> 	(get_feature_mask_for_version): New.
> 	(compare_feature_masks): New.
> 	(aarch64_compare_version_priority): New.
> 	(build_ifunc_arg_type): New.
> 	(make_resolver_func): New.
> 	(add_condition_to_bb): New.
> 	(compare_feature_version_info): New.
> 	(dispatch_function_versions): New.
> 	(aarch64_generate_version_dispatcher_body): New.
> 	(aarch64_get_function_versions_dispatcher): New.
> 	(aarch64_common_function_versions): New.
> 	(aarch64_mangle_decl_assembler_name): New.
> 	(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
> 	(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
> 	(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
> 	(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
> 	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
> 	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
> 	(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
> 	* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
> 	  new value to report duplicate FMV feature.
> 	* common/config/aarch64/cpuinfo.h: New file.
>
> libgcc/ChangeLog:
>
> 	* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
> 	  copy in gcc/common
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
> 	* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
> 	* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
> 	* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
> 	* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
> 	* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
> 	* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
> 	* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
> 	* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
> 	* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
> 	* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
> 	* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.

Thanks, mostly looks good, but some comments below:

> diff --git a/gcc/common/config/aarch64/cpuinfo.h b/gcc/common/config/aarch64/cpuinfo.h
> new file mode 100644
> index 0000000000000000000000000000000000000000..1690b6eee48e960d0ae675f8e8b05e6f182b56a3
> --- /dev/null
> +++ b/gcc/common/config/aarch64/cpuinfo.h
> @@ -0,0 +1,94 @@
> +/* CPU feature detection for AArch64 architecture.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   This file is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by the
> +   Free Software Foundation; either version 3, or (at your option) any
> +   later version.
> +
> +   This file is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* This enum is used in libgcc feature detection, and in the function
> +   multiversioning implementation in aarch64.cc.  The enum should use the same
> +   values as the corresponding enum in LLVM's compiler-rt, to facilitate
> +   compatibility between compilers.  */
> +
> +enum CPUFeatures {
> +  FEAT_RNG,
> +  FEAT_FLAGM,
> +  FEAT_FLAGM2,
> +  FEAT_FP16FML,
> +  FEAT_DOTPROD,
> +  FEAT_SM4,
> +  FEAT_RDM,
> +  FEAT_LSE,
> +  FEAT_FP,
> +  FEAT_SIMD,
> +  FEAT_CRC,
> +  FEAT_SHA1,
> +  FEAT_SHA2,
> +  FEAT_SHA3,
> +  FEAT_AES,
> +  FEAT_PMULL,
> +  FEAT_FP16,
> +  FEAT_DIT,
> +  FEAT_DPB,
> +  FEAT_DPB2,
> +  FEAT_JSCVT,
> +  FEAT_FCMA,
> +  FEAT_RCPC,
> +  FEAT_RCPC2,
> +  FEAT_FRINTTS,
> +  FEAT_DGH,
> +  FEAT_I8MM,
> +  FEAT_BF16,
> +  FEAT_EBF16,
> +  FEAT_RPRES,
> +  FEAT_SVE,
> +  FEAT_SVE_BF16,
> +  FEAT_SVE_EBF16,
> +  FEAT_SVE_I8MM,
> +  FEAT_SVE_F32MM,
> +  FEAT_SVE_F64MM,
> +  FEAT_SVE2,
> +  FEAT_SVE_AES,
> +  FEAT_SVE_PMULL128,
> +  FEAT_SVE_BITPERM,
> +  FEAT_SVE_SHA3,
> +  FEAT_SVE_SM4,
> +  FEAT_SME,
> +  FEAT_MEMTAG,
> +  FEAT_MEMTAG2,
> +  FEAT_MEMTAG3,
> +  FEAT_SB,
> +  FEAT_PREDRES,
> +  FEAT_SSBS,
> +  FEAT_SSBS2,
> +  FEAT_BTI,
> +  FEAT_LS64,
> +  FEAT_LS64_V,
> +  FEAT_LS64_ACCDATA,
> +  FEAT_WFXT,
> +  FEAT_SME_F64,
> +  FEAT_SME_I64,
> +  FEAT_SME2,
> +  FEAT_RCPC3,
> +  FEAT_MAX,
> +  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
> +		    in __aarch64_cpu_features.  */
> +  FEAT_INIT      /* Used as flag of features initialization completion.  */
> +};
> diff --git a/gcc/config/aarch64/aarch64-feature-deps.h b/gcc/config/aarch64/aarch64-feature-deps.h
> index 7b85a8860de57f6727644c03296cef192ad0990c..8f20582e1efdd4817138480bee8cdb27fa7f3dfe 100644
> --- a/gcc/config/aarch64/aarch64-feature-deps.h
> +++ b/gcc/config/aarch64/aarch64-feature-deps.h
> @@ -115,6 +115,13 @@ get_flags_off (aarch64_feature_flags mask)
>    constexpr auto cpu_##CORE_IDENT = ARCH_IDENT ().enable | get_enable FEATURES;
>  #include "config/aarch64/aarch64-cores.def"
>  
> +/* Define fmv_deps_<NAME> variables for each FMV feature, giving the transitive
> +   closure of all the features that the FMV feature enables.  */
> +#define AARCH64_FMV_FEATURE(A, FEAT_NAME, OPT_FLAGS) \
> +  constexpr auto fmv_deps_##FEAT_NAME = get_enable OPT_FLAGS;
> +#include "config/aarch64/aarch64-option-extensions.def"
> +
> +
>  }
>  }
>  
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> index 825f3bf775899e2e5cffb1867b82766d632c8708..07df403491494d6dfe19095872ab32b9d60e9690 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -17,17 +17,22 @@
>     along with GCC; see the file COPYING3.  If not see
>     <http://www.gnu.org/licenses/>.  */
>  
> -/* This is a list of ISA extentsions in AArch64.
> +/* This is a list of ISA extensions in AArch64.
>  
> -   Before using #include to read this file, define a macro:
> +   Before using #include to read this file, define one of the following
> +   macros:
>  
>        AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON,
>  			    EXPLICIT_OFF, FEATURE_STRING)
>  
> +      AARCH64_FMV_FEATURE(NAME, FEAT_NAME, IDENT)
> +
>     - NAME is the name of the extension, represented as a string constant.
>  
>     - IDENT is the canonical internal name for this flag.
>  
> +   - FEAT_NAME is the unprefixed name used in the CPUFeatures enum.
> +
>     - REQUIRES is a list of features that must be enabled whenever this
>       feature is enabled.  The relationship is implicitly transitive:
>       if A appears in B's REQUIRES and B appears in C's REQUIRES then
> @@ -58,45 +63,96 @@
>       that are required.  Their order is not important.  An empty string means
>       do not detect this feature during auto detection.
>  
> -   The list of features must follow topological order wrt REQUIRES
> -   and EXPLICIT_ON.  For example, if A is in B's REQUIRES list, A must
> -   come before B.  This is enforced by aarch64-feature-deps.h.
> +   - OPT_FLAGS is a list of feature IDENTS that should be enabled (along with
> +     their transitive dependencies) when the specified FMV feature is present.
> +
> +   Where a feature is present as both an extension and a function
> +   multiversioning feature, and IDENT matches the FEAT_NAME suffix, then these
> +   can be listed here simultaneously using the macro:
> +
> +      AARCH64_OPT_FMV_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON,
> +				EXPLICIT_OFF, FEATURE_STRING)
> +
> +   The list of features extensions must follow topological order wrt REQUIRES
> +   and EXPLICIT_ON.  For example, if A is in B's REQUIRES list, A must come
> +   before B.  This is enforced by aarch64-feature-deps.h.
> +
> +   The list of multiversioning features must be ordered by increasing priority,
> +   as defined in https://github.com/ARM-software/acle/blob/main/main/acle.md
>  
>     NOTE: Any changes to the AARCH64_OPT_EXTENSION macro need to be mirrored in
>     config.gcc.  */
>  
> +#ifndef AARCH64_OPT_EXTENSION
> +#define AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON, \
> +			      EXPLICIT_OFF, FEATURE_STRING)
> +#endif
> +
> +#ifndef AARCH64_FMV_FEATURE
> +#define AARCH64_FMV_FEATURE(NAME, FEAT_NAME, OPT_FLAGS)
> +#endif
> +
> +#define AARCH64_OPT_FMV_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON,   \
> +				  EXPLICIT_OFF, FEATURE_STRING)		\
> +AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON, EXPLICIT_OFF,	\
> +		      FEATURE_STRING)					\
> +AARCH64_FMV_FEATURE(NAME, IDENT, (IDENT))
> +
> +
>  AARCH64_OPT_EXTENSION("fp", FP, (), (), (), "fp")
>  
>  AARCH64_OPT_EXTENSION("simd", SIMD, (FP), (), (), "asimd")
>  
> -AARCH64_OPT_EXTENSION("crc", CRC, (), (), (), "crc32")
> +AARCH64_OPT_FMV_EXTENSION("rng", RNG, (), (), (), "rng")
>  
> -AARCH64_OPT_EXTENSION("lse", LSE, (), (), (), "atomics")
> +AARCH64_OPT_FMV_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
>  
> -/* +nofp16 disables an implicit F16FML, even though an implicit F16FML
> -   does not imply F16.  See F16FML for more details.  */
> -AARCH64_OPT_EXTENSION("fp16", F16, (FP), (), (F16FML), "fphp asimdhp")
> +AARCH64_FMV_FEATURE("flagm2", FLAGM2, (FLAGM))
> +
> +AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
> +
> +AARCH64_OPT_FMV_EXTENSION("dotprod", DOTPROD, (SIMD), (), (), "asimddp")
>  
> -AARCH64_OPT_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
> +AARCH64_OPT_FMV_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
>  
>  /* An explicit +rdma implies +simd, but +rdma+nosimd still enables scalar
>     RDMA instructions.  */
>  AARCH64_OPT_EXTENSION("rdma", RDMA, (), (SIMD), (), "asimdrdm")
>  
> -AARCH64_OPT_EXTENSION("dotprod", DOTPROD, (SIMD), (), (), "asimddp")
> +AARCH64_FMV_FEATURE("rmd", RDM, (RDMA))
> +
> +AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics")
> +
> +AARCH64_FMV_FEATURE("fp", FP, (FP))
> +
> +AARCH64_FMV_FEATURE("simd", SIMD, (SIMD))
> +
> +AARCH64_OPT_FMV_EXTENSION("crc", CRC, (), (), (), "crc32")
>  
> -AARCH64_OPT_EXTENSION("aes", AES, (SIMD), (), (), "aes")
> +AARCH64_FMV_FEATURE("sha1", SHA1, ())
>  
> -AARCH64_OPT_EXTENSION("sha2", SHA2, (SIMD), (), (), "sha1 sha2")
> +AARCH64_OPT_FMV_EXTENSION("sha2", SHA2, (SIMD), (), (), "sha1 sha2")
> +
> +AARCH64_FMV_FEATURE("sha3", SHA3, (SHA3))
> +
> +AARCH64_OPT_FMV_EXTENSION("aes", AES, (SIMD), (), (), "aes")
> +
> +AARCH64_FMV_FEATURE("pmull", PMULL, ())
>  
>  /* +nocrypto disables AES, SHA2 and SM4, and anything that depends on them
>     (such as SHA3 and the SVE2 crypto extensions).  */
>  AARCH64_OPT_EXTENSION("crypto", CRYPTO, (AES, SHA2), (), (AES, SHA2, SM4),
>  		      "aes pmull sha1 sha2")
>  
> +/* Listing sha3 after crypto means we pass "+aes+sha3" to the assembler
> +   instead of "+sha3+crypto".  */
>  AARCH64_OPT_EXTENSION("sha3", SHA3, (SHA2), (), (), "sha3 sha512")
>  
> -AARCH64_OPT_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
> +/* +nofp16 disables an implicit F16FML, even though an implicit F16FML
> +   does not imply F16.  See F16FML for more details.  */
> +AARCH64_OPT_EXTENSION("fp16", F16, (FP), (), (F16FML), "fphp asimdhp")
> +
> +AARCH64_FMV_FEATURE("fp16", FP16, (F16))
>  
>  /* An explicit +fp16fml implies +fp16, but a dependence on it does not.
>     Thus -march=armv8.4-a implies F16FML but not F16.  -march=armv8.4-a+fp16
> @@ -104,51 +160,117 @@ AARCH64_OPT_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
>     -march=armv8.4-a+nofp16+fp16 enables F16 but not F16FML.  */
>  AARCH64_OPT_EXTENSION("fp16fml", F16FML, (), (F16), (), "asimdfhm")
>  
> -AARCH64_OPT_EXTENSION("sve", SVE, (SIMD, F16), (), (), "sve")
> +AARCH64_FMV_FEATURE("dit", DIT, ())
>  
> -AARCH64_OPT_EXTENSION("profile", PROFILE, (), (), (), "")
> +AARCH64_FMV_FEATURE("dpb", DPB, ())
>  
> -AARCH64_OPT_EXTENSION("rng", RNG, (), (), (), "rng")
> +AARCH64_FMV_FEATURE("dpb2", DPB2, ())
>  
> -AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
> +AARCH64_FMV_FEATURE("jscvt", JSCVT, ())
>  
> -AARCH64_OPT_EXTENSION("sb", SB, (), (), (), "sb")
> +AARCH64_FMV_FEATURE("fcma", FCMA, (SIMD))
>  
> -AARCH64_OPT_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
> +AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
>  
> -AARCH64_OPT_EXTENSION("predres", PREDRES, (), (), (), "")
> +AARCH64_FMV_FEATURE("rcpc2", RCPC2, (RCPC))
>  
> -AARCH64_OPT_EXTENSION("sve2", SVE2, (SVE), (), (), "sve2")
> +AARCH64_FMV_FEATURE("rcpc3", RCPC3, (RCPC))
>  
> -AARCH64_OPT_EXTENSION("sve2-sm4", SVE2_SM4, (SVE2, SM4), (), (), "svesm4")
> +AARCH64_FMV_FEATURE("frintts", FRINTTS, ())
> +
> +AARCH64_FMV_FEATURE("dgh", DGH, ())
> +
> +AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
> +
> +/* An explicit +bf16 implies +simd, but +bf16+nosimd still enables scalar BF16
> +   instructions.  */
> +AARCH64_OPT_FMV_EXTENSION("bf16", BF16, (FP), (SIMD), (), "bf16")
> +
> +AARCH64_FMV_FEATURE("ebf16", EBF16, (BF16))
> +
> +AARCH64_FMV_FEATURE("rpres", RPRES, ())
> +
> +AARCH64_OPT_FMV_EXTENSION("sve", SVE, (SIMD, F16), (), (), "sve")
> +
> +AARCH64_FMV_FEATURE("sve-bf16", SVE_BF16, (SVE, BF16))
> +
> +AARCH64_FMV_FEATURE("sve-ebf16", SVE_EBF16, (SVE, BF16))
> +
> +AARCH64_FMV_FEATURE("sve-i8mm", SVE_I8MM, (SVE, I8MM))
> +
> +AARCH64_OPT_EXTENSION("f32mm", F32MM, (SVE), (), (), "f32mm")
> +
> +AARCH64_FMV_FEATURE("f32mm", SVE_F32MM, (F32MM))
> +
> +AARCH64_OPT_EXTENSION("f64mm", F64MM, (SVE), (), (), "f64mm")
> +
> +AARCH64_FMV_FEATURE("f64mm", SVE_F64MM, (F64MM))
> +
> +AARCH64_OPT_FMV_EXTENSION("sve2", SVE2, (SVE), (), (), "sve2")
>  
>  AARCH64_OPT_EXTENSION("sve2-aes", SVE2_AES, (SVE2, AES), (), (), "sveaes")
>  
> -AARCH64_OPT_EXTENSION("sve2-sha3", SVE2_SHA3, (SVE2, SHA3), (), (), "svesha3")
> +AARCH64_FMV_FEATURE("sve2-aes", SVE_AES, (SVE2, AES))
> +
> +AARCH64_FMV_FEATURE("sve2-pmull128", SVE_PMULL128, (SVE2))
>  
>  AARCH64_OPT_EXTENSION("sve2-bitperm", SVE2_BITPERM, (SVE2), (), (),
>  		      "svebitperm")
>  
> -AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
> +AARCH64_FMV_FEATURE("sve2-bitperm", SVE_BITPERM, (SVE2_BITPERM))
>  
> -AARCH64_OPT_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
> +AARCH64_OPT_EXTENSION("sve2-sha3", SVE2_SHA3, (SVE2, SHA3), (), (), "svesha3")
>  
> -AARCH64_OPT_EXTENSION("f32mm", F32MM, (SVE), (), (), "f32mm")
> +AARCH64_FMV_FEATURE("sve2-sha3", SVE_SHA3, (SVE2_SHA3))
>  
> -AARCH64_OPT_EXTENSION("f64mm", F64MM, (SVE), (), (), "f64mm")
> +AARCH64_OPT_EXTENSION("sve2-sm4", SVE2_SM4, (SVE2, SM4), (), (), "svesm4")
>  
> -/* An explicit +bf16 implies +simd, but +bf16+nosimd still enables scalar BF16
> -   instructions.  */
> -AARCH64_OPT_EXTENSION("bf16", BF16, (FP), (SIMD), (), "bf16")
> +AARCH64_FMV_FEATURE("sve2-sm4", SVE_SM4, (SVE2_SM4))
> +
> +AARCH64_FMV_FEATURE("sme", SME, ())
>  
> -AARCH64_OPT_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
> +AARCH64_OPT_FMV_EXTENSION("memtag", MEMTAG, (), (), (), "")
> +
> +AARCH64_FMV_FEATURE("memtag2", MEMTAG2, (MEMTAG))
> +
> +AARCH64_FMV_FEATURE("memtag3", MEMTAG3, (MEMTAG))
> +
> +AARCH64_OPT_FMV_EXTENSION("sb", SB, (), (), (), "sb")
> +
> +AARCH64_OPT_FMV_EXTENSION("predres", PREDRES, (), (), (), "")
> +
> +AARCH64_OPT_FMV_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
> +
> +AARCH64_FMV_FEATURE("ssbs2", SSBS2, (SSBS))
> +
> +AARCH64_FMV_FEATURE("bti", BTI, ())
> +
> +AARCH64_OPT_EXTENSION("profile", PROFILE, (), (), (), "")
> +
> +AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
>  
>  AARCH64_OPT_EXTENSION("pauth", PAUTH, (), (), (), "paca pacg")
>  
>  AARCH64_OPT_EXTENSION("ls64", LS64, (), (), (), "")
>  
> +AARCH64_FMV_FEATURE("ls64", LS64, ())
> +
> +AARCH64_FMV_FEATURE("ls64_v", LS64_V, ())
> +
> +AARCH64_FMV_FEATURE("ls64_accdata", LS64_ACCDATA, (LS64))
> +
> +AARCH64_FMV_FEATURE("wfxt", WFXT, ())
> +
> +AARCH64_FMV_FEATURE("sme-f64f64", SME_F64, ())
> +
> +AARCH64_FMV_FEATURE("sme-i64i64", SME_I64, ())
> +
> +AARCH64_FMV_FEATURE("sme2", SME2, ())
> +
>  AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
>  
>  AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
>  
> +#undef AARCH64_OPT_FMV_EXTENSION
>  #undef AARCH64_OPT_EXTENSION
> +#undef AARCH64_FMV_FEATURE
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 800a8b0e11005416fb4e4b1222717629b16f3745..8721c0a923c53af2c2413ed90ccb05fa698c1f85 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -84,6 +84,7 @@
>  #include "aarch64-feature-deps.h"
>  #include "config/arm/aarch-common.h"
>  #include "config/arm/aarch-common-protos.h"
> +#include "common/config/aarch64/cpuinfo.h"
>  #include "ssa.h"
>  
>  /* This file should be included last.  */
> @@ -19525,6 +19526,8 @@ aarch64_process_target_attr (tree args)
>    return true;
>  }
>  
> +static bool aarch64_process_target_version_attr (tree args);
> +
>  /* Implement TARGET_OPTION_VALID_ATTRIBUTE_P.  This is used to
>     process attribute ((target ("..."))).  */
>  
> @@ -19580,6 +19583,19 @@ aarch64_option_valid_attribute_p (tree fndecl, tree, tree args, int)
>  			      TREE_TARGET_OPTION (target_option_current_node));
>  
>    ret = aarch64_process_target_attr (args);
> +  if (ret)
> +    {
> +      tree version_attr = lookup_attribute ("target_version",
> +					    DECL_ATTRIBUTES (fndecl));
> +      if (version_attr != NULL_TREE)
> +	{
> +	  /* Reapply any target_version attribute after target attribute.
> +	     This should be equivalent to applying the target_version once
> +	     after processing all target attributes.  */
> +	  tree version_args = TREE_VALUE (version_attr);
> +	  ret = aarch64_process_target_version_attr (version_args);
> +	}
> +    }
>  
>    /* Set up any additional state.  */
>    if (ret)
> @@ -19610,6 +19626,821 @@ aarch64_option_valid_attribute_p (tree fndecl, tree, tree args, int)
>    return ret;
>  }
>  
> +typedef unsigned long long aarch64_fmv_feature_mask;
> +
> +typedef struct
> +{
> +  const char *name;
> +  aarch64_fmv_feature_mask feature_mask;
> +  aarch64_feature_flags opt_flags;
> +} aarch64_fmv_feature_datum;
> +
> +#define AARCH64_FMV_FEATURE(NAME, FEAT_NAME, C) \
> +  {NAME, 1ULL << FEAT_##FEAT_NAME, ::feature_deps::fmv_deps_##FEAT_NAME},
> +
> +/* FMV features are listed in priority order, to make it easier to sort target
> +   strings.  */
> +static aarch64_fmv_feature_datum aarch64_fmv_feature_data[] = {
> +#include "config/aarch64/aarch64-option-extensions.def"
> +};
> +
> +
> +/* Parse a non-default fmv feature string, as found in a target_version or
> +   target_clones attribute.  */

The comment says non-default, but the function does handle "default".

It would be good to describe the arguments too.  E.g. something like:

/* Parse function multi-versioning feature string STR, as found in a
   target_version or target_clones attribute.  Add the selected FMV
   features to *FEATURE_MASK and the associated -march ISA extensions
   to *ISA_FLAGS.  If parsing fails due to an invalid or duplicate
   feature name, store that feature name in *INVALID_EXTENSION.  */

> +
> +static enum aarch_parse_opt_result
> +aarch64_parse_fmv_features (const char *str, aarch64_feature_flags *isa_flags,
> +			    aarch64_fmv_feature_mask *feature_mask,
> +			    std::string *invalid_extension)
> +{
> +  if (strcmp (str, "default") == 0)
> +    return AARCH_PARSE_OK;
> +
> +  while (str != NULL && *str != 0)
> +    {
> +      const char *ext;
> +      size_t len;
> +
> +      ext = strchr (str, '+');
> +
> +      if (ext != NULL)
> +	len = ext - str;
> +      else
> +	len = strlen (str);
> +
> +      if (len == 0)
> +	return AARCH_PARSE_MISSING_ARG;
> +
> +      static const int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
> +      int i;
> +      for (i = 0; i < num_features; i++)
> +	{
> +	  if (strlen (aarch64_fmv_feature_data[i].name) == len
> +	      && strncmp (aarch64_fmv_feature_data[i].name, str, len) == 0)
> +	    {
> +	      if (isa_flags)
> +		*isa_flags |= aarch64_fmv_feature_data[i].opt_flags;
> +	      if (feature_mask)
> +		{
> +		  auto old_feature_mask = *feature_mask;
> +		  *feature_mask |= aarch64_fmv_feature_data[i].feature_mask;
> +		  if (*feature_mask == old_feature_mask)
> +		    {
> +		      /* Duplicate feature.  */
> +		      if (invalid_extension)
> +			*invalid_extension = std::string (str, len);
> +		      return AARCH_PARSE_DUPLICATE_FEATURE;
> +		    }
> +		}
> +	      break;
> +	    }
> +	}
> +
> +      if (i == num_features)
> +	{
> +	  /* Feature not found in list.  */
> +	  if (invalid_extension)
> +	    *invalid_extension = std::string (str, len);
> +	  return AARCH_PARSE_INVALID_FEATURE;
> +	}
> +
> +      str = ext;
> +    }

Does this work for "feat1+feat2"?  It looks like str would be set to
"+feat2" for the second iteration, and then the strchr would likewise
return "+feat2", giving an empty string.
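
To make the concern concrete, here is a minimal standalone sketch of the
same loop shape with the separator skipped before the next iteration.
It is an illustration only: split_fmv_features is a hypothetical helper,
not a function in the patch, and it just collects the names rather than
updating isa_flags or feature_mask.  Rejecting a trailing '+' is an
assumption about the desired behaviour, not something the patch states.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

/* Hypothetical helper (not part of the patch): split a '+'-separated
   feature string into its component names.  Return false if any name
   is empty, including the case of a trailing '+'.  */
static bool
split_fmv_features (const char *str, std::vector<std::string> *names)
{
  while (str != NULL && *str != 0)
    {
      const char *ext = strchr (str, '+');
      size_t len = ext != NULL ? (size_t) (ext - str) : strlen (str);

      if (len == 0)
        return false;

      names->push_back (std::string (str, len));

      if (ext == NULL)
        break;

      /* Step past the '+'.  Assigning EXT itself would leave STR
         pointing at "+feat2", so the next strchr would report an
         empty name.  */
      str = ext + 1;

      /* Assumed policy: a trailing '+' is a missing argument.  */
      if (*str == 0)
        return false;
    }

  return true;
}
```

With this shape "feat1+feat2" yields two names, while "feat1++feat2",
"+feat1" and "feat1+" are all rejected as having an empty name.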

> +
> +  return AARCH_PARSE_OK;
> +}
> +
> +/* Parse the tree in ARGS that contains the target_version attribute
> +   information and update the global target options space.  */
> +
> +static bool
> +aarch64_process_target_version_attr (tree args)
> +{
> +  if (TREE_CODE (args) == TREE_LIST)
> +    {
> +      if (TREE_CHAIN (args))
> +	{
> +	  error ("attribute %<target_version%> has multiple values");
> +	  return false;
> +	}
> +      args = TREE_VALUE (args);
> +    }
> +
> +  if (!args || TREE_CODE (args) != STRING_CST)
> +    {
> +      error ("attribute %<target_version%> argument not a string");
> +      return false;
> +    }
> +
> +  const char *str = TREE_STRING_POINTER (args);
> +
> +  enum aarch_parse_opt_result parse_res;
> +  auto isa_flags = aarch64_asm_isa_flags;
> +
> +
> +  std::string invalid_extension;
> +  parse_res = aarch64_parse_fmv_features (str, &isa_flags, NULL,
> +					  &invalid_extension);
> +
> +  if (parse_res == AARCH_PARSE_OK)
> +    {
> +      aarch64_set_asm_isa_flags (isa_flags);
> +      return true;
> +    }
> +
> +  switch (parse_res)
> +    {
> +      case AARCH_PARSE_MISSING_ARG:
> +	error ("missing value in %<target_version%> attribute");
> +	break;
> +
> +      case AARCH_PARSE_INVALID_FEATURE:
> +	error ("invalid feature modifier %qs of value %qs in "
> +	       "%<target_version%> attribute", invalid_extension.c_str (),
> +	       str);
> +	break;
> +
> +      case AARCH_PARSE_DUPLICATE_FEATURE:
> +	error ("duplicate feature modifier %qs of value %qs in "
> +	       "%<target_version%> attribute", invalid_extension.c_str (),
> +	       str);
> +	break;
> +
> +      default:
> +	gcc_unreachable ();
> +    }

Formatting nit: the convention is for cases to line up with the "{"
of the switch, so the switch body between { and } above should be
indented by 2 fewer columns.
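
For reference, the layout I mean looks like the sketch below.  The enum and
describe_failure are hypothetical stand-ins rather than anything in the
patch; only the indentation matters.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the relevant aarch_parse_opt_result values.
enum parse_result
{
  PARSE_MISSING_ARG,
  PARSE_INVALID_FEATURE,
  PARSE_DUPLICATE_FEATURE
};

static std::string
describe_failure (parse_result res)
{
  // The case labels sit in the same column as the braces of the switch.
  switch (res)
    {
    case PARSE_MISSING_ARG:
      return "missing value";

    case PARSE_INVALID_FEATURE:
      return "invalid feature modifier";

    case PARSE_DUPLICATE_FEATURE:
      return "duplicate feature modifier";

    default:
      assert (false && "unexpected parse result");
      return "";
    }
}
```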

> +
> +  return false;
> +}
> +
> +/* Implement TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P.  This is used to
> +   process attribute ((target ("..."))).  */

attribute ((target_version ("...")))  ?

> +
> +static bool
> +aarch64_option_valid_version_attribute_p (tree fndecl, tree, tree args, int)
> +{
> +  struct cl_target_option cur_target;
> +  bool ret;
> +  tree new_target;
> +  tree existing_target = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
> +
> +  /* Save the current target options to restore at the end.  */
> +  cl_target_option_save (&cur_target, &global_options, &global_options_set);
> +
> +  /* If fndecl already has some target attributes applied to it, unpack
> +     them so that we add this attribute on top of them, rather than
> +     overwriting them.  */
> +  if (existing_target)
> +    {
> +      struct cl_target_option *existing_options
> +	= TREE_TARGET_OPTION (existing_target);
> +
> +      if (existing_options)
> +	cl_target_option_restore (&global_options, &global_options_set,
> +				  existing_options);
> +    }
> +  else
> +    cl_target_option_restore (&global_options, &global_options_set,
> +			      TREE_TARGET_OPTION (target_option_current_node));
> +
> +  ret = aarch64_process_target_version_attr (args);
> +
> +  /* Set up any additional state.  */
> +  if (ret)
> +    {
> +      aarch64_override_options_internal (&global_options);
> +      new_target = build_target_option_node (&global_options,
> +					     &global_options_set);
> +    }
> +  else
> +    new_target = NULL;
> +
> +  if (fndecl && ret)
> +    {
> +      DECL_FUNCTION_SPECIFIC_TARGET (fndecl) = new_target;
> +    }
> +
> +  cl_target_option_restore (&global_options, &global_options_set, &cur_target);
> +
> +  return ret;
> +}
> +
> +/* This parses the attribute arguments to target_version in DECL and the
> +   feature mask required to select those targets.  No adjustments are made to
> +   add or remove redundant feature requirements.  */
> +
> +static aarch64_fmv_feature_mask
> +get_feature_mask_for_version (tree decl)
> +{
> +  tree version_attr = lookup_attribute ("target_version",
> +					DECL_ATTRIBUTES (decl));
> +  if (version_attr == NULL)
> +    return 0;
> +
> +  const char *version_string = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE
> +						    (version_attr)));
> +  enum aarch_parse_opt_result parse_res;
> +  aarch64_fmv_feature_mask feature_mask = 0ULL;
> +
> +  parse_res = aarch64_parse_fmv_features (version_string, NULL, &feature_mask,
> +					  NULL);
> +
> +  /* We should have detected any errors before getting here.  */
> +  gcc_assert (parse_res == AARCH_PARSE_OK);
> +
> +  return feature_mask;
> +}
> +
> +/* Compare priorities of two feature masks. Return:
> +     1: mask1 is higher priority
> +    -1: mask2 is higher priority
> +     0: masks are equal.  */
> +
> +static int
> +compare_feature_masks (aarch64_fmv_feature_mask mask1,
> +		       aarch64_fmv_feature_mask mask2)
> +{
> +  int pop1 = popcount_hwi(mask1);
> +  int pop2 = popcount_hwi(mask2);

Nit: should be a space before "(mask1" and "(mask2".

> +  if (pop1 > pop2)
> +    return 1;
> +  if (pop2 > pop1)
> +    return -1;
> +
> +  auto diff_mask = mask1 ^ mask2;
> +  if (diff_mask == 0ULL)
> +    return 0;
> +  for (int i = FEAT_MAX - 1; i > 0; i--)
> +    {
> +      auto bit_mask = aarch64_fmv_feature_data[i].feature_mask;
> +      if (diff_mask & bit_mask)
> +	return (mask1 & bit_mask) ? 1 : -1;
> +    }
> +  gcc_unreachable();
> +}

Still not sure that this is the right criterion to use, but I suppose
we can adjust it post-commit to match any changes in the spec.
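
To make the rule under discussion concrete, here is a standalone sketch of
the ordering as I read it: more features wins, and ties go to whichever mask
has the highest differing bit.  It is a simplification under stated
assumptions: compare_masks is a hypothetical name, and it walks raw bit
positions instead of the aarch64_fmv_feature_data table that the patch uses.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Return 1 if mask1 has higher priority, -1 if mask2 does, 0 if equal.
static int
compare_masks (uint64_t mask1, uint64_t mask2)
{
  // First criterion: the mask with more feature bits set wins.
  size_t pop1 = std::bitset<64> (mask1).count ();
  size_t pop2 = std::bitset<64> (mask2).count ();
  if (pop1 > pop2)
    return 1;
  if (pop2 > pop1)
    return -1;

  uint64_t diff = mask1 ^ mask2;
  if (diff == 0)
    return 0;

  // Tie-break: the owner of the highest differing bit wins.
  for (int i = 63; i >= 0; i--)
    {
      uint64_t bit = uint64_t (1) << i;
      if (diff & bit)
        return (mask1 & bit) ? 1 : -1;
    }
  assert (false && "diff is non-zero, so some bit must differ");
  return 0;
}
```

Under this rule a version needing two low-numbered features outranks one
needing a single high-numbered feature, which is the part I'm unsure about.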

> +
> +int
> +aarch64_compare_version_priority (tree decl1, tree decl2)
> +{
> +  auto mask1 = get_feature_mask_for_version (decl1);
> +  auto mask2 = get_feature_mask_for_version (decl2);
> +
> +  return compare_feature_masks (mask1, mask2);
> +}
> +
> +/* Build the struct __ifunc_arg_t type:
> +
> +   struct __ifunc_arg_t
> +   {
> +     unsigned long _size; // Size of the struct, so it can grow.
> +     unsigned long _hwcap;
> +     unsigned long _hwcap2;
> +   }
> + */

This isn't ILP32-friendly, but I agree we need to stick to the types
that glibc uses.

> +
> +static tree
> +build_ifunc_arg_type ()
> +{
> +  tree ifunc_arg_type = lang_hooks.types.make_type (RECORD_TYPE);
> +  tree field1 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> +			    get_identifier ("_size"),
> +			    long_unsigned_type_node);
> +  tree field2 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> +			    get_identifier ("_hwcap"),
> +			    long_unsigned_type_node);
> +  tree field3 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> +			    get_identifier ("_hwcap2"),
> +			    long_unsigned_type_node);
> +
> +  DECL_FIELD_CONTEXT (field1) = ifunc_arg_type;
> +  DECL_FIELD_CONTEXT (field2) = ifunc_arg_type;
> +  DECL_FIELD_CONTEXT (field3) = ifunc_arg_type;
> +
> +  TYPE_FIELDS (ifunc_arg_type) = field1;
> +  DECL_CHAIN (field1) = field2;
> +  DECL_CHAIN (field2) = field3;
> +
> +  layout_type (ifunc_arg_type);
> +
> +  tree const_type = build_qualified_type (ifunc_arg_type, TYPE_QUAL_CONST);
> +  tree pointer_type = build_pointer_type (const_type);
> +
> +  return pointer_type;
> +}
> +
> +/* Make the resolver function decl to dispatch the versions of
> +   a multi-versioned function,  DEFAULT_DECL.  IFUNC_ALIAS_DECL is
> +   ifunc alias that will point to the created resolver.  Create an
> +   empty basic block in the resolver and store the pointer in
> +   EMPTY_BB.  Return the decl of the resolver function.  */
> +
> +static tree
> +make_resolver_func (const tree default_decl,
> +		    const tree ifunc_alias_decl,
> +		    basic_block *empty_bb)
> +{
> +  tree decl, type, t;
> +
> +  /* Create resolver function name based on default_decl.  */
> +  tree decl_name = clone_function_name (default_decl, "resolver");
> +  const char *resolver_name = IDENTIFIER_POINTER (decl_name);
> +
> +  /* The resolver function should have signature
> +     (void *) resolver (uint64_t, const __ifunc_arg_t *) */
> +  type = build_function_type_list (ptr_type_node,
> +				   uint64_type_node,
> +				   build_ifunc_arg_type(),
> +				   NULL_TREE);
> +
> +  decl = build_fn_decl (resolver_name, type);
> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
> +
> +  DECL_NAME (decl) = decl_name;
> +  TREE_USED (decl) = 1;
> +  DECL_ARTIFICIAL (decl) = 1;
> +  DECL_IGNORED_P (decl) = 1;
> +  TREE_PUBLIC (decl) = 0;
> +  DECL_UNINLINABLE (decl) = 1;
> +
> +  /* Resolver is not external, body is generated.  */
> +  DECL_EXTERNAL (decl) = 0;
> +  DECL_EXTERNAL (ifunc_alias_decl) = 0;
> +
> +  DECL_CONTEXT (decl) = NULL_TREE;
> +  DECL_INITIAL (decl) = make_node (BLOCK);
> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
> +
> +  if (DECL_COMDAT_GROUP (default_decl)
> +      || TREE_PUBLIC (default_decl))
> +    {
> +      /* In this case, each translation unit with a call to this
> +	 versioned function will put out a resolver.  Ensure it
> +	 is comdat to keep just one copy.  */
> +      DECL_COMDAT (decl) = 1;
> +      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
> +    }
> +  else
> +    TREE_PUBLIC (ifunc_alias_decl) = 0;
> +
> +  /* Build result decl and add to function_decl. */
> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
> +  DECL_CONTEXT (t) = decl;
> +  DECL_ARTIFICIAL (t) = 1;
> +  DECL_IGNORED_P (t) = 1;
> +  DECL_RESULT (decl) = t;
> +
> +  /* Build parameter decls and add to function_decl. */
> +  tree arg1 = build_decl (UNKNOWN_LOCATION, PARM_DECL,
> +			  get_identifier ("hwcap"),
> +			  uint64_type_node);
> +  tree arg2 = build_decl (UNKNOWN_LOCATION, PARM_DECL,
> +			  get_identifier ("arg"),
> +			  build_ifunc_arg_type());
> +  DECL_CONTEXT (arg1) = decl;
> +  DECL_CONTEXT (arg2) = decl;
> +  DECL_ARTIFICIAL (arg1) = 1;
> +  DECL_ARTIFICIAL (arg2) = 1;
> +  DECL_IGNORED_P (arg1) = 1;
> +  DECL_IGNORED_P (arg2) = 1;
> +  DECL_ARG_TYPE (arg1) = uint64_type_node;
> +  DECL_ARG_TYPE (arg2) = build_ifunc_arg_type();

Nit: space before second "(".

> +  DECL_ARGUMENTS (decl) = arg1;
> +  TREE_CHAIN (arg1) = arg2;
> +
> +  gimplify_function_tree (decl);
> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
> +  *empty_bb = init_lowered_empty_function (decl, false,
> +					   profile_count::uninitialized ());
> +
> +  cgraph_node::add_new_function (decl, true);
> +  symtab->call_cgraph_insertion_hooks (cgraph_node::get_create (decl));
> +
> +  pop_cfun ();
> +
> +  gcc_assert (ifunc_alias_decl != NULL);
> +  /* Mark ifunc_alias_decl as "ifunc" with resolver as resolver_name.  */
> +  DECL_ATTRIBUTES (ifunc_alias_decl)
> +    = make_attribute ("ifunc", resolver_name,
> +		      DECL_ATTRIBUTES (ifunc_alias_decl));
> +
> +  /* Create the alias for dispatch to resolver here.  */
> +  cgraph_node::create_same_body_alias (ifunc_alias_decl, decl);
> +  return decl;
> +}
> +
> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
> +   to return a pointer to VERSION_DECL if all feature bits specified in
> +   FEATURE_MASK are not set in MASK_VAR.  This function will be called during
> +   version dispatch to decide which function version to execute.  It returns
> +   the basic block at the end, to which more conditions can be added.  */
> +static basic_block
> +add_condition_to_bb (tree function_decl, tree version_decl,
> +		     aarch64_fmv_feature_mask feature_mask,
> +		     tree mask_var, basic_block new_bb)
> +{
> +  gimple *return_stmt;
> +  tree convert_expr, result_var;
> +  gimple *convert_stmt;
> +  gimple *if_else_stmt;
> +
> +  basic_block bb1, bb2, bb3;
> +  edge e12, e23;
> +
> +  gimple_seq gseq;
> +
> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
> +
> +  gcc_assert (new_bb != NULL);
> +  gseq = bb_seq (new_bb);
> +
> +
> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
> +			 build_fold_addr_expr (version_decl));
> +  result_var = create_tmp_var (ptr_type_node);
> +  convert_stmt = gimple_build_assign (result_var, convert_expr);
> +  return_stmt = gimple_build_return (result_var);
> +
> +

Nit: just one blank line (before and after the block).  Some other instances
in the patch too.

> +  if (feature_mask == 0ULL)
> +    {
> +      /* Default version.  */
> +      gimple_seq_add_stmt (&gseq, convert_stmt);
> +      gimple_seq_add_stmt (&gseq, return_stmt);
> +      set_bb_seq (new_bb, gseq);
> +      gimple_set_bb (convert_stmt, new_bb);
> +      gimple_set_bb (return_stmt, new_bb);
> +      pop_cfun ();
> +      return new_bb;
> +    }
> +
> +  tree and_expr_var = create_tmp_var (long_long_unsigned_type_node);
> +  tree and_expr = build2 (BIT_AND_EXPR,
> +			  long_long_unsigned_type_node,
> +			  mask_var,
> +			  build_int_cst (long_long_unsigned_type_node,
> +					 feature_mask));
> +  gimple *and_stmt = gimple_build_assign (and_expr_var, and_expr);
> +  gimple_set_block (and_stmt, DECL_INITIAL (function_decl));
> +  gimple_set_bb (and_stmt, new_bb);
> +  gimple_seq_add_stmt (&gseq, and_stmt);
> +
> +  tree zero_llu = build_int_cst (long_long_unsigned_type_node, 0);
> +  if_else_stmt = gimple_build_cond (EQ_EXPR, and_expr_var, zero_llu,
> +				    NULL_TREE, NULL_TREE);
> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
> +  gimple_set_bb (if_else_stmt, new_bb);
> +  gimple_seq_add_stmt (&gseq, if_else_stmt);
> +
> +  gimple_seq_add_stmt (&gseq, convert_stmt);
> +  gimple_seq_add_stmt (&gseq, return_stmt);
> +  set_bb_seq (new_bb, gseq);
> +
> +  bb1 = new_bb;
> +  e12 = split_block (bb1, if_else_stmt);
> +  bb2 = e12->dest;
> +  e12->flags &= ~EDGE_FALLTHRU;
> +  e12->flags |= EDGE_TRUE_VALUE;
> +
> +  e23 = split_block (bb2, return_stmt);
> +
> +  gimple_set_bb (convert_stmt, bb2);
> +  gimple_set_bb (return_stmt, bb2);
> +
> +  bb3 = e23->dest;
> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
> +
> +  remove_edge (e23);
> +  make_edge (bb2, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
> +
> +  pop_cfun ();
> +
> +  return bb3;
> +}
> +
> +/* Used when sorting the decls into dispatch order.  */
> +static int compare_feature_version_info (const void *p1, const void *p2)

Formatting nit: new line after "static int".

> +{
> +  struct _function_version_info
> +    {
> +      tree version_decl;
> +      aarch64_fmv_feature_mask feature_mask;
> +    };

Think we should move this struct out of the function so that it can
be shared by dispatch_function_versions.  Alternatively, the comparison
function could be a lambda within dispatch_function_versions.

It's best to avoid names starting with "_", since those are reserved
for the implementation.
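
A rough standalone sketch of the lambda variant, with a single struct
defined once and no leading underscore.  Everything here is illustrative:
function_version_info holds a name instead of a tree, higher_priority is a
simplified placeholder for compare_feature_masks, and std::sort is used
where the patch calls qsort.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Defined once at file scope so that every user shares it.
struct function_version_info
{
  const char *version_name;	// Stand-in for the version's decl.
  uint64_t feature_mask;
};

// Simplified placeholder priority: more bits first, then larger value.
static bool
higher_priority (uint64_t mask1, uint64_t mask2)
{
  size_t pop1 = std::bitset<64> (mask1).count ();
  size_t pop2 = std::bitset<64> (mask2).count ();
  if (pop1 != pop2)
    return pop1 > pop2;
  return mask1 > mask2;
}

// Sort VERSIONS into descending order of dispatch priority.
static void
sort_versions (std::vector<function_version_info> &versions)
{
  // Mirrors the patch's assertion: the default plus at least one more.
  assert (versions.size () >= 2);
  std::sort (versions.begin (), versions.end (),
	     [] (const function_version_info &v1,
		 const function_version_info &v2)
	     {
	       return higher_priority (v1.feature_mask, v2.feature_mask);
	     });
}
```

The default version (mask 0) ends up last, so its unconditional return
closes the dispatch chain.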

> +  const _function_version_info v1 = *(const _function_version_info *)p1;
> +  const _function_version_info v2 = *(const _function_version_info *)p2;
> +  return - compare_feature_masks (v1.feature_mask, v2.feature_mask);
> +}
> +
> +static int
> +dispatch_function_versions (tree dispatch_decl,
> +			    void *fndecls_p,
> +			    basic_block *empty_bb)

Missing function comment.

> +{
> +  gimple *ifunc_cpu_init_stmt;
> +  gimple_seq gseq;
> +  vec<tree> *fndecls;
> +  unsigned int num_versions = 0;
> +  unsigned int actual_versions = 0;
> +  unsigned int i;
> +
> +  struct _function_version_info
> +    {
> +      tree version_decl;
> +      aarch64_fmv_feature_mask feature_mask;
> +    } *function_version_info;
> +
> +  gcc_assert (dispatch_decl != NULL
> +	      && fndecls_p != NULL
> +	      && empty_bb != NULL);
> +
> +  /*fndecls_p is actually a vector.  */
> +  fndecls = static_cast<vec<tree> *> (fndecls_p);
> +
> +  /* At least one more version other than the default.  */
> +  num_versions = fndecls->length ();
> +  gcc_assert (num_versions >= 2);
> +
> +  function_version_info = (struct _function_version_info *)
> +    XNEWVEC (struct _function_version_info, (num_versions));
> +
> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
> +
> +  gseq = bb_seq (*empty_bb);
> +  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
> +     constructors, so explicity call __init_cpu_features_resolver here.  */
> +  tree init_fn_type = build_function_type_list (void_type_node,
> +						long_unsigned_type_node,
> +						build_ifunc_arg_type(),
> +						NULL);
> +  tree init_fn_id = get_identifier ("__init_cpu_features_resolver");
> +  tree init_fn_decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL,
> +				  init_fn_id, init_fn_type);
> +  tree arg1 = DECL_ARGUMENTS (dispatch_decl);
> +  tree arg2 = TREE_CHAIN (arg1);
> +  ifunc_cpu_init_stmt = gimple_build_call (init_fn_decl, 2, arg1, arg2);
> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
> +
> +  /* Build the struct type for __aarch64_cpu_features.  */
> +  tree global_type = lang_hooks.types.make_type (RECORD_TYPE);
> +  tree field1 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> +			    get_identifier ("features"),
> +			    long_long_unsigned_type_node);
> +  DECL_FIELD_CONTEXT (field1) = global_type;
> +  TYPE_FIELDS (global_type) = field1;
> +  layout_type (global_type);
> +
> +  tree global_var = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +				get_identifier ("__aarch64_cpu_features"),
> +				global_type);
> +  DECL_EXTERNAL (global_var) = 1;
> +  tree mask_var = create_tmp_var (long_long_unsigned_type_node);
> +
> +  tree component_expr = build3 (COMPONENT_REF, long_long_unsigned_type_node,
> +				global_var, field1, NULL_TREE);
> +  gimple *component_stmt = gimple_build_assign (mask_var, component_expr);
> +  gimple_set_block (component_stmt, DECL_INITIAL (dispatch_decl));
> +  gimple_set_bb (component_stmt, *empty_bb);
> +  gimple_seq_add_stmt (&gseq, component_stmt);
> +
> +  tree not_expr = build1 (BIT_NOT_EXPR, long_long_unsigned_type_node, mask_var);
> +  gimple *not_stmt = gimple_build_assign (mask_var, not_expr);
> +  gimple_set_block (not_stmt, DECL_INITIAL (dispatch_decl));
> +  gimple_set_bb (not_stmt, *empty_bb);
> +  gimple_seq_add_stmt (&gseq, not_stmt);
> +
> +  set_bb_seq (*empty_bb, gseq);
> +
> +  pop_cfun ();
> +
> +  for (tree version_decl : *fndecls)
> +    {
> +      aarch64_fmv_feature_mask feature_mask;
> +      /* Get attribute string, parse it and find the right features.  */
> +      feature_mask = get_feature_mask_for_version (version_decl);
> +      function_version_info [actual_versions].version_decl = version_decl;
> +      function_version_info [actual_versions].feature_mask = feature_mask;
> +      actual_versions++;
> +    }
> +
> +  /* Sort the versions according to descending order of dispatch priority.  */
> +  qsort (function_version_info, actual_versions,
> +	 sizeof (struct _function_version_info), compare_feature_version_info);
> +
> +  for (i = 0; i < actual_versions; ++i)
> +    *empty_bb = add_condition_to_bb (dispatch_decl,
> +				     function_version_info[i].version_decl,
> +				     function_version_info[i].feature_mask,
> +				     mask_var,
> +				     *empty_bb);
> +
> +  free (function_version_info);
> +  return 0;
> +}
> +
> +
> +tree
> +aarch64_generate_version_dispatcher_body (void *node_p)

Missing function comment.  Since the function implements a defined interface,
the comment can just be:

/* Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY.  */

> +{
> +  tree resolver_decl;
> +  basic_block empty_bb;
> +  tree default_ver_decl;
> +  struct cgraph_node *versn;
> +  struct cgraph_node *node;
> +
> +  struct cgraph_function_version_info *node_version_info = NULL;
> +  struct cgraph_function_version_info *versn_info = NULL;
> +
> +  node = (cgraph_node *)node_p;
> +
> +  node_version_info = node->function_version ();
> +  gcc_assert (node->dispatcher_function
> +	      && node_version_info != NULL);
> +
> +  if (node_version_info->dispatcher_resolver)
> +    return node_version_info->dispatcher_resolver;
> +
> +  /* The first version in the chain corresponds to the default version.  */
> +  default_ver_decl = node_version_info->next->this_node->decl;
> +
> +  /* node is going to be an alias, so remove the finalized bit.  */
> +  node->definition = false;
> +
> +  resolver_decl = make_resolver_func (default_ver_decl,
> +				      node->decl, &empty_bb);
> +
> +  node_version_info->dispatcher_resolver = resolver_decl;
> +
> +  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
> +
> +  auto_vec<tree, 2> fn_ver_vec;
> +
> +  for (versn_info = node_version_info->next; versn_info;
> +       versn_info = versn_info->next)
> +    {
> +      versn = versn_info->this_node;
> +      /* Check for virtual functions here again, as by this time it should
> +	 have been determined if this function needs a vtable index or
> +	 not.  This happens for methods in derived classes that override
> +	 virtual methods in base classes but are not explicitly marked as
> +	 virtual.  */
> +      if (DECL_VINDEX (versn->decl))
> +	sorry ("virtual function multiversioning not supported");
> +
> +      fn_ver_vec.safe_push (versn->decl);
> +    }
> +
> +  dispatch_function_versions (resolver_decl, &fn_ver_vec, &empty_bb);
> +  cgraph_edge::rebuild_edges ();
> +  pop_cfun ();
> +  return resolver_decl;
> +}
> +
> +/* Make a dispatcher declaration for the multi-versioned function DECL.
> +   Calls to DECL function will be replaced with calls to the dispatcher
> +   by the front-end.  Returns the decl of the dispatcher function.  */
> +
> +tree
> +aarch64_get_function_versions_dispatcher (void *decl)
> +{
> +  tree fn = (tree) decl;
> +  struct cgraph_node *node = NULL;
> +  struct cgraph_node *default_node = NULL;
> +  struct cgraph_function_version_info *node_v = NULL;
> +  struct cgraph_function_version_info *first_v = NULL;
> +
> +  tree dispatch_decl = NULL;
> +
> +  struct cgraph_function_version_info *default_version_info = NULL;
> +
> +  gcc_assert (fn != NULL && DECL_FUNCTION_VERSIONED (fn));
> +
> +  node = cgraph_node::get (fn);
> +  gcc_assert (node != NULL);
> +
> +  node_v = node->function_version ();
> +  gcc_assert (node_v != NULL);
> +
> +  if (node_v->dispatcher_resolver != NULL)
> +    return node_v->dispatcher_resolver;
> +
> +  /* Find the default version and make it the first node.  */
> +  first_v = node_v;
> +  /* Go to the beginning of the chain.  */
> +  while (first_v->prev != NULL)
> +    first_v = first_v->prev;
> +  default_version_info = first_v;
> +  while (default_version_info != NULL)
> +    {
> +      if (get_feature_mask_for_version
> +	    (default_version_info->this_node->decl) == 0ULL)
> +	break;
> +      default_version_info = default_version_info->next;
> +    }
> +
> +  /* If there is no default node, just return NULL.  */
> +  if (default_version_info == NULL)
> +    return NULL;
> +
> +  /* Make default info the first node.  */
> +  if (first_v != default_version_info)
> +    {
> +      default_version_info->prev->next = default_version_info->next;
> +      if (default_version_info->next)
> +	default_version_info->next->prev = default_version_info->prev;
> +      first_v->prev = default_version_info;
> +      default_version_info->next = first_v;
> +      default_version_info->prev = NULL;
> +    }
> +
> +  default_node = default_version_info->this_node;
> +
> +  if (targetm.has_ifunc_p ())
> +    {
> +      struct cgraph_function_version_info *it_v = NULL;
> +      struct cgraph_node *dispatcher_node = NULL;
> +      struct cgraph_function_version_info *dispatcher_version_info = NULL;
> +
> +      /* Right now, the dispatching is done via ifunc.  */
> +      dispatch_decl = make_dispatcher_decl (default_node->decl);
> +      TREE_NOTHROW (dispatch_decl) = TREE_NOTHROW (fn);
> +
> +      dispatcher_node = cgraph_node::get_create (dispatch_decl);
> +      gcc_assert (dispatcher_node != NULL);
> +      dispatcher_node->dispatcher_function = 1;
> +      dispatcher_version_info
> +	= dispatcher_node->insert_new_function_version ();
> +      dispatcher_version_info->next = default_version_info;
> +      dispatcher_node->definition = 1;
> +
> +      /* Set the dispatcher for all the versions.  */
> +      it_v = default_version_info;
> +      while (it_v != NULL)
> +	{
> +	  it_v->dispatcher_resolver = dispatch_decl;
> +	  it_v = it_v->next;
> +	}
> +    }
> +  else
> +    {
> +      error_at (DECL_SOURCE_LOCATION (default_node->decl),
> +		"multiversioning needs %<ifunc%> which is not supported "
> +		"on this target");
> +    }
> +
> +  return dispatch_decl;
> +}
> +
> +bool
> +aarch64_common_function_versions (tree fn1, tree fn2)

Missing comment here too.  Same for other functions later.

> +{
> +  if (TREE_CODE (fn1) != FUNCTION_DECL
> +      || TREE_CODE (fn2) != FUNCTION_DECL)
> +    return false;
> +
> +  return (aarch64_compare_version_priority (fn1, fn2) != 0);
> +}
> +
> +
> +tree
> +aarch64_mangle_decl_assembler_name (tree decl, tree id)
> +{
> +  /* For function version, add the target suffix to the assembler name.  */
> +  if (TREE_CODE (decl) == FUNCTION_DECL
> +      && DECL_FUNCTION_VERSIONED (decl))
> +    {
> +      aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version (decl);
> +
> +      /* No suffix for the default version.  */
> +      if (feature_mask == 0ULL)
> +	return id;
> +
> +      char suffix[2048];
> +      int pos = 0;
> +      const char *base = IDENTIFIER_POINTER (id);
> +
> +      for (int i = 1; i < FEAT_MAX; i++)

Why does this start at 1 rather than 0?  Think it deserves a comment.

> +	{
> +	  if (feature_mask & aarch64_fmv_feature_data[i].feature_mask)
> +	    {
> +	      suffix[pos] = 'M';
> +	      strcpy (&suffix[pos+1], aarch64_fmv_feature_data[i].name);
> +	      pos += strlen(aarch64_fmv_feature_data[i].name) + 1;
> +	    }
> +	}
> +      suffix[pos] = '\0';
> +
> +      char *ret = XNEWVEC (char, strlen (base) + strlen (suffix) + 3);
> +      sprintf (ret, "%s._%s", base, suffix);

It isn't obvious that the limit of 2048 is or will stay safe.  Probably
best to build the suffix using a std::string instead.
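
Something along these lines, as a standalone sketch rather than a drop-in
replacement: fmv_feature, feature_table and mangle_version_name are
hypothetical names, the table entries are made up, and the loop visits every
entry instead of starting at index 1 as the patch does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for aarch64_fmv_feature_data.
struct fmv_feature
{
  const char *name;
  uint64_t feature_mask;
};

static const fmv_feature feature_table[] = {
  { "feat1", uint64_t (1) << 0 },
  { "feat2", uint64_t (1) << 1 },
  { "feat3", uint64_t (1) << 2 },
};

// Build "<base>._M<feat>M<feat>..." with no fixed-size buffer.
static std::string
mangle_version_name (const char *base, uint64_t feature_mask)
{
  assert (base != nullptr);

  // No suffix for the default version.
  if (feature_mask == 0)
    return base;

  std::string name = base;
  name += "._";
  for (const fmv_feature &feature : feature_table)
    if (feature_mask & feature.feature_mask)
      {
	name += 'M';
	name += feature.name;
      }
  return name;
}
```

The string grows as needed, so the result stays correct however many
features are added to the table later.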

Thanks,
Richard

> +
> +      if (DECL_ASSEMBLER_NAME_SET_P (decl))
> +	SET_DECL_RTL (decl, NULL);
> +
> +      id = get_identifier (ret);
> +    }
> +  return id;
> +}
> +
> +
>  /* Helper for aarch64_can_inline_p.  In the case where CALLER and CALLEE are
>     tri-bool options (yes, no, don't care) and the default value is
>     DEF, determine whether to reject inlining.  */
> @@ -28457,6 +29288,13 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_OPTION_VALID_ATTRIBUTE_P
>  #define TARGET_OPTION_VALID_ATTRIBUTE_P aarch64_option_valid_attribute_p
>  
> +#undef TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P
> +#define TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P \
> +  aarch64_option_valid_version_attribute_p
> +
> +#undef TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE
> +#define TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE "target_version"
> +
>  #undef TARGET_SET_CURRENT_FUNCTION
>  #define TARGET_SET_CURRENT_FUNCTION aarch64_set_current_function
>  
> @@ -28787,6 +29625,24 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_CONST_ANCHOR
>  #define TARGET_CONST_ANCHOR 0x1000000
>  
> +#undef TARGET_OPTION_FUNCTION_VERSIONS
> +#define TARGET_OPTION_FUNCTION_VERSIONS aarch64_common_function_versions
> +
> +#undef TARGET_COMPARE_VERSION_PRIORITY
> +#define TARGET_COMPARE_VERSION_PRIORITY aarch64_compare_version_priority
> +
> +#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
> +#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
> +  aarch64_generate_version_dispatcher_body
> +
> +#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
> +#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
> +  aarch64_get_function_versions_dispatcher
> +
> +#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
> +#define TARGET_MANGLE_DECL_ASSEMBLER_NAME aarch64_mangle_decl_assembler_name
> +
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
>  #include "gt-aarch64.h"
> diff --git a/gcc/config/arm/aarch-common.h b/gcc/config/arm/aarch-common.h
> index c6a67f0d05cc75d85d019e1cc910c37173884c03..70f01fd3da6919dd98cfe92bfc4c54b7d2cba72c 100644
> --- a/gcc/config/arm/aarch-common.h
> +++ b/gcc/config/arm/aarch-common.h
> @@ -23,7 +23,7 @@
>  #define GCC_AARCH_COMMON_H
>  
>  /* Enum describing the various ways that the
> -   aarch*_parse_{arch,tune,cpu,extension} functions can fail.
> +   aarch*_parse_{arch,tune,cpu,extension,fmv_extension} functions can fail.
>     This way their callers can choose what kind of error to give.  */
>  
>  enum aarch_parse_opt_result
> @@ -31,7 +31,8 @@ enum aarch_parse_opt_result
>    AARCH_PARSE_OK,			/* Parsing was successful.  */
>    AARCH_PARSE_MISSING_ARG,		/* Missing argument.  */
>    AARCH_PARSE_INVALID_FEATURE,		/* Invalid feature modifier.  */
> -  AARCH_PARSE_INVALID_ARG		/* Invalid arch, tune, cpu arg.  */
> +  AARCH_PARSE_INVALID_ARG,		/* Invalid arch, tune, cpu arg.  */
> +  AARCH_PARSE_DUPLICATE_FEATURE		/* Duplicate feature modifier.  */
>  };
>  
>  /* Function types -msign-return-address should sign.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
> index 8499f87c39b173491a89626af56f4e193b1d12b5..8b7d7d2d8a00f6d5a6a35ffca28be7f1ff4cb9c7 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
> @@ -7,6 +7,6 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto} } } */
> +/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto} } } */
>  
>  /* Test a normal looking procinfo.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c
> index 551669091c7010379a4c5247a27c517c4e67ef98..234a1ce1d7b4714e64c95c15488784d73c0552f2 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c
> @@ -7,6 +7,6 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto} } } */
> +/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto} } } */
>  
>  /* Test one with mixed order of feature bits.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> index 2f963bb2312711691f6f1c5989a100b88671ad52..bd3ea96a785de507578729a621ec4ae7bad8a516 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> @@ -7,6 +7,6 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\+sve2} } } */
> +/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+sve2} } } */
>  
>  /* Test a normal looking procinfo.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
> index c68a697aa3e97ef52fd7e90233c5bb4ac8dbddd9..33e6319b46dcebc717e8a415484093e980660fb5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
> @@ -7,6 +7,6 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\+sve2} } } */
> +/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+sve2} } } */
>  
>  /* Test a normal looking procinfo.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
> index b5f0a3005f50cbf01edbcb8aefcc3c34aa11207f..abae7a7d1453f79f879ff5e24f7c67e819db1dbb 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
> @@ -7,7 +7,7 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv8.6-a\+crc\+fp16\+aes\+sha3\+rng} } } */
> +/* { dg-final { scan-assembler {\.arch armv8.6-a\+rng\+crc\+aes\+sha3\+fp16} } } */
>  
>  /* Test one where the boundary of buffer size would overwrite the last
>     character read when stitching the fgets-calls together.  With the
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c
> index 980d3f79dfb03b0d8eb68f691bf2dedf80aed87d..a5b4b4d3442c6522a8cdadf4eebd3b5460e37213 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c
> @@ -7,7 +7,7 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+profile\+memtag\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\+nopauth\n} } } */
> +/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+memtag\+profile\+nopauth\n} } } */
>  
>  /* Test one that if the kernel doesn't report the availability of a mandatory
>     feature that it has turned it off for whatever reason.  As such compilers
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c
> index 117df2b0b6cd5751d9f5175b4343aad9825a6c43..e12aa543d02924f268729f96fe1f17181287f097 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c
> @@ -7,7 +7,7 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+profile\+memtag\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\n} } } */
> +/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+memtag\+profile\n} } } */
>  
>  /* Check whether features that don't have a midr name during detection are
>     correctly ignored.  These features shouldn't affect the native detection.
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
> index efbd02cbdc0638db85e776f1e79043709c11df21..920e1d65711cbcb77b07441597180c0159ccabf9 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
> @@ -7,7 +7,7 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+lse\+rcpc\+rdma\+dotprod\+fp16fml\+sb\+ssbs\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\+flagm\n} } } */
> +/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+dotprod\+rdma\+lse\+crc\+fp16fml\+rcpc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n} } } */
>  
>  /* Check that an Armv8-A core doesn't fall apart on extensions without midr
>     values.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
> index d431d4938265d024891b464ac3d069607b21d8e7..416a29b514ab7599a7092e26e3716ec8a50cc895 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
> @@ -7,7 +7,7 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+lse\+rcpc\+rdma\+dotprod\+fp16fml\+sb\+ssbs\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\+flagm\+pauth\n} } } */
> +/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+dotprod\+rdma\+lse\+crc\+fp16fml\+rcpc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n} } } */
>  
>  /* Check that an Armv8-A core doesn't fall apart on extensions without midr
>     values and that it enables optional features.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c
> index 7608e8845a662219488effcdb8277006dcf457a9..907249c5c1e6a440731533407df0ff7caadcbf74 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c
> @@ -7,7 +7,7 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv8-a\+fp16\+crypto} } } */
> +/* { dg-final { scan-assembler {\.arch armv8-a\+crypto\+fp16} } } */
>  
> -/* Test one where the feature bits for crypto and fp16 are given in
> -   same order as declared in options file.  */
> +/* Test one where the crypto and fp16 options are specified in different
> +   order from what is in the options file.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c
> index 72b14b4f6ed0d50a4fc8a35931fbd232b09d2b61..b68a07a7c16b7a3cc9a896cca152d78e5cf9ea2f 100644
> --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c
> @@ -7,7 +7,7 @@ int main()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv8-a\+fp16\+crypto} } } */
> +/* { dg-final { scan-assembler {\.arch armv8-a\+crypto\+fp16} } } */
>  
> -/* Test one where the crypto and fp16 options are specified in different
> -   order from what is in the options file.  */
> +/* Test one where the feature bits for crypto and fp16 are given in
> +   same order as declared in options file.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/options_set_17.c b/gcc/testsuite/gcc.target/aarch64/options_set_17.c
> index c490e1f47a0a7a3adcbb7e96a3974d5651a023e8..4c53edd5cb92f83b3d34454c85062ff3f67b50ee 100644
> --- a/gcc/testsuite/gcc.target/aarch64/options_set_17.c
> +++ b/gcc/testsuite/gcc.target/aarch64/options_set_17.c
> @@ -6,6 +6,6 @@ int main ()
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler {\.arch armv8\.2-a\+crc\+dotprod} } } */
> +/* { dg-final { scan-assembler {\.arch armv8\.2-a\+dotprod\+crc} } } */
>  
>   /* dotprod needs to be emitted pre armv8.4.  */
> diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
> index 0888ca4ed058430f524b99cb0e204bd996fa0e55..78664d5a4287be0369a4b02e1b8ab4a885869352 100644
> --- a/libgcc/config/aarch64/cpuinfo.c
> +++ b/libgcc/config/aarch64/cpuinfo.c
> @@ -22,6 +22,8 @@
>     see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> +#include "common/config/aarch64/cpuinfo.h"
> +
>  #if defined(__has_include)
>  #if __has_include(<sys/auxv.h>)
>  #include <sys/auxv.h>
> @@ -39,73 +41,6 @@ typedef struct __ifunc_arg_t {
>  #if __has_include(<asm/hwcap.h>)
>  #include <asm/hwcap.h>
>  
> -/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
> -enum CPUFeatures {
> -  FEAT_RNG,
> -  FEAT_FLAGM,
> -  FEAT_FLAGM2,
> -  FEAT_FP16FML,
> -  FEAT_DOTPROD,
> -  FEAT_SM4,
> -  FEAT_RDM,
> -  FEAT_LSE,
> -  FEAT_FP,
> -  FEAT_SIMD,
> -  FEAT_CRC,
> -  FEAT_SHA1,
> -  FEAT_SHA2,
> -  FEAT_SHA3,
> -  FEAT_AES,
> -  FEAT_PMULL,
> -  FEAT_FP16,
> -  FEAT_DIT,
> -  FEAT_DPB,
> -  FEAT_DPB2,
> -  FEAT_JSCVT,
> -  FEAT_FCMA,
> -  FEAT_RCPC,
> -  FEAT_RCPC2,
> -  FEAT_FRINTTS,
> -  FEAT_DGH,
> -  FEAT_I8MM,
> -  FEAT_BF16,
> -  FEAT_EBF16,
> -  FEAT_RPRES,
> -  FEAT_SVE,
> -  FEAT_SVE_BF16,
> -  FEAT_SVE_EBF16,
> -  FEAT_SVE_I8MM,
> -  FEAT_SVE_F32MM,
> -  FEAT_SVE_F64MM,
> -  FEAT_SVE2,
> -  FEAT_SVE_AES,
> -  FEAT_SVE_PMULL128,
> -  FEAT_SVE_BITPERM,
> -  FEAT_SVE_SHA3,
> -  FEAT_SVE_SM4,
> -  FEAT_SME,
> -  FEAT_MEMTAG,
> -  FEAT_MEMTAG2,
> -  FEAT_MEMTAG3,
> -  FEAT_SB,
> -  FEAT_PREDRES,
> -  FEAT_SSBS,
> -  FEAT_SSBS2,
> -  FEAT_BTI,
> -  FEAT_LS64,
> -  FEAT_LS64_V,
> -  FEAT_LS64_ACCDATA,
> -  FEAT_WFXT,
> -  FEAT_SME_F64,
> -  FEAT_SME_I64,
> -  FEAT_SME2,
> -  FEAT_RCPC3,
> -  FEAT_MAX,
> -  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
> -		    in __aarch64_cpu_features.  */
> -  FEAT_INIT      /* Used as flag of features initialization completion.  */
> -};
> -
>  /* Architecture features used in Function Multi Versioning.  */
>  struct {
>    unsigned long long features;


* Re: [PATCH v2 4/5] Add support for target_version attribute
  2023-11-17  2:55 ` [PATCH v2 4/5] Add support for target_version attribute Andrew Carlotti
@ 2023-11-29 17:53   ` Richard Sandiford
  2023-12-04 11:14     ` Andrew Carlotti
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2023-11-29 17:53 UTC (permalink / raw)
  To: Andrew Carlotti
  Cc: gcc-patches, ebotcazou, poulhies, ibuclaw, jason, nathan,
	rguenther, richard.earnshaw

Andrew Carlotti <andrew.carlotti@arm.com> writes:
> This patch adds support for the "target_version" attribute to the middle
> end and the C++ frontend, which will be used to implement function
> multiversioning in the aarch64 backend.
>
> On targets that don't use the "target" attribute for multiversioning,
> there is no conflict between the "target" and "target_clones"
> attributes.  This patch therefore makes the mutual exclusion in
> C-family, D and Ada conditional upon the value of the
> expanded_clones_attribute target hook.
>
> The "target_version" attribute is only added to C++ in this patch,
> because this is currently the only frontend which supports
> multiversioning using the "target" attribute.  Support for the
> "target_version" attribute will be extended to C at a later date.
>
> Targets that currently use the "target" attribute for function
> multiversioning (i.e. i386 and rs6000) are not affected by this patch.
>
> Ok for master?
>
> gcc/ChangeLog:
>
> 	* attribs.cc (decl_attributes): Pass attribute name to target.
> 	(is_function_default_version): Update comment to specify
> 	incompatibility with target_version attributes.
> 	* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
> 	Call valid_version_attribute_p for target_version attributes.
> 	* target.def (valid_version_attribute_p): New hook.
> 	(expanded_clones_attribute): New hook.
> 	* doc/tm.texi.in: Add new hooks.
> 	* doc/tm.texi: Regenerate.
> 	* multiple_target.cc (create_dispatcher_calls): Remove redundant
> 	is_function_default_version check.
> 	(expand_target_clones): Use target hook for attribute name.
> 	* targhooks.cc (default_target_option_valid_version_attribute_p):
> 	New.
> 	* targhooks.h (default_target_option_valid_version_attribute_p):
> 	New.
> 	* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
> 	target_version attributes.
>
> gcc/c-family/ChangeLog:
>
> 	* c-attribs.cc (CLONES_USES_TARGET): New macro.
> 	(attr_target_exclusions): Use new macro.
> 	(attr_target_clones_exclusions): Ditto, and add target_version.
> 	(attr_target_version_exclusions): New.
> 	(c_common_attribute_table): Add target_version.
> 	(handle_target_version_attribute): New.
>
> gcc/ada/ChangeLog:
>
> 	* gcc-interface/utils.cc (CLONES_USES_TARGET): New macro.
> 	(attr_target_exclusions): Use new macro.
> 	(attr_target_clones_exclusions): Ditto.
>
> gcc/d/ChangeLog:
>
> 	* d-attribs.cc (CLONES_USES_TARGET): New macro.
> 	(attr_target_exclusions): Use new macro.
> 	(attr_target_clones_exclusions): Ditto.
>
> gcc/cp/ChangeLog:
>
> 	* decl2.cc (check_classfn): Update comment to include
> 	target_version attributes.
>
>
> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
> index e33a63948cebdeafc3abcdd539a35141969ad978..8850943cb3326568b4679a73405f50487aa1b7c6 100644
> --- a/gcc/ada/gcc-interface/utils.cc
> +++ b/gcc/ada/gcc-interface/utils.cc
> @@ -143,16 +143,21 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
>    { NULL, false, false, false },
>  };
>  
> +#define CLONES_USES_TARGET \
> +  (strcmp (targetm.target_option.expanded_clones_attribute, \
> +	   "target") == 0)
> +

Sorry for the slower review on this part.  I was hoping inspiration
would strike for a way to resolve this, but it hasn't, so:

The codebase usually avoids static variables that need dynamic
initialisation.  So although macros are not the preferred way of
doing things, I think one is probably appropriate here.  How about:

  TARGET_HAS_FMV_TARGET_ATTRIBUTE

with the default being true, and with AArch64 defining it to false?

This would replace the expanded_clones_attribute hook, with:

  const char *new_attr_name = targetm.target_option.expanded_clones_attribute;

becoming:

  const char *new_attr_name = (TARGET_HAS_FMV_TARGET_ATTRIBUTE
			       ? "target" : "target_version");

I realise this is anything but elegant, but I think it's probably
the least worst option, given where we are.
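
A minimal, self-contained sketch of how the suggested macro could replace
the hook.  Everything here is a simplified stand-in: struct exclusion only
mimics attribute_spec::exclusions, the two helper functions are hypothetical,
and the #ifndef default is an assumption about where the macro would live.
The point it illustrates is that a compile-time macro keeps the exclusion
table statically initialised, which a strcmp on a target hook would not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical default; a port such as AArch64 would define the macro to
   false in its own headers before this point is reached.  */
#ifndef TARGET_HAS_FMV_TARGET_ATTRIBUTE
#define TARGET_HAS_FMV_TARGET_ATTRIBUTE true
#endif

/* Simplified stand-in for attribute_spec::exclusions.  */
struct exclusion
{
  const char *name;
  bool function;
  bool variable;
  bool type;
};

/* The macro is a constant expression, so this table needs no dynamic
   initialisation.  */
static const struct exclusion attr_target_exclusions[] =
{
  { "target_clones", TARGET_HAS_FMV_TARGET_ATTRIBUTE,
    TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE },
  { NULL, false, false, false },
};

/* Return true if "target" excludes attribute OTHER on function decls
   (hypothetical helper, for illustration only).  */
bool
target_excludes_on_functions (const char *other)
{
  for (const struct exclusion *e = attr_target_exclusions; e->name; e++)
    if (strcmp (e->name, other) == 0)
      return e->function;
  return false;
}

/* Name of the attribute attached to each clone expanded from
   target_clones (hypothetical helper, for illustration only).  */
const char *
expanded_clones_attribute_name (void)
{
  return TARGET_HAS_FMV_TARGET_ATTRIBUTE ? "target" : "target_version";
}
```

With the default left at true, the sketch behaves like the existing i386 and
rs6000 ports; building it with -DTARGET_HAS_FMV_TARGET_ATTRIBUTE=false
switches the clone attribute name to "target_version" and drops the mutual
exclusion.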

>  static const struct attribute_spec::exclusions attr_target_exclusions[] =
>  {
> -  { "target_clones", true, true, true },
> +  { "target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
> +    CLONES_USES_TARGET },
>    { NULL, false, false, false },
>  };
>  
>  static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
>  {
>    { "always_inline", true, true, true },
> -  { "target", true, true, true },
> +  { "target", CLONES_USES_TARGET, CLONES_USES_TARGET, CLONES_USES_TARGET },
>    { NULL, false, false, false },
>  };
>  
> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index f9fd258598914ce2112ecaaeaad6c63cd69a44e2..27533023ef5c481ba085c2f0c605dfb992987b3e 100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> @@ -657,7 +657,8 @@ decl_attributes (tree *node, tree attributes, int flags,
>       options to the attribute((target(...))) list.  */
>    if (TREE_CODE (*node) == FUNCTION_DECL
>        && current_target_pragma
> -      && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
> +      && targetm.target_option.valid_attribute_p (*node,
> +						  get_identifier("target"),

Formatting nit: should be a space before ("target")

>  						  current_target_pragma, 0))
>      {
>        tree cur_attr = lookup_attribute ("target", attributes);
> @@ -1241,8 +1242,9 @@ make_dispatcher_decl (const tree decl)
>    return func_decl;  
>  }
>  
> -/* Returns true if decl is multi-versioned and DECL is the default function,
> -   that is it is not tagged with target specific optimization.  */
> +/* Returns true if DECL is multi-versioned using the target attribute, and this
> +   is the default version.  This function can only be used for targets that do
> +   not support the "target_version" attribute.  */
>  
>  bool
>  is_function_default_version (const tree decl)
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index b3b41ef123a0f171f57acb1b7f7fdde716428c00..8e33b7c3f4a9e7dcaa299eeff0eea92240f7ef0a 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -149,6 +149,7 @@ static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_assume_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_target_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_target_version_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
>  static tree ignore_attribute (tree *, tree, tree, int, bool *);
> @@ -228,16 +229,29 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
>    ATTR_EXCL (NULL, false, false, false),
>  };
>  
> +#define CLONES_USES_TARGET \
> +  (strcmp (targetm.target_option.expanded_clones_attribute, \
> +	   "target") == 0)
> +
>  static const struct attribute_spec::exclusions attr_target_exclusions[] =
>  {
> -  ATTR_EXCL ("target_clones", true, true, true),
> +  ATTR_EXCL ("target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
> +	     CLONES_USES_TARGET),
>    ATTR_EXCL (NULL, false, false, false),
>  };
>  
>  static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
>  {
>    ATTR_EXCL ("always_inline", true, true, true),
> -  ATTR_EXCL ("target", true, true, true),
> +  ATTR_EXCL ("target", CLONES_USES_TARGET, CLONES_USES_TARGET,
> +	     CLONES_USES_TARGET),
> +  ATTR_EXCL ("target_version", true, true, true),
> +  ATTR_EXCL (NULL, false, false, false),
> +};
> +
> +static const struct attribute_spec::exclusions attr_target_version_exclusions[] =
> +{
> +  ATTR_EXCL ("target_clones", true, true, true),

Just FTR: I suppose this should also include "target" if there is ever
a port that uses "target" and "target_version" for the same thing, but
there's no need to predict that case.

>    ATTR_EXCL (NULL, false, false, false),
>  };
>  
> @@ -505,6 +519,9 @@ const struct attribute_spec c_common_attribute_table[] =
>    { "target",                 1, -1, true, false, false, false,
>  			      handle_target_attribute,
>  			      attr_target_exclusions },
> +  { "target_version",         1, 1, true, false, false, false,
> +			      handle_target_version_attribute,
> +			      attr_target_version_exclusions },
>    { "target_clones",          1, -1, true, false, false, false,
>  			      handle_target_clones_attribute,
>  			      attr_target_clones_exclusions },
> @@ -5670,6 +5687,25 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
>    return NULL_TREE;
>  }
>  
> +/* Handle a "target_version" attribute.  */
> +
> +static tree
> +handle_target_version_attribute (tree *node, tree name, tree args, int flags,
> +				  bool *no_add_attrs)
> +{
> +  /* Ensure we have a function type.  */
> +  if (TREE_CODE (*node) != FUNCTION_DECL)
> +    {
> +      warning (OPT_Wattributes, "%qE attribute ignored", name);
> +      *no_add_attrs = true;
> +    }
> +  else if (!targetm.target_option.valid_version_attribute_p (*node, name, args,
> +							     flags))
> +    *no_add_attrs = true;
> +
> +  return NULL_TREE;
> +}
> +
>  /* Handle a "target_clones" attribute.  */
>  
>  static tree
> diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
> index 29d28ef895a73a223695cbb86aafbc845bbe7688..8af6b23d8c0306920e0fdcb3559ef047a16689f4 100644
> --- a/gcc/cgraphclones.cc
> +++ b/gcc/cgraphclones.cc
> @@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-eh.h"
>  #include "tree-cfg.h"
>  #include "tree-inline.h"
> +#include "attribs.h"
>  #include "dumpfile.h"
>  #include "gimple-pretty-print.h"
>  #include "alloc-pool.h"
> @@ -1048,7 +1049,17 @@ cgraph_node::create_version_clone_with_body
>        location_t saved_loc = input_location;
>        tree v = TREE_VALUE (target_attributes);
>        input_location = DECL_SOURCE_LOCATION (new_decl);
> -      bool r = targetm.target_option.valid_attribute_p (new_decl, NULL, v, 1);
> +      bool r;
> +      tree name_id = get_attribute_name (target_attributes);
> +      const char* name_str = IDENTIFIER_POINTER (name_id);

Formatting nit, sorry, but: "const char *name_str".

> +      if (strcmp (name_str, "target") == 0)
> +	r = targetm.target_option.valid_attribute_p (new_decl, name_id, v, 1);
> +      else if (strcmp (name_str, "target_version") == 0)
> +	r = targetm.target_option.valid_version_attribute_p (new_decl, name_id,
> +							     v, 1);
> +      else
> +	gcc_assert(false);

gcc_unreachable ();
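
A standalone sketch of the dispatch shape with an unreachable fallback.
The enum and select_validator are hypothetical names used only for
illustration, and abort () stands in for gcc_unreachable (), which is not
available outside the GCC sources.  One plausible reason for the change,
not stated in the review, is that gcc_assert can compile to nothing when
assertion checking is disabled, which would leave the final branch falling
through with no result set.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tags for the two validation hooks.  */
enum validator
{
  VALIDATOR_TARGET,
  VALIDATOR_TARGET_VERSION
};

/* Pick the validator for the attribute called NAME_STR.  Any other name
   indicates a caller bug, so the fallback never returns.  */
enum validator
select_validator (const char *name_str)
{
  if (strcmp (name_str, "target") == 0)
    return VALIDATOR_TARGET;
  else if (strcmp (name_str, "target_version") == 0)
    return VALIDATOR_TARGET_VERSION;
  else
    abort ();  /* Stand-in for gcc_unreachable ().  */
}
```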

LGTM otherwise, thanks.

Richard

> +
>        input_location = saved_loc;
>        if (!r)
>  	return NULL;
> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> index 9e666e5eecee07ae7c742c3a2b27e85899945c4e..e607aa14d284d545d122e04b0eae1247fd301882 100644
> --- a/gcc/cp/decl2.cc
> +++ b/gcc/cp/decl2.cc
> @@ -832,8 +832,8 @@ check_classfn (tree ctype, tree function, tree template_parms)
>        tree c2 = get_constraints (fndecl);
>  
>        /* While finding a match, same types and params are not enough
> -	 if the function is versioned.  Also check version ("target")
> -	 attributes.  */
> +	 if the function is versioned.  Also check for different target
> +	 specific attributes.  */
>        if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
>  		       TREE_TYPE (TREE_TYPE (fndecl)))
>  	  && compparms (p1, p2)
> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
> index c0dc0e24ded871c136e54e5527e901d16cfa5ceb..7fe68565e70dd1124aac63601416dad68600a34e 100644
> --- a/gcc/d/d-attribs.cc
> +++ b/gcc/d/d-attribs.cc
> @@ -126,16 +126,22 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
>    ATTR_EXCL (NULL, false, false, false),
>  };
>  
> +#define CLONES_USES_TARGET \
> +  (strcmp (targetm.target_option.expanded_clones_attribute, \
> +	   "target") == 0)
> +
>  static const struct attribute_spec::exclusions attr_target_exclusions[] =
>  {
> -  ATTR_EXCL ("target_clones", true, true, true),
> +  ATTR_EXCL ("target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
> +	     CLONES_USES_TARGET),
>    ATTR_EXCL (NULL, false, false, false),
>  };
>  
>  static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
>  {
>    ATTR_EXCL ("always_inline", true, true, true),
> -  ATTR_EXCL ("target", true, true, true),
> +  ATTR_EXCL ("target", CLONES_USES_TARGET, CLONES_USES_TARGET,
> +	     CLONES_USES_TARGET),
>    ATTR_EXCL (NULL, false, false, false),
>  };
>  
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index d83ca73b1aff90d3c181436afedc162b977a4158..6f6b133803f4574fcf0112b1385eec861112ddd5 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -10644,6 +10644,23 @@ the function declaration to hold a pointer to a target-specific
>  @code{struct cl_target_option} structure.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} bool TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P (tree @var{fndecl}, tree @var{name}, tree @var{args}, int @var{flags})
> +This hook is called to parse @code{attribute(target_version("..."))},
> +which allows setting target-specific options on individual function versions.
> +These function-specific options may differ
> +from the options specified on the command line.  The hook should return
> +@code{true} if the options are valid.
> +
> +The hook should set the @code{DECL_FUNCTION_SPECIFIC_TARGET} field in
> +the function declaration to hold a pointer to a target-specific
> +@code{struct cl_target_option} structure.
> +@end deftypefn
> +
> +@deftypevr {Target Hook} {const char *} TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE
> +Contains the name of the attribute used for the version description string
> +when expanding clones for a function with the target_clones attribute.
> +@end deftypevr
> +
>  @deftypefn {Target Hook} void TARGET_OPTION_SAVE (struct cl_target_option *@var{ptr}, struct gcc_options *@var{opts}, struct gcc_options *@var{opts_set})
>  This hook is called to save any additional target-specific information
>  in the @code{struct cl_target_option} structure for function-specific
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 3d3ae12cc2ff62025b1138430de501a33961fd90..149c88f627be20a9a35ead2eaebdb704e51927fa 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -7028,6 +7028,10 @@ on this implementation detail.
>  
>  @hook TARGET_OPTION_VALID_ATTRIBUTE_P
>  
> +@hook TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P
> +
> +@hook TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE
> +
>  @hook TARGET_OPTION_SAVE
>  
>  @hook TARGET_OPTION_RESTORE
> diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
> index a2ed048d7dd28ec470953fcd8a0dc86817e4b7dc..3db57c2b13d612a37240d9dcf58ad21b2286633c 100644
> --- a/gcc/multiple_target.cc
> +++ b/gcc/multiple_target.cc
> @@ -66,10 +66,6 @@ create_dispatcher_calls (struct cgraph_node *node)
>  {
>    ipa_ref *ref;
>  
> -  if (!DECL_FUNCTION_VERSIONED (node->decl)
> -      || !is_function_default_version (node->decl))
> -    return;
> -
>    if (!targetm.has_ifunc_p ())
>      {
>        error_at (DECL_SOURCE_LOCATION (node->decl),
> @@ -377,6 +373,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
>        return false;
>      }
>  
> +  const char *new_attr_name = targetm.target_option.expanded_clones_attribute;
>    cgraph_function_version_info *decl1_v = NULL;
>    cgraph_function_version_info *decl2_v = NULL;
>    cgraph_function_version_info *before = NULL;
> @@ -392,7 +389,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
>        char *attr = attrs[i];
>  
>        /* Create new target clone.  */
> -      tree attributes = make_attribute ("target", attr,
> +      tree attributes = make_attribute (new_attr_name, attr,
>  					DECL_ATTRIBUTES (node->decl));
>  
>        char *suffix = XNEWVEC (char, strlen (attr) + 1);
> @@ -430,7 +427,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
>    XDELETEVEC (attr_str);
>  
>    /* Setting new attribute to initial function.  */
> -  tree attributes = make_attribute ("target", "default",
> +  tree attributes = make_attribute (new_attr_name, "default",
>  				    DECL_ATTRIBUTES (node->decl));
>    DECL_ATTRIBUTES (node->decl) = attributes;
>    node->local = false;
> diff --git a/gcc/target.def b/gcc/target.def
> index 0996da0f71a85f8217a41ceb08de8b21087e4ed9..1d2e0d8bf03a8b949ec636e6a78a111308d3dd71 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -6533,6 +6533,31 @@ the function declaration to hold a pointer to a target-specific\n\
>   bool, (tree fndecl, tree name, tree args, int flags),
>   default_target_option_valid_attribute_p)
>  
> +/* Function to validate the attribute((target_version(...))) strings.  If
> +   the option is validated, the hook should also fill in
> +   DECL_FUNCTION_SPECIFIC_TARGET in the function decl node.  */
> +DEFHOOK
> +(valid_version_attribute_p,
> + "This hook is called to parse @code{attribute(target_version(\"...\"))},\n\
> +which allows setting target-specific options on individual function versions.\n\
> +These function-specific options may differ\n\
> +from the options specified on the command line.  The hook should return\n\
> +@code{true} if the options are valid.\n\
> +\n\
> +The hook should set the @code{DECL_FUNCTION_SPECIFIC_TARGET} field in\n\
> +the function declaration to hold a pointer to a target-specific\n\
> +@code{struct cl_target_option} structure.",
> + bool, (tree fndecl, tree name, tree args, int flags),
> + default_target_option_valid_version_attribute_p)
> +
> +/* Attribute to be used when expanding clones for functions with
> +   target_clones attribute.  */
> +DEFHOOKPOD
> +(expanded_clones_attribute,
> + "Contains the name of the attribute used for the version description string\n\
> +when expanding clones for a function with the target_clones attribute.",
> + const char *, "target")
> +
>  /* Function to save any extra target state in the target options structure.  */
>  DEFHOOK
>  (save,
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index 189549cb1c742c37c17623141989b492a7c2b2f8..ff2957fd9fd8389e23992281b35e8e5467072f7d 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -192,6 +192,7 @@ extern bool default_hard_regno_scratch_ok (unsigned int);
>  extern bool default_mode_dependent_address_p (const_rtx, addr_space_t);
>  extern bool default_new_address_profitable_p (rtx, rtx_insn *, rtx);
>  extern bool default_target_option_valid_attribute_p (tree, tree, tree, int);
> +extern bool default_target_option_valid_version_attribute_p (tree, tree, tree, int);
>  extern bool default_target_option_pragma_parse (tree, tree);
>  extern bool default_target_can_inline_p (tree, tree);
>  extern bool default_update_ipa_fn_target_info (unsigned int &, const gimple *);
> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> index 4f5b240f8d65eeeaf73418c9f1e2c2684b257cfa..b693352d7eae555912477b6e431dd9c016105007 100644
> --- a/gcc/targhooks.cc
> +++ b/gcc/targhooks.cc
> @@ -1789,7 +1789,19 @@ default_target_option_valid_attribute_p (tree ARG_UNUSED (fndecl),
>  					 int ARG_UNUSED (flags))
>  {
>    warning (OPT_Wattributes,
> -	   "target attribute is not supported on this machine");
> +	   "%<target%> attribute is not supported on this machine");
> +
> +  return false;
> +}
> +
> +bool
> +default_target_option_valid_version_attribute_p (tree ARG_UNUSED (fndecl),
> +						 tree ARG_UNUSED (name),
> +						 tree ARG_UNUSED (args),
> +						 int ARG_UNUSED (flags))
> +{
> +  warning (OPT_Wattributes,
> +	   "%<target_version%> attribute is not supported on this machine");
>  
>    return false;
>  }
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 086b55f0375435d53a1604b6659da4f19fce3d17..d7841af19b20b0dc0ae28b433d5150e9c4763eff 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -3500,8 +3500,8 @@ extern vec<tree, va_gc> **decl_debug_args_insert (tree);
>     (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>  
>  /* In FUNCTION_DECL, this is set if this function has other versions generated
> -   using "target" attributes.  The default version is the one which does not
> -   have any "target" attribute set. */
> +   to support different architecture feature sets, e.g. using "target" or
> +   "target_version" attributes.  */
>  #define DECL_FUNCTION_VERSIONED(NODE)\
>     (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
>  


* Re: [PATCH v2[1/5] aarch64: Add cpu feature detection to libgcc
  2023-11-20 15:46   ` Richard Sandiford
@ 2023-12-04 10:31     ` Andrew Carlotti
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Carlotti @ 2023-12-04 10:31 UTC (permalink / raw)
  To: gcc-patches, richard.earnshaw, richard.sandiford

On Mon, Nov 20, 2023 at 03:46:06PM +0000, Richard Sandiford wrote:
> Andrew Carlotti <andrew.carlotti@arm.com> writes:
> > This is added to enable function multiversioning, but can also be used
> > directly.  The interface is chosen to match that used in LLVM's
> > compiler-rt, to facilitate cross-compiler compatibility.
> >
> > The content of the patch is derived almost entirely from Pavel's prior
> > contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
> > changes to align more closely with GCC coding style, and to exclude any code
> > from other LLVM contributors, and am adding this to GCC with Pavel's approval.
> >
> > libgcc/ChangeLog:
> >
> > 	* config/aarch64/t-aarch64: Include cpuinfo.c
> > 	* config/aarch64/cpuinfo.c: New file
> > 	(__init_cpu_features_constructor) New.
> > 	(__init_cpu_features_resolver) New.
> > 	(__init_cpu_features) New.
> 
> > OK on the basis that you mentioned in the covering note: we can deal
> with fixes incrementally.  One question though...
> >
> > Co-authored-by: Pavel Iliin <Pavel.Iliin@arm.com>
> >
> >
> > diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..0888ca4ed058430f524b99cb0e204bd996fa0e55
> > --- /dev/null
> > +++ b/libgcc/config/aarch64/cpuinfo.c
> > @@ -0,0 +1,502 @@
> > +/* CPU feature detection for AArch64 architecture.
> > +   Copyright (C) 2023 Free Software Foundation, Inc.
> > +
> > +   This file is part of GCC.
> > +
> > +   This file is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by the
> > +   Free Software Foundation; either version 3, or (at your option) any
> > +   later version.
> > +
> > +   This file is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   Under Section 7 of GPL version 3, you are granted additional
> > +   permissions described in the GCC Runtime Library Exception, version
> > +   3.1, as published by the Free Software Foundation.
> > +  
> > +   You should have received a copy of the GNU General Public License and
> > +   a copy of the GCC Runtime Library Exception along with this program;
> > +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#if defined(__has_include)
> 
> Is this protecting against a known condition?  libgcc has to be built
> with the associated version of GCC, so it might be better to drop the
> #if and get a noisy failure if something unexpected happens.  That can
> be part of 5/5 though.
> 
> Thanks,
> Richard

I don't know that this is required, so I'll assume it isn't.  I'll drop it in
the next version of this patch.
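
A small sketch of what dropping the outer guard changes.  It uses
<stddef.h> as a stand-in for <sys/auxv.h> so that it builds anywhere, and
SKETCH_HAVE_HEADER and sketch_have_header are hypothetical names.  With
"#if defined(__has_include)" around it, a compiler lacking __has_include
would silently skip the whole body; without the guard, the #if line below
should fail to preprocess, which is the noisy failure suggested above.

```c
/* Unguarded form: relies on the compiler providing __has_include, which
   holds when libgcc is built with the matching GCC.  */
#if __has_include(<stddef.h>)
#include <stddef.h>
#define SKETCH_HAVE_HEADER 1
#else
#define SKETCH_HAVE_HEADER 0
#endif

/* Report whether the header was found at build time.  */
int
sketch_have_header (void)
{
  return SKETCH_HAVE_HEADER;
}
```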

> > +#if __has_include(<sys/auxv.h>)
> > +#include <sys/auxv.h>
> > +
> > +#if __has_include(<sys/ifunc.h>)
> > +#include <sys/ifunc.h>
> > +#else
> > +typedef struct __ifunc_arg_t {
> > +  unsigned long _size;
> > +  unsigned long _hwcap;
> > +  unsigned long _hwcap2;
> > +} __ifunc_arg_t;
> > +#endif
> > +
> > +#if __has_include(<asm/hwcap.h>)
> > +#include <asm/hwcap.h>
> > +
> > +/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
> > +enum CPUFeatures {
> > +  FEAT_RNG,
> > +  FEAT_FLAGM,
> > +  FEAT_FLAGM2,
> > +  FEAT_FP16FML,
> > +  FEAT_DOTPROD,
> > +  FEAT_SM4,
> > +  FEAT_RDM,
> > +  FEAT_LSE,
> > +  FEAT_FP,
> > +  FEAT_SIMD,
> > +  FEAT_CRC,
> > +  FEAT_SHA1,
> > +  FEAT_SHA2,
> > +  FEAT_SHA3,
> > +  FEAT_AES,
> > +  FEAT_PMULL,
> > +  FEAT_FP16,
> > +  FEAT_DIT,
> > +  FEAT_DPB,
> > +  FEAT_DPB2,
> > +  FEAT_JSCVT,
> > +  FEAT_FCMA,
> > +  FEAT_RCPC,
> > +  FEAT_RCPC2,
> > +  FEAT_FRINTTS,
> > +  FEAT_DGH,
> > +  FEAT_I8MM,
> > +  FEAT_BF16,
> > +  FEAT_EBF16,
> > +  FEAT_RPRES,
> > +  FEAT_SVE,
> > +  FEAT_SVE_BF16,
> > +  FEAT_SVE_EBF16,
> > +  FEAT_SVE_I8MM,
> > +  FEAT_SVE_F32MM,
> > +  FEAT_SVE_F64MM,
> > +  FEAT_SVE2,
> > +  FEAT_SVE_AES,
> > +  FEAT_SVE_PMULL128,
> > +  FEAT_SVE_BITPERM,
> > +  FEAT_SVE_SHA3,
> > +  FEAT_SVE_SM4,
> > +  FEAT_SME,
> > +  FEAT_MEMTAG,
> > +  FEAT_MEMTAG2,
> > +  FEAT_MEMTAG3,
> > +  FEAT_SB,
> > +  FEAT_PREDRES,
> > +  FEAT_SSBS,
> > +  FEAT_SSBS2,
> > +  FEAT_BTI,
> > +  FEAT_LS64,
> > +  FEAT_LS64_V,
> > +  FEAT_LS64_ACCDATA,
> > +  FEAT_WFXT,
> > +  FEAT_SME_F64,
> > +  FEAT_SME_I64,
> > +  FEAT_SME2,
> > +  FEAT_RCPC3,
> > +  FEAT_MAX,
> > +  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
> > +		    in __aarch64_cpu_features.  */
> > +  FEAT_INIT      /* Used as flag of features initialization completion.  */
> > +};
> > +
> > +/* Architecture features used in Function Multi Versioning.  */
> > +struct {
> > +  unsigned long long features;
> > +  /* As features grows new fields could be added.  */
> > +} __aarch64_cpu_features __attribute__((visibility("hidden"), nocommon));
> > +
> > +#ifndef _IFUNC_ARG_HWCAP
> > +#define _IFUNC_ARG_HWCAP (1ULL << 62)
> > +#endif
> > +#ifndef AT_HWCAP
> > +#define AT_HWCAP 16
> > +#endif
> > +#ifndef HWCAP_CPUID
> > +#define HWCAP_CPUID (1 << 11)
> > +#endif
> > +#ifndef HWCAP_FP
> > +#define HWCAP_FP (1 << 0)
> > +#endif
> > +#ifndef HWCAP_ASIMD
> > +#define HWCAP_ASIMD (1 << 1)
> > +#endif
> > +#ifndef HWCAP_AES
> > +#define HWCAP_AES (1 << 3)
> > +#endif
> > +#ifndef HWCAP_PMULL
> > +#define HWCAP_PMULL (1 << 4)
> > +#endif
> > +#ifndef HWCAP_SHA1
> > +#define HWCAP_SHA1 (1 << 5)
> > +#endif
> > +#ifndef HWCAP_SHA2
> > +#define HWCAP_SHA2 (1 << 6)
> > +#endif
> > +#ifndef HWCAP_ATOMICS
> > +#define HWCAP_ATOMICS (1 << 8)
> > +#endif
> > +#ifndef HWCAP_FPHP
> > +#define HWCAP_FPHP (1 << 9)
> > +#endif
> > +#ifndef HWCAP_ASIMDHP
> > +#define HWCAP_ASIMDHP (1 << 10)
> > +#endif
> > +#ifndef HWCAP_ASIMDRDM
> > +#define HWCAP_ASIMDRDM (1 << 12)
> > +#endif
> > +#ifndef HWCAP_JSCVT
> > +#define HWCAP_JSCVT (1 << 13)
> > +#endif
> > +#ifndef HWCAP_FCMA
> > +#define HWCAP_FCMA (1 << 14)
> > +#endif
> > +#ifndef HWCAP_LRCPC
> > +#define HWCAP_LRCPC (1 << 15)
> > +#endif
> > +#ifndef HWCAP_DCPOP
> > +#define HWCAP_DCPOP (1 << 16)
> > +#endif
> > +#ifndef HWCAP_SHA3
> > +#define HWCAP_SHA3 (1 << 17)
> > +#endif
> > +#ifndef HWCAP_SM3
> > +#define HWCAP_SM3 (1 << 18)
> > +#endif
> > +#ifndef HWCAP_SM4
> > +#define HWCAP_SM4 (1 << 19)
> > +#endif
> > +#ifndef HWCAP_ASIMDDP
> > +#define HWCAP_ASIMDDP (1 << 20)
> > +#endif
> > +#ifndef HWCAP_SHA512
> > +#define HWCAP_SHA512 (1 << 21)
> > +#endif
> > +#ifndef HWCAP_SVE
> > +#define HWCAP_SVE (1 << 22)
> > +#endif
> > +#ifndef HWCAP_ASIMDFHM
> > +#define HWCAP_ASIMDFHM (1 << 23)
> > +#endif
> > +#ifndef HWCAP_DIT
> > +#define HWCAP_DIT (1 << 24)
> > +#endif
> > +#ifndef HWCAP_ILRCPC
> > +#define HWCAP_ILRCPC (1 << 26)
> > +#endif
> > +#ifndef HWCAP_FLAGM
> > +#define HWCAP_FLAGM (1 << 27)
> > +#endif
> > +#ifndef HWCAP_SSBS
> > +#define HWCAP_SSBS (1 << 28)
> > +#endif
> > +#ifndef HWCAP_SB
> > +#define HWCAP_SB (1 << 29)
> > +#endif
> > +
> > +#ifndef HWCAP2_DCPODP
> > +#define HWCAP2_DCPODP (1 << 0)
> > +#endif
> > +#ifndef HWCAP2_SVE2
> > +#define HWCAP2_SVE2 (1 << 1)
> > +#endif
> > +#ifndef HWCAP2_SVEAES
> > +#define HWCAP2_SVEAES (1 << 2)
> > +#endif
> > +#ifndef HWCAP2_SVEPMULL
> > +#define HWCAP2_SVEPMULL (1 << 3)
> > +#endif
> > +#ifndef HWCAP2_SVEBITPERM
> > +#define HWCAP2_SVEBITPERM (1 << 4)
> > +#endif
> > +#ifndef HWCAP2_SVESHA3
> > +#define HWCAP2_SVESHA3 (1 << 5)
> > +#endif
> > +#ifndef HWCAP2_SVESM4
> > +#define HWCAP2_SVESM4 (1 << 6)
> > +#endif
> > +#ifndef HWCAP2_FLAGM2
> > +#define HWCAP2_FLAGM2 (1 << 7)
> > +#endif
> > +#ifndef HWCAP2_FRINT
> > +#define HWCAP2_FRINT (1 << 8)
> > +#endif
> > +#ifndef HWCAP2_SVEI8MM
> > +#define HWCAP2_SVEI8MM (1 << 9)
> > +#endif
> > +#ifndef HWCAP2_SVEF32MM
> > +#define HWCAP2_SVEF32MM (1 << 10)
> > +#endif
> > +#ifndef HWCAP2_SVEF64MM
> > +#define HWCAP2_SVEF64MM (1 << 11)
> > +#endif
> > +#ifndef HWCAP2_SVEBF16
> > +#define HWCAP2_SVEBF16 (1 << 12)
> > +#endif
> > +#ifndef HWCAP2_I8MM
> > +#define HWCAP2_I8MM (1 << 13)
> > +#endif
> > +#ifndef HWCAP2_BF16
> > +#define HWCAP2_BF16 (1 << 14)
> > +#endif
> > +#ifndef HWCAP2_DGH
> > +#define HWCAP2_DGH (1 << 15)
> > +#endif
> > +#ifndef HWCAP2_RNG
> > +#define HWCAP2_RNG (1 << 16)
> > +#endif
> > +#ifndef HWCAP2_BTI
> > +#define HWCAP2_BTI (1 << 17)
> > +#endif
> > +#ifndef HWCAP2_MTE
> > +#define HWCAP2_MTE (1 << 18)
> > +#endif
> > +#ifndef HWCAP2_RPRES
> > +#define HWCAP2_RPRES (1 << 21)
> > +#endif
> > +#ifndef HWCAP2_MTE3
> > +#define HWCAP2_MTE3 (1 << 22)
> > +#endif
> > +#ifndef HWCAP2_SME
> > +#define HWCAP2_SME (1 << 23)
> > +#endif
> > +#ifndef HWCAP2_SME_I16I64
> > +#define HWCAP2_SME_I16I64 (1 << 24)
> > +#endif
> > +#ifndef HWCAP2_SME_F64F64
> > +#define HWCAP2_SME_F64F64 (1 << 25)
> > +#endif
> > +#ifndef HWCAP2_WFXT
> > +#define HWCAP2_WFXT (1UL << 31)
> > +#endif
> > +#ifndef HWCAP2_EBF16
> > +#define HWCAP2_EBF16 (1UL << 32)
> > +#endif
> > +#ifndef HWCAP2_SVE_EBF16
> > +#define HWCAP2_SVE_EBF16 (1UL << 33)
> > +#endif
> > +
> > +static void
> > +__init_cpu_features_constructor(unsigned long hwcap,
> > +				const __ifunc_arg_t *arg) {
> > +#define setCPUFeature(F) __aarch64_cpu_features.features |= 1ULL << F
> > +#define getCPUFeature(id, ftr) __asm__("mrs %0, " #id : "=r"(ftr))
> > +#define extractBits(val, start, number) \
> > +  (val & ((1ULL << number) - 1ULL) << start) >> start
> > +  unsigned long hwcap2 = 0;
> > +  if (hwcap & _IFUNC_ARG_HWCAP)
> > +    hwcap2 = arg->_hwcap2;
> > +  if (hwcap & HWCAP_CRC32)
> > +    setCPUFeature(FEAT_CRC);
> > +  if (hwcap & HWCAP_PMULL)
> > +    setCPUFeature(FEAT_PMULL);
> > +  if (hwcap & HWCAP_FLAGM)
> > +    setCPUFeature(FEAT_FLAGM);
> > +  if (hwcap2 & HWCAP2_FLAGM2) {
> > +    setCPUFeature(FEAT_FLAGM);
> > +    setCPUFeature(FEAT_FLAGM2);
> > +  }
> > +  if (hwcap & HWCAP_SM3 && hwcap & HWCAP_SM4)
> > +    setCPUFeature(FEAT_SM4);
> > +  if (hwcap & HWCAP_ASIMDDP)
> > +    setCPUFeature(FEAT_DOTPROD);
> > +  if (hwcap & HWCAP_ASIMDFHM)
> > +    setCPUFeature(FEAT_FP16FML);
> > +  if (hwcap & HWCAP_FPHP) {
> > +    setCPUFeature(FEAT_FP16);
> > +    setCPUFeature(FEAT_FP);
> > +  }
> > +  if (hwcap & HWCAP_DIT)
> > +    setCPUFeature(FEAT_DIT);
> > +  if (hwcap & HWCAP_ASIMDRDM)
> > +    setCPUFeature(FEAT_RDM);
> > +  if (hwcap & HWCAP_ILRCPC)
> > +    setCPUFeature(FEAT_RCPC2);
> > +  if (hwcap & HWCAP_AES)
> > +    setCPUFeature(FEAT_AES);
> > +  if (hwcap & HWCAP_SHA1)
> > +    setCPUFeature(FEAT_SHA1);
> > +  if (hwcap & HWCAP_SHA2)
> > +    setCPUFeature(FEAT_SHA2);
> > +  if (hwcap & HWCAP_JSCVT)
> > +    setCPUFeature(FEAT_JSCVT);
> > +  if (hwcap & HWCAP_FCMA)
> > +    setCPUFeature(FEAT_FCMA);
> > +  if (hwcap & HWCAP_SB)
> > +    setCPUFeature(FEAT_SB);
> > +  if (hwcap & HWCAP_SSBS)
> > +    setCPUFeature(FEAT_SSBS2);
> > +  if (hwcap2 & HWCAP2_MTE) {
> > +    setCPUFeature(FEAT_MEMTAG);
> > +    setCPUFeature(FEAT_MEMTAG2);
> > +  }
> > +  if (hwcap2 & HWCAP2_MTE3) {
> > +    setCPUFeature(FEAT_MEMTAG);
> > +    setCPUFeature(FEAT_MEMTAG2);
> > +    setCPUFeature(FEAT_MEMTAG3);
> > +  }
> > +  if (hwcap2 & HWCAP2_SVEAES)
> > +    setCPUFeature(FEAT_SVE_AES);
> > +  if (hwcap2 & HWCAP2_SVEPMULL) {
> > +    setCPUFeature(FEAT_SVE_AES);
> > +    setCPUFeature(FEAT_SVE_PMULL128);
> > +  }
> > +  if (hwcap2 & HWCAP2_SVEBITPERM)
> > +    setCPUFeature(FEAT_SVE_BITPERM);
> > +  if (hwcap2 & HWCAP2_SVESHA3)
> > +    setCPUFeature(FEAT_SVE_SHA3);
> > +  if (hwcap2 & HWCAP2_SVESM4)
> > +    setCPUFeature(FEAT_SVE_SM4);
> > +  if (hwcap2 & HWCAP2_DCPODP)
> > +    setCPUFeature(FEAT_DPB2);
> > +  if (hwcap & HWCAP_ATOMICS)
> > +    setCPUFeature(FEAT_LSE);
> > +  if (hwcap2 & HWCAP2_RNG)
> > +    setCPUFeature(FEAT_RNG);
> > +  if (hwcap2 & HWCAP2_I8MM)
> > +    setCPUFeature(FEAT_I8MM);
> > +  if (hwcap2 & HWCAP2_EBF16)
> > +    setCPUFeature(FEAT_EBF16);
> > +  if (hwcap2 & HWCAP2_SVE_EBF16)
> > +    setCPUFeature(FEAT_SVE_EBF16);
> > +  if (hwcap2 & HWCAP2_DGH)
> > +    setCPUFeature(FEAT_DGH);
> > +  if (hwcap2 & HWCAP2_FRINT)
> > +    setCPUFeature(FEAT_FRINTTS);
> > +  if (hwcap2 & HWCAP2_SVEI8MM)
> > +    setCPUFeature(FEAT_SVE_I8MM);
> > +  if (hwcap2 & HWCAP2_SVEF32MM)
> > +    setCPUFeature(FEAT_SVE_F32MM);
> > +  if (hwcap2 & HWCAP2_SVEF64MM)
> > +    setCPUFeature(FEAT_SVE_F64MM);
> > +  if (hwcap2 & HWCAP2_BTI)
> > +    setCPUFeature(FEAT_BTI);
> > +  if (hwcap2 & HWCAP2_RPRES)
> > +    setCPUFeature(FEAT_RPRES);
> > +  if (hwcap2 & HWCAP2_WFXT)
> > +    setCPUFeature(FEAT_WFXT);
> > +  if (hwcap2 & HWCAP2_SME)
> > +    setCPUFeature(FEAT_SME);
> > +  if (hwcap2 & HWCAP2_SME_I16I64)
> > +    setCPUFeature(FEAT_SME_I64);
> > +  if (hwcap2 & HWCAP2_SME_F64F64)
> > +    setCPUFeature(FEAT_SME_F64);
> > +  if (hwcap & HWCAP_CPUID) {
> > +    unsigned long ftr;
> > +    getCPUFeature(ID_AA64PFR1_EL1, ftr);
> > +    /* ID_AA64PFR1_EL1.MTE >= 0b0001  */
> > +    if (extractBits(ftr, 8, 4) >= 0x1)
> > +      setCPUFeature(FEAT_MEMTAG);
> > +    /* ID_AA64PFR1_EL1.SSBS == 0b0001  */
> > +    if (extractBits(ftr, 4, 4) == 0x1)
> > +      setCPUFeature(FEAT_SSBS);
> > +    /* ID_AA64PFR1_EL1.SME == 0b0010  */
> > +    if (extractBits(ftr, 24, 4) == 0x2)
> > +      setCPUFeature(FEAT_SME2);
> > +    getCPUFeature(ID_AA64PFR0_EL1, ftr);
> > +    /* ID_AA64PFR0_EL1.FP != 0b1111  */
> > +    if (extractBits(ftr, 16, 4) != 0xF) {
> > +      setCPUFeature(FEAT_FP);
> > +      /* ID_AA64PFR0_EL1.AdvSIMD has the same value as ID_AA64PFR0_EL1.FP  */
> > +      setCPUFeature(FEAT_SIMD);
> > +    }
> > +    /* ID_AA64PFR0_EL1.SVE != 0b0000  */
> > +    if (extractBits(ftr, 32, 4) != 0x0) {
> > +      /* get ID_AA64ZFR0_EL1, that name supported if sve enabled only  */
> > +      getCPUFeature(S3_0_C0_C4_4, ftr);
> > +      /* ID_AA64ZFR0_EL1.SVEver == 0b0000  */
> > +      if (extractBits(ftr, 0, 4) == 0x0)
> > +	setCPUFeature(FEAT_SVE);
> > +      /* ID_AA64ZFR0_EL1.SVEver == 0b0001  */
> > +      if (extractBits(ftr, 0, 4) == 0x1)
> > +	setCPUFeature(FEAT_SVE2);
> > +      /* ID_AA64ZFR0_EL1.BF16 != 0b0000  */
> > +      if (extractBits(ftr, 20, 4) != 0x0)
> > +	setCPUFeature(FEAT_SVE_BF16);
> > +    }
> > +    getCPUFeature(ID_AA64ISAR0_EL1, ftr);
> > +    /* ID_AA64ISAR0_EL1.SHA3 != 0b0000  */
> > +    if (extractBits(ftr, 32, 4) != 0x0)
> > +      setCPUFeature(FEAT_SHA3);
> > +    getCPUFeature(ID_AA64ISAR1_EL1, ftr);
> > +    /* ID_AA64ISAR1_EL1.DPB >= 0b0001  */
> > +    if (extractBits(ftr, 0, 4) >= 0x1)
> > +      setCPUFeature(FEAT_DPB);
> > +    /* ID_AA64ISAR1_EL1.LRCPC != 0b0000  */
> > +    if (extractBits(ftr, 20, 4) != 0x0)
> > +      setCPUFeature(FEAT_RCPC);
> > +    /* ID_AA64ISAR1_EL1.LRCPC == 0b0011  */
> > +    if (extractBits(ftr, 20, 4) == 0x3)
> > +      setCPUFeature(FEAT_RCPC3);
> > +    /* ID_AA64ISAR1_EL1.SPECRES == 0b0001  */
> > +    if (extractBits(ftr, 40, 4) == 0x2)
> > +      setCPUFeature(FEAT_PREDRES);
> > +    /* ID_AA64ISAR1_EL1.BF16 != 0b0000  */
> > +    if (extractBits(ftr, 44, 4) != 0x0)
> > +      setCPUFeature(FEAT_BF16);
> > +    /* ID_AA64ISAR1_EL1.LS64 >= 0b0001  */
> > +    if (extractBits(ftr, 60, 4) >= 0x1)
> > +      setCPUFeature(FEAT_LS64);
> > +    /* ID_AA64ISAR1_EL1.LS64 >= 0b0010  */
> > +    if (extractBits(ftr, 60, 4) >= 0x2)
> > +      setCPUFeature(FEAT_LS64_V);
> > +    /* ID_AA64ISAR1_EL1.LS64 >= 0b0011  */
> > +    if (extractBits(ftr, 60, 4) >= 0x3)
> > +      setCPUFeature(FEAT_LS64_ACCDATA);
> > +  } else {
> > +    /* Set some features in case of no CPUID support.  */
> > +    if (hwcap & (HWCAP_FP | HWCAP_FPHP)) {
> > +      setCPUFeature(FEAT_FP);
> > +      /* FP and AdvSIMD fields have the same value.  */
> > +      setCPUFeature(FEAT_SIMD);
> > +    }
> > +    if (hwcap & HWCAP_DCPOP || hwcap2 & HWCAP2_DCPODP)
> > +      setCPUFeature(FEAT_DPB);
> > +    if (hwcap & HWCAP_LRCPC || hwcap & HWCAP_ILRCPC)
> > +      setCPUFeature(FEAT_RCPC);
> > +    if (hwcap2 & HWCAP2_BF16 || hwcap2 & HWCAP2_EBF16)
> > +      setCPUFeature(FEAT_BF16);
> > +    if (hwcap2 & HWCAP2_SVEBF16)
> > +      setCPUFeature(FEAT_SVE_BF16);
> > +    if (hwcap2 & HWCAP2_SVE2 && hwcap & HWCAP_SVE)
> > +      setCPUFeature(FEAT_SVE2);
> > +    if (hwcap & HWCAP_SHA3)
> > +      setCPUFeature(FEAT_SHA3);
> > +  }
> > +  setCPUFeature(FEAT_INIT);
> > +}
> > +
> > +void
> > +__init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
> > +  if (__aarch64_cpu_features.features)
> > +    return;
> > +  __init_cpu_features_constructor(hwcap, arg);
> > +}
> > +
> > +void __attribute__ ((constructor))
> > +__init_cpu_features(void) {
> > +  unsigned long hwcap;
> > +  unsigned long hwcap2;
> > +  /* CPU features already initialized.  */
> > +  if (__aarch64_cpu_features.features)
> > +    return;
> > +  hwcap = getauxval(AT_HWCAP);
> > +  hwcap2 = getauxval(AT_HWCAP2);
> > +  __ifunc_arg_t arg;
> > +  arg._size = sizeof(__ifunc_arg_t);
> > +  arg._hwcap = hwcap;
> > +  arg._hwcap2 = hwcap2;
> > +  __init_cpu_features_constructor(hwcap | _IFUNC_ARG_HWCAP, &arg);
> > +#undef extractBits
> > +#undef getCPUFeature
> > +#undef setCPUFeature
> > +}
> > +#endif /* __has_include(<asm/hwcap.h>)  */
> > +#endif /* __has_include(<sys/auxv.h>)  */
> > +#endif /* defined(__has_include)  */
> > diff --git a/libgcc/config/aarch64/t-aarch64 b/libgcc/config/aarch64/t-aarch64
> > index a40b6241c86ecc4007b5cfd28aa989ee894aa410..8bc1a4ca0c2eb75c17e62a25aa45a875bfd472f8 100644
> > --- a/libgcc/config/aarch64/t-aarch64
> > +++ b/libgcc/config/aarch64/t-aarch64
> > @@ -19,3 +19,4 @@
> >  # <http://www.gnu.org/licenses/>.
> >  
> >  LIB2ADD += $(srcdir)/config/aarch64/sync-cache.c
> > +LIB2ADD += $(srcdir)/config/aarch64/cpuinfo.c
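
For reference, the extractBits and setCPUFeature macros in the quoted cpuinfo.c
expand their arguments without parentheses, so they rely on every call site
passing simple operands.  Below is a minimal sketch of the same bit-field logic
written as functions.  The names extract_bits and has_feature are hypothetical
and do not appear in the patch; this only illustrates the technique and is not
a proposed change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): return WIDTH bits of VAL,
   starting at bit START, shifted down to bit 0.  */
static inline uint64_t
extract_bits (uint64_t val, unsigned start, unsigned width)
{
  /* A full 64-bit width would make the shift below undefined.  */
  assert (width >= 1 && width <= 63 && start + width <= 64);
  return (val >> start) & ((UINT64_C (1) << width) - 1);
}

/* Hypothetical helper (not in the patch): test one feature bit in a
   64-bit mask laid out like __aarch64_cpu_features.features.  */
static inline int
has_feature (uint64_t features, unsigned feat)
{
  assert (feat < 64);
  return (int) ((features >> feat) & 1);
}
```

With these, a check such as "ID_AA64PFR1_EL1.SSBS == 0b0001" would read
extract_bits (ftr, 4, 4) == 0x1, and the operands are evaluated exactly once.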


* Re: [PATCH v2 4/5] Add support for target_version attribute
  2023-11-29 17:53   ` Richard Sandiford
@ 2023-12-04 11:14     ` Andrew Carlotti
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Carlotti @ 2023-12-04 11:14 UTC (permalink / raw)
  To: gcc-patches, ebotcazou, poulhies, ibuclaw, jason, nathan,
	rguenther, richard.earnshaw, richard.sandiford

On Wed, Nov 29, 2023 at 05:53:56PM +0000, Richard Sandiford wrote:
> Andrew Carlotti <andrew.carlotti@arm.com> writes:
> > This patch adds support for the "target_version" attribute to the middle
> > end and the C++ frontend, which will be used to implement function
> > multiversioning in the aarch64 backend.
> >
> > On targets that don't use the "target" attribute for multiversioning,
> > there is no conflict between the "target" and "target_clones"
> > attributes.  This patch therefore makes the mutual exclusion in
> > > C-family, D and Ada conditional upon the value of the
> > expanded_clones_attribute target hook.
> >
> > The "target_version" attribute is only added to C++ in this patch,
> > because this is currently the only frontend which supports
> > multiversioning using the "target" attribute.  Support for the
> > "target_version" attribute will be extended to C at a later date.
> >
> > Targets that currently use the "target" attribute for function
> > multiversioning (i.e. i386 and rs6000) are not affected by this patch.
> >
> > Ok for master?
> >
> > gcc/ChangeLog:
> >
> > 	* attribs.cc (decl_attributes): Pass attribute name to target.
> > 	(is_function_default_version): Update comment to specify
> > 	incompatibility with target_version attributes.
> > 	* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
> > 	Call valid_version_attribute_p for target_version attributes.
> > 	* target.def (valid_version_attribute_p): New hook.
> > 	(expanded_clones_attribute): New hook.
> > 	* doc/tm.texi.in: Add new hooks.
> > 	* doc/tm.texi: Regenerate.
> > 	* multiple_target.cc (create_dispatcher_calls): Remove redundant
> > 	is_function_default_version check.
> > 	(expand_target_clones): Use target hook for attribute name.
> > 	* targhooks.cc (default_target_option_valid_version_attribute_p):
> > 	New.
> > 	* targhooks.h (default_target_option_valid_version_attribute_p):
> > 	New.
> > 	* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
> > 	target_version attributes.
> >
> > gcc/c-family/ChangeLog:
> >
> > 	* c-attribs.cc (CLONES_USES_TARGET): New macro.
> > 	(attr_target_exclusions): Use new macro.
> > 	(attr_target_clones_exclusions): Ditto, and add target_version.
> > 	(attr_target_version_exclusions): New.
> > 	(c_common_attribute_table): Add target_version.
> > 	(handle_target_version_attribute): New.
> >
> > gcc/ada/ChangeLog:
> >
> > 	* gcc-interface/utils.cc (CLONES_USES_TARGET): New macro.
> > 	(attr_target_exclusions): Use new macro.
> > 	(attr_target_clones_exclusions): Ditto.
> >
> > gcc/d/ChangeLog:
> >
> > 	* d-attribs.cc (CLONES_USES_TARGET): New macro.
> > 	(attr_target_exclusions): Use new macro.
> > 	(attr_target_clones_exclusions): Ditto.
> >
> > gcc/cp/ChangeLog:
> >
> > 	* decl2.cc (check_classfn): Update comment to include
> > 	target_version attributes.
> >
> >
> > diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
> > index e33a63948cebdeafc3abcdd539a35141969ad978..8850943cb3326568b4679a73405f50487aa1b7c6 100644
> > --- a/gcc/ada/gcc-interface/utils.cc
> > +++ b/gcc/ada/gcc-interface/utils.cc
> > @@ -143,16 +143,21 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
> >    { NULL, false, false, false },
> >  };
> >  
> > +#define CLONES_USES_TARGET \
> > +  (strcmp (targetm.target_option.expanded_clones_attribute, \
> > +	   "target") == 0)
> > +
> 
> Sorry for the slower review on this part.  I was hoping inspiration
> would strike for a way to resolve this, but it hasn't, so:
> 
> The codebase usually avoids static variables that need dynamic
> initialisation.  So although macros are not the preferred way of
> doing things, I think one is probably appropriate here.  How about:
> 
>   TARGET_HAS_FMV_TARGET_ATTRIBUTE
> 
> with the default being true, and with AArch64 defining it to false?
> 
> This would replace the expanded_clones_attribute hook, with:
> 
>   const char *new_attr_name = targetm.target_option.expanded_clones_attribute;
> 
> becoming:
> 
>   const char *new_attr_name = (TARGET_HAS_FMV_TARGET_ATTRIBUTE
> 			       ? "target" : "target_version");
> 
> I realise this is anything but elegant, but I think it's probably
> the least worst option, given where we are.

I thought this could be an issue, and had deliberately not committed patches
2+3 in this series in case fixing this required reverting to specific runtime
checks within each handler.

I've changed it to use your suggestion in the next version.
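
A minimal standalone sketch of that approach follows, assuming the macro
defaults to true as suggested above.  The struct exclusion type and the helper
names are hypothetical stand-ins for attribute_spec::exclusions and the real
handlers, chosen only so the example compiles on its own.  The point is that
the macro is a compile-time constant, so the exclusion tables need no dynamic
initialisation.

```c
#include <string.h>

/* Assumed default, per the suggestion quoted above: a target uses "target"
   for function multiversioning unless it overrides this to 0.  */
#ifndef TARGET_HAS_FMV_TARGET_ATTRIBUTE
#define TARGET_HAS_FMV_TARGET_ATTRIBUTE 1
#endif

/* Hypothetical stand-in for one attribute_spec::exclusions row.  */
struct exclusion
{
  const char *name;
  int function, type, variable;
};

/* The macro is a constant expression, so this table is statically
   initialised.  */
static const struct exclusion target_exclusions[] =
{
  { "target_clones", TARGET_HAS_FMV_TARGET_ATTRIBUTE,
    TARGET_HAS_FMV_TARGET_ATTRIBUTE, TARGET_HAS_FMV_TARGET_ATTRIBUTE },
  { NULL, 0, 0, 0 },
};

/* Hypothetical helper: name of the attribute that target_clones expands to.  */
static const char *
expanded_clones_attr_name (void)
{
  return TARGET_HAS_FMV_TARGET_ATTRIBUTE ? "target" : "target_version";
}

/* Hypothetical helper: does "target" exclude NAME on function decls?  */
static int
target_excludes (const char *name)
{
  for (const struct exclusion *e = target_exclusions; e->name; e++)
    if (strcmp (e->name, name) == 0)
      return e->function;
  return 0;
}
```

A port that versions via "target_version" would define the macro to 0 before
this point, which turns the "target_clones" row into a no-op exclusion.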
 
> >  static const struct attribute_spec::exclusions attr_target_exclusions[] =
> >  {
> > -  { "target_clones", true, true, true },
> > +  { "target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
> > +    CLONES_USES_TARGET },
> >    { NULL, false, false, false },
> >  };
> >  
> >  static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
> >  {
> >    { "always_inline", true, true, true },
> > -  { "target", true, true, true },
> > +  { "target", CLONES_USES_TARGET, CLONES_USES_TARGET, CLONES_USES_TARGET },
> >    { NULL, false, false, false },
> >  };
> >  
> > diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> > index f9fd258598914ce2112ecaaeaad6c63cd69a44e2..27533023ef5c481ba085c2f0c605dfb992987b3e 100644
> > --- a/gcc/attribs.cc
> > +++ b/gcc/attribs.cc
> > @@ -657,7 +657,8 @@ decl_attributes (tree *node, tree attributes, int flags,
> >       options to the attribute((target(...))) list.  */
> >    if (TREE_CODE (*node) == FUNCTION_DECL
> >        && current_target_pragma
> > -      && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
> > +      && targetm.target_option.valid_attribute_p (*node,
> > +						  get_identifier("target"),
> 
> Formatting nit: should be a space before ("target")

Fixed.
 
> >  						  current_target_pragma, 0))
> >      {
> >        tree cur_attr = lookup_attribute ("target", attributes);
> > @@ -1241,8 +1242,9 @@ make_dispatcher_decl (const tree decl)
> >    return func_decl;  
> >  }
> >  
> > -/* Returns true if decl is multi-versioned and DECL is the default function,
> > -   that is it is not tagged with target specific optimization.  */
> > +/* Returns true if DECL is multi-versioned using the target attribute, and this
> > +   is the default version.  This function can only be used for targets that do
> > +   not support the "target_version" attribute.  */
> >  
> >  bool
> >  is_function_default_version (const tree decl)
> > diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> > index b3b41ef123a0f171f57acb1b7f7fdde716428c00..8e33b7c3f4a9e7dcaa299eeff0eea92240f7ef0a 100644
> > --- a/gcc/c-family/c-attribs.cc
> > +++ b/gcc/c-family/c-attribs.cc
> > @@ -149,6 +149,7 @@ static tree handle_alloc_align_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_assume_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_target_attribute (tree *, tree, tree, int, bool *);
> > +static tree handle_target_version_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> >  static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > @@ -228,16 +229,29 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
> >    ATTR_EXCL (NULL, false, false, false),
> >  };
> >  
> > +#define CLONES_USES_TARGET \
> > +  (strcmp (targetm.target_option.expanded_clones_attribute, \
> > +	   "target") == 0)
> > +
> >  static const struct attribute_spec::exclusions attr_target_exclusions[] =
> >  {
> > -  ATTR_EXCL ("target_clones", true, true, true),
> > +  ATTR_EXCL ("target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
> > +	     CLONES_USES_TARGET),
> >    ATTR_EXCL (NULL, false, false, false),
> >  };
> >  
> >  static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
> >  {
> >    ATTR_EXCL ("always_inline", true, true, true),
> > -  ATTR_EXCL ("target", true, true, true),
> > +  ATTR_EXCL ("target", CLONES_USES_TARGET, CLONES_USES_TARGET,
> > +	     CLONES_USES_TARGET),
> > +  ATTR_EXCL ("target_version", true, true, true),
> > +  ATTR_EXCL (NULL, false, false, false),
> > +};
> > +
> > +static const struct attribute_spec::exclusions attr_target_version_exclusions[] =
> > +{
> > +  ATTR_EXCL ("target_clones", true, true, true),
> 
> Just FTR: I suppose this should also include "target" if there is ever
> a port that uses "target" and "target_version" for the same thing, but
> there's no need to predict that case.

I don't think these need to be mutually exclusive, even in that case, although
if we wanted to make it work then we'd probably have to improve how the
existing code handles target attribute multiversioning.

> >    ATTR_EXCL (NULL, false, false, false),
> >  };
> >  
> > @@ -505,6 +519,9 @@ const struct attribute_spec c_common_attribute_table[] =
> >    { "target",                 1, -1, true, false, false, false,
> >  			      handle_target_attribute,
> >  			      attr_target_exclusions },
> > +  { "target_version",         1, 1, true, false, false, false,
> > +			      handle_target_version_attribute,
> > +			      attr_target_version_exclusions },
> >    { "target_clones",          1, -1, true, false, false, false,
> >  			      handle_target_clones_attribute,
> >  			      attr_target_clones_exclusions },
> > @@ -5670,6 +5687,25 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
> >    return NULL_TREE;
> >  }
> >  
> > +/* Handle a "target_version" attribute.  */
> > +
> > +static tree
> > +handle_target_version_attribute (tree *node, tree name, tree args, int flags,
> > +				  bool *no_add_attrs)
> > +{
> > +  /* Ensure we have a function type.  */
> > +  if (TREE_CODE (*node) != FUNCTION_DECL)
> > +    {
> > +      warning (OPT_Wattributes, "%qE attribute ignored", name);
> > +      *no_add_attrs = true;
> > +    }
> > +  else if (!targetm.target_option.valid_version_attribute_p (*node, name, args,
> > +							     flags))
> > +    *no_add_attrs = true;
> > +
> > +  return NULL_TREE;
> > +}
> > +
> >  /* Handle a "target_clones" attribute.  */
> >  
> >  static tree
> > diff --git a/gcc/cgraphclones.cc b/gcc/cgraphclones.cc
> > index 29d28ef895a73a223695cbb86aafbc845bbe7688..8af6b23d8c0306920e0fdcb3559ef047a16689f4 100644
> > --- a/gcc/cgraphclones.cc
> > +++ b/gcc/cgraphclones.cc
> > @@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-eh.h"
> >  #include "tree-cfg.h"
> >  #include "tree-inline.h"
> > +#include "attribs.h"
> >  #include "dumpfile.h"
> >  #include "gimple-pretty-print.h"
> >  #include "alloc-pool.h"
> > @@ -1048,7 +1049,17 @@ cgraph_node::create_version_clone_with_body
> >        location_t saved_loc = input_location;
> >        tree v = TREE_VALUE (target_attributes);
> >        input_location = DECL_SOURCE_LOCATION (new_decl);
> > -      bool r = targetm.target_option.valid_attribute_p (new_decl, NULL, v, 1);
> > +      bool r;
> > +      tree name_id = get_attribute_name (target_attributes);
> > +      const char* name_str = IDENTIFIER_POINTER (name_id);
> 
> Formatting nit, sorry, but: "const char *name_str".

Fixed.

> > +      if (strcmp (name_str, "target") == 0)
> > +	r = targetm.target_option.valid_attribute_p (new_decl, name_id, v, 1);
> > +      else if (strcmp (name_str, "target_version") == 0)
> > +	r = targetm.target_option.valid_version_attribute_p (new_decl, name_id,
> > +							     v, 1);
> > +      else
> > +	gcc_assert(false);
> 
> gcc_unreachable ();

Fixed.
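
The shape of that dispatch, reduced to a standalone sketch: the enum and
function name here are hypothetical, and abort () stands in for
gcc_unreachable () so that the example builds outside the GCC tree.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical tags for the two attributes that can carry a version string.  */
enum version_attr_kind
{
  ATTR_TARGET,
  ATTR_TARGET_VERSION
};

/* Hypothetical helper: map an attribute name to the hook that should
   validate it.  Any other name indicates a bug in the caller.  */
static enum version_attr_kind
classify_version_attr (const char *name)
{
  if (strcmp (name, "target") == 0)
    return ATTR_TARGET;
  if (strcmp (name, "target_version") == 0)
    return ATTR_TARGET_VERSION;
  /* Stand-in for gcc_unreachable (): never returns.  */
  abort ();
}
```

Because the final branch never returns, every path that reaches the caller
has a defined result.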

> LGTM otherwise, thanks.
> 
> Richard
> 
> > +
> >        input_location = saved_loc;
> >        if (!r)
> >  	return NULL;
> > diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> > index 9e666e5eecee07ae7c742c3a2b27e85899945c4e..e607aa14d284d545d122e04b0eae1247fd301882 100644
> > --- a/gcc/cp/decl2.cc
> > +++ b/gcc/cp/decl2.cc
> > @@ -832,8 +832,8 @@ check_classfn (tree ctype, tree function, tree template_parms)
> >        tree c2 = get_constraints (fndecl);
> >  
> >        /* While finding a match, same types and params are not enough
> > -	 if the function is versioned.  Also check version ("target")
> > -	 attributes.  */
> > +	 if the function is versioned.  Also check for different target
> > +	 specific attributes.  */
> >        if (same_type_p (TREE_TYPE (TREE_TYPE (function)),
> >  		       TREE_TYPE (TREE_TYPE (fndecl)))
> >  	  && compparms (p1, p2)
> > diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
> > index c0dc0e24ded871c136e54e5527e901d16cfa5ceb..7fe68565e70dd1124aac63601416dad68600a34e 100644
> > --- a/gcc/d/d-attribs.cc
> > +++ b/gcc/d/d-attribs.cc
> > @@ -126,16 +126,22 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
> >    ATTR_EXCL (NULL, false, false, false),
> >  };
> >  
> > +#define CLONES_USES_TARGET \
> > +  (strcmp (targetm.target_option.expanded_clones_attribute, \
> > +	   "target") == 0)
> > +
> >  static const struct attribute_spec::exclusions attr_target_exclusions[] =
> >  {
> > -  ATTR_EXCL ("target_clones", true, true, true),
> > +  ATTR_EXCL ("target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
> > +	     CLONES_USES_TARGET),
> >    ATTR_EXCL (NULL, false, false, false),
> >  };
> >  
> >  static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
> >  {
> >    ATTR_EXCL ("always_inline", true, true, true),
> > -  ATTR_EXCL ("target", true, true, true),
> > +  ATTR_EXCL ("target", CLONES_USES_TARGET, CLONES_USES_TARGET,
> > +	     CLONES_USES_TARGET),
> >    ATTR_EXCL (NULL, false, false, false),
> >  };
> >  
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index d83ca73b1aff90d3c181436afedc162b977a4158..6f6b133803f4574fcf0112b1385eec861112ddd5 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -10644,6 +10644,23 @@ the function declaration to hold a pointer to a target-specific
> >  @code{struct cl_target_option} structure.
> >  @end deftypefn
> >  
> > +@deftypefn {Target Hook} bool TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P (tree @var{fndecl}, tree @var{name}, tree @var{args}, int @var{flags})
> > +This hook is called to parse @code{attribute(target_version("..."))},
> > +which allows setting target-specific options on individual function versions.
> > +These function-specific options may differ
> > +from the options specified on the command line.  The hook should return
> > +@code{true} if the options are valid.
> > +
> > +The hook should set the @code{DECL_FUNCTION_SPECIFIC_TARGET} field in
> > +the function declaration to hold a pointer to a target-specific
> > +@code{struct cl_target_option} structure.
> > +@end deftypefn
> > +
> > +@deftypevr {Target Hook} {const char *} TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE
> > +Contains the name of the attribute used for the version description string
> > +when expanding clones for a function with the target_clones attribute.
> > +@end deftypevr
> > +
> >  @deftypefn {Target Hook} void TARGET_OPTION_SAVE (struct cl_target_option *@var{ptr}, struct gcc_options *@var{opts}, struct gcc_options *@var{opts_set})
> >  This hook is called to save any additional target-specific information
> >  in the @code{struct cl_target_option} structure for function-specific
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 3d3ae12cc2ff62025b1138430de501a33961fd90..149c88f627be20a9a35ead2eaebdb704e51927fa 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -7028,6 +7028,10 @@ on this implementation detail.
> >  
> >  @hook TARGET_OPTION_VALID_ATTRIBUTE_P
> >  
> > +@hook TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P
> > +
> > +@hook TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE
> > +
> >  @hook TARGET_OPTION_SAVE
> >  
> >  @hook TARGET_OPTION_RESTORE
> > diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
> > index a2ed048d7dd28ec470953fcd8a0dc86817e4b7dc..3db57c2b13d612a37240d9dcf58ad21b2286633c 100644
> > --- a/gcc/multiple_target.cc
> > +++ b/gcc/multiple_target.cc
> > @@ -66,10 +66,6 @@ create_dispatcher_calls (struct cgraph_node *node)
> >  {
> >    ipa_ref *ref;
> >  
> > -  if (!DECL_FUNCTION_VERSIONED (node->decl)
> > -      || !is_function_default_version (node->decl))
> > -    return;
> > -
> >    if (!targetm.has_ifunc_p ())
> >      {
> >        error_at (DECL_SOURCE_LOCATION (node->decl),
> > @@ -377,6 +373,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
> >        return false;
> >      }
> >  
> > +  const char *new_attr_name = targetm.target_option.expanded_clones_attribute;
> >    cgraph_function_version_info *decl1_v = NULL;
> >    cgraph_function_version_info *decl2_v = NULL;
> >    cgraph_function_version_info *before = NULL;
> > @@ -392,7 +389,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
> >        char *attr = attrs[i];
> >  
> >        /* Create new target clone.  */
> > -      tree attributes = make_attribute ("target", attr,
> > +      tree attributes = make_attribute (new_attr_name, attr,
> >  					DECL_ATTRIBUTES (node->decl));
> >  
> >        char *suffix = XNEWVEC (char, strlen (attr) + 1);
> > @@ -430,7 +427,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
> >    XDELETEVEC (attr_str);
> >  
> >    /* Setting new attribute to initial function.  */
> > -  tree attributes = make_attribute ("target", "default",
> > +  tree attributes = make_attribute (new_attr_name, "default",
> >  				    DECL_ATTRIBUTES (node->decl));
> >    DECL_ATTRIBUTES (node->decl) = attributes;
> >    node->local = false;
> > diff --git a/gcc/target.def b/gcc/target.def
> > index 0996da0f71a85f8217a41ceb08de8b21087e4ed9..1d2e0d8bf03a8b949ec636e6a78a111308d3dd71 100644
> > --- a/gcc/target.def
> > +++ b/gcc/target.def
> > @@ -6533,6 +6533,31 @@ the function declaration to hold a pointer to a target-specific\n\
> >   bool, (tree fndecl, tree name, tree args, int flags),
> >   default_target_option_valid_attribute_p)
> >  
> > +/* Function to validate the attribute((target_version(...))) strings.  If
> > +   the option is validated, the hook should also fill in
> > +   DECL_FUNCTION_SPECIFIC_TARGET in the function decl node.  */
> > +DEFHOOK
> > +(valid_version_attribute_p,
> > + "This hook is called to parse @code{attribute(target_version(\"...\"))},\n\
> > +which allows setting target-specific options on individual function versions.\n\
> > +These function-specific options may differ\n\
> > +from the options specified on the command line.  The hook should return\n\
> > +@code{true} if the options are valid.\n\
> > +\n\
> > +The hook should set the @code{DECL_FUNCTION_SPECIFIC_TARGET} field in\n\
> > +the function declaration to hold a pointer to a target-specific\n\
> > +@code{struct cl_target_option} structure.",
> > + bool, (tree fndecl, tree name, tree args, int flags),
> > + default_target_option_valid_version_attribute_p)
> > +
> > +/* Attribute to be used when expanding clones for functions with
> > +   target_clones attribute.  */
> > +DEFHOOKPOD
> > +(expanded_clones_attribute,
> > + "Contains the name of the attribute used for the version description string\n\
> > +when expanding clones for a function with the target_clones attribute.",
> > + const char *, "target")
> > +
> >  /* Function to save any extra target state in the target options structure.  */
> >  DEFHOOK
> >  (save,
> > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > index 189549cb1c742c37c17623141989b492a7c2b2f8..ff2957fd9fd8389e23992281b35e8e5467072f7d 100644
> > --- a/gcc/targhooks.h
> > +++ b/gcc/targhooks.h
> > @@ -192,6 +192,7 @@ extern bool default_hard_regno_scratch_ok (unsigned int);
> >  extern bool default_mode_dependent_address_p (const_rtx, addr_space_t);
> >  extern bool default_new_address_profitable_p (rtx, rtx_insn *, rtx);
> >  extern bool default_target_option_valid_attribute_p (tree, tree, tree, int);
> > +extern bool default_target_option_valid_version_attribute_p (tree, tree, tree, int);
> >  extern bool default_target_option_pragma_parse (tree, tree);
> >  extern bool default_target_can_inline_p (tree, tree);
> >  extern bool default_update_ipa_fn_target_info (unsigned int &, const gimple *);
> > diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> > index 4f5b240f8d65eeeaf73418c9f1e2c2684b257cfa..b693352d7eae555912477b6e431dd9c016105007 100644
> > --- a/gcc/targhooks.cc
> > +++ b/gcc/targhooks.cc
> > @@ -1789,7 +1789,19 @@ default_target_option_valid_attribute_p (tree ARG_UNUSED (fndecl),
> >  					 int ARG_UNUSED (flags))
> >  {
> >    warning (OPT_Wattributes,
> > -	   "target attribute is not supported on this machine");
> > +	   "%<target%> attribute is not supported on this machine");
> > +
> > +  return false;
> > +}
> > +
> > +bool
> > +default_target_option_valid_version_attribute_p (tree ARG_UNUSED (fndecl),
> > +						 tree ARG_UNUSED (name),
> > +						 tree ARG_UNUSED (args),
> > +						 int ARG_UNUSED (flags))
> > +{
> > +  warning (OPT_Wattributes,
> > +	   "%<target_version%> attribute is not supported on this machine");
> >  
> >    return false;
> >  }
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index 086b55f0375435d53a1604b6659da4f19fce3d17..d7841af19b20b0dc0ae28b433d5150e9c4763eff 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -3500,8 +3500,8 @@ extern vec<tree, va_gc> **decl_debug_args_insert (tree);
> >     (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
> >  
> >  /* In FUNCTION_DECL, this is set if this function has other versions generated
> > -   using "target" attributes.  The default version is the one which does not
> > -   have any "target" attribute set. */
> > +   to support different architecture feature sets, e.g. using "target" or
> > +   "target_version" attributes.  */
> >  #define DECL_FUNCTION_VERSIONED(NODE)\
> >     (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
> >  


* Re: [PATCH v2 5/5] aarch64: Add function multiversioning support
  2023-11-24 16:22   ` Richard Sandiford
@ 2023-12-04 13:23     ` Andrew Carlotti
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Carlotti @ 2023-12-04 13:23 UTC (permalink / raw)
  To: gcc-patches, rguenther, richard.earnshaw, richard.sandiford

On Fri, Nov 24, 2023 at 04:22:54PM +0000, Richard Sandiford wrote:
> Andrew Carlotti <andrew.carlotti@arm.com> writes:
> > This adds initial support for function multiversioning on aarch64 using
> > the target_version and target_clones attributes.  This loosely follows
> > the Beta specification in the ACLE [1], although with some differences
> > that still need to be resolved (possibly as follow-up patches).
> >
> > Existing function multiversioning implementations are broken in various
> > ways when used across translation units.  This includes placing
> > resolvers in the wrong translation units, and using symbol mangling that
> > > causes callers to unintentionally bypass the resolver in some circumstances.
> > Fixing these issues for aarch64 will require modifications to our ACLE
> > specification.  It will also require further adjustments to existing
> > middle end code, to facilitate different mangling and resolver
> > placement while preserving existing target behaviours.
> >
> > The list of function multiversioning features specified in the ACLE is
> > also inconsistent with the list of features supported in target option
> > extensions.  I intend to resolve some or all of these inconsistencies at
> > a later stage.
> >
> > The target_version attribute is currently only supported in C++, since
> > this is the only frontend with existing support for multiversioning
> > using the target attribute.  On the other hand, this patch happens to
> > enable multiversioning with the target_clones attribute in Ada and D, as
> > well as the entire C family, using their existing frontend support.
> >
> > This patch also does not support the following aspects of the Beta
> > specification:
> >
> > - The target_clones attribute should allow an implicit unlisted
> >   "default" version.
> > - There should be an option to disable function multiversioning at
> >   compile time.
> > - Unrecognised target names in a target_clones attribute should be
> >   ignored (with an optional warning).  This current patch raises an
> >   error instead.
> >
> > [1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
> >
> > ---
> >
> > I believe the support present in this patch correctly handles function
> > multiversioning within a single translation unit for all features in the ACLE
> > specification with option extension support.
> >
> > Is it ok to push this patch in its current state? I'd then continue working on
> > incremental improvements to the supported feature extensions and the ABI issues
> > > in followup patches, along with corresponding changes and improvements to
> > the ACLE specification.
> >
> >
> > gcc/ChangeLog:
> >
> > 	* config/aarch64/aarch64-feature-deps.h (fmv_deps_<FEAT_NAME>):
> > 	Define aarch64_feature_flags mask foreach FMV feature.
> > 	* config/aarch64/aarch64-option-extensions.def: Use new macros
> > 	to define FMV feature extensions.
> > 	* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
> > 	Check for target_version attribute after processing target
> > 	attribute.
> > 	(aarch64_fmv_feature_data): New.
> > 	(aarch64_parse_fmv_features): New.
> > 	(aarch64_process_target_version_attr): New.
> > 	(aarch64_option_valid_version_attribute_p): New.
> > 	(get_feature_mask_for_version): New.
> > 	(compare_feature_masks): New.
> > 	(aarch64_compare_version_priority): New.
> > 	(build_ifunc_arg_type): New.
> > 	(make_resolver_func): New.
> > 	(add_condition_to_bb): New.
> > 	(compare_feature_version_info): New.
> > 	(dispatch_function_versions): New.
> > 	(aarch64_generate_version_dispatcher_body): New.
> > 	(aarch64_get_function_versions_dispatcher): New.
> > 	(aarch64_common_function_versions): New.
> > 	(aarch64_mangle_decl_assembler_name): New.
> > 	(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
> > 	(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
> > 	(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
> > 	(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
> > 	(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
> > 	(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
> > 	(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
> > 	* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
> > 	  new value to report duplicate FMV feature.
> > 	* common/config/aarch64/cpuinfo.h: New file.
> >
> > libgcc/ChangeLog:
> >
> > 	* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
> > 	  copy in gcc/common
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
> > 	* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
> > 	* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
> > 	* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
> > 	* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
> > 	* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
> > 	* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
> > 	* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
> > 	* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
> > 	* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
> > 	* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
> > 	* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
> 
> Thanks, mostly looks good, but some comments below:
> 
> > diff --git a/gcc/common/config/aarch64/cpuinfo.h b/gcc/common/config/aarch64/cpuinfo.h
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..1690b6eee48e960d0ae675f8e8b05e6f182b56a3
> > --- /dev/null
> > +++ b/gcc/common/config/aarch64/cpuinfo.h
> > @@ -0,0 +1,94 @@
> > +/* CPU feature detection for AArch64 architecture.
> > +   Copyright (C) 2023 Free Software Foundation, Inc.
> > +
> > +   This file is part of GCC.
> > +
> > +   This file is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by the
> > +   Free Software Foundation; either version 3, or (at your option) any
> > +   later version.
> > +
> > +   This file is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   Under Section 7 of GPL version 3, you are granted additional
> > +   permissions described in the GCC Runtime Library Exception, version
> > +   3.1, as published by the Free Software Foundation.
> > +
> > +   You should have received a copy of the GNU General Public License and
> > +   a copy of the GCC Runtime Library Exception along with this program;
> > +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +/* This enum is used in libgcc feature detection, and in the function
> > +   multiversioning implementation in aarch64.cc.  The enum should use the same
> > > +   values as the corresponding enum in LLVM's compiler-rt, to facilitate
> > +   compatibility between compilers.  */
> > +
> > +enum CPUFeatures {
> > +  FEAT_RNG,
> > +  FEAT_FLAGM,
> > +  FEAT_FLAGM2,
> > +  FEAT_FP16FML,
> > +  FEAT_DOTPROD,
> > +  FEAT_SM4,
> > +  FEAT_RDM,
> > +  FEAT_LSE,
> > +  FEAT_FP,
> > +  FEAT_SIMD,
> > +  FEAT_CRC,
> > +  FEAT_SHA1,
> > +  FEAT_SHA2,
> > +  FEAT_SHA3,
> > +  FEAT_AES,
> > +  FEAT_PMULL,
> > +  FEAT_FP16,
> > +  FEAT_DIT,
> > +  FEAT_DPB,
> > +  FEAT_DPB2,
> > +  FEAT_JSCVT,
> > +  FEAT_FCMA,
> > +  FEAT_RCPC,
> > +  FEAT_RCPC2,
> > +  FEAT_FRINTTS,
> > +  FEAT_DGH,
> > +  FEAT_I8MM,
> > +  FEAT_BF16,
> > +  FEAT_EBF16,
> > +  FEAT_RPRES,
> > +  FEAT_SVE,
> > +  FEAT_SVE_BF16,
> > +  FEAT_SVE_EBF16,
> > +  FEAT_SVE_I8MM,
> > +  FEAT_SVE_F32MM,
> > +  FEAT_SVE_F64MM,
> > +  FEAT_SVE2,
> > +  FEAT_SVE_AES,
> > +  FEAT_SVE_PMULL128,
> > +  FEAT_SVE_BITPERM,
> > +  FEAT_SVE_SHA3,
> > +  FEAT_SVE_SM4,
> > +  FEAT_SME,
> > +  FEAT_MEMTAG,
> > +  FEAT_MEMTAG2,
> > +  FEAT_MEMTAG3,
> > +  FEAT_SB,
> > +  FEAT_PREDRES,
> > +  FEAT_SSBS,
> > +  FEAT_SSBS2,
> > +  FEAT_BTI,
> > +  FEAT_LS64,
> > +  FEAT_LS64_V,
> > +  FEAT_LS64_ACCDATA,
> > +  FEAT_WFXT,
> > +  FEAT_SME_F64,
> > +  FEAT_SME_I64,
> > +  FEAT_SME2,
> > +  FEAT_RCPC3,
> > +  FEAT_MAX,
> > +  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
> > +		    in __aarch64_cpu_features.  */
> > +  FEAT_INIT      /* Used as flag of features initialization completion.  */
> > +};
> > diff --git a/gcc/config/aarch64/aarch64-feature-deps.h b/gcc/config/aarch64/aarch64-feature-deps.h
> > index 7b85a8860de57f6727644c03296cef192ad0990c..8f20582e1efdd4817138480bee8cdb27fa7f3dfe 100644
> > --- a/gcc/config/aarch64/aarch64-feature-deps.h
> > +++ b/gcc/config/aarch64/aarch64-feature-deps.h
> > @@ -115,6 +115,13 @@ get_flags_off (aarch64_feature_flags mask)
> >    constexpr auto cpu_##CORE_IDENT = ARCH_IDENT ().enable | get_enable FEATURES;
> >  #include "config/aarch64/aarch64-cores.def"
> >  
> > +/* Define fmv_deps_<NAME> variables for each FMV feature, giving the transitive
> > +   closure of all the features that the FMV feature enables.  */
> > +#define AARCH64_FMV_FEATURE(A, FEAT_NAME, OPT_FLAGS) \
> > +  constexpr auto fmv_deps_##FEAT_NAME = get_enable OPT_FLAGS;
> > +#include "config/aarch64/aarch64-option-extensions.def"
> > +
> > +
> >  }
> >  }
> >  
> > diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> > index 825f3bf775899e2e5cffb1867b82766d632c8708..07df403491494d6dfe19095872ab32b9d60e9690 100644
> > --- a/gcc/config/aarch64/aarch64-option-extensions.def
> > +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> > @@ -17,17 +17,22 @@
> >     along with GCC; see the file COPYING3.  If not see
> >     <http://www.gnu.org/licenses/>.  */
> >  
> > -/* This is a list of ISA extentsions in AArch64.
> > +/* This is a list of ISA extensions in AArch64.
> >  
> > -   Before using #include to read this file, define a macro:
> > +   Before using #include to read this file, define one of the following
> > +   macros:
> >  
> >        AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON,
> >  			    EXPLICIT_OFF, FEATURE_STRING)
> >  
> > +      AARCH64_FMV_FEATURE(NAME, FEAT_NAME, IDENT)
> > +
> >     - NAME is the name of the extension, represented as a string constant.
> >  
> >     - IDENT is the canonical internal name for this flag.
> >  
> > +   - FEAT_NAME is the unprefixed name used in the CPUFeatures enum.
> > +
> >     - REQUIRES is a list of features that must be enabled whenever this
> >       feature is enabled.  The relationship is implicitly transitive:
> >       if A appears in B's REQUIRES and B appears in C's REQUIRES then
> > @@ -58,45 +63,96 @@
> >       that are required.  Their order is not important.  An empty string means
> >       do not detect this feature during auto detection.
> >  
> > -   The list of features must follow topological order wrt REQUIRES
> > -   and EXPLICIT_ON.  For example, if A is in B's REQUIRES list, A must
> > -   come before B.  This is enforced by aarch64-feature-deps.h.
> > +   - OPT_FLAGS is a list of feature IDENTS that should be enabled (along with
> > +     their transitive dependencies) when the specified FMV feature is present.
> > +
> > +   Where a feature is present as both an extension and a function
> > +   multiversioning feature, and IDENT matches the FEAT_NAME suffix, then these
> > +   can be listed here simultaneously using the macro:
> > +
> > +      AARCH64_OPT_FMV_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON,
> > +				EXPLICIT_OFF, FEATURE_STRING)
> > +
> > > +   The list of feature extensions must follow topological order wrt REQUIRES
> > +   and EXPLICIT_ON.  For example, if A is in B's REQUIRES list, A must come
> > +   before B.  This is enforced by aarch64-feature-deps.h.
> > +
> > +   The list of multiversioning features must be ordered by increasing priority,
> > +   as defined in https://github.com/ARM-software/acle/blob/main/main/acle.md
> >  
> >     NOTE: Any changes to the AARCH64_OPT_EXTENSION macro need to be mirrored in
> >     config.gcc.  */
> >  
> > +#ifndef AARCH64_OPT_EXTENSION
> > +#define AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON, \
> > +			      EXPLICIT_OFF, FEATURE_STRING)
> > +#endif
> > +
> > +#ifndef AARCH64_FMV_FEATURE
> > +#define AARCH64_FMV_FEATURE(NAME, FEAT_NAME, OPT_FLAGS)
> > +#endif
> > +
> > +#define AARCH64_OPT_FMV_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON,   \
> > +				  EXPLICIT_OFF, FEATURE_STRING)		\
> > +AARCH64_OPT_EXTENSION(NAME, IDENT, REQUIRES, EXPLICIT_ON, EXPLICIT_OFF,	\
> > +		      FEATURE_STRING)					\
> > +AARCH64_FMV_FEATURE(NAME, IDENT, (IDENT))
> > +
> > +
> >  AARCH64_OPT_EXTENSION("fp", FP, (), (), (), "fp")
> >  
> >  AARCH64_OPT_EXTENSION("simd", SIMD, (FP), (), (), "asimd")
> >  
> > -AARCH64_OPT_EXTENSION("crc", CRC, (), (), (), "crc32")
> > +AARCH64_OPT_FMV_EXTENSION("rng", RNG, (), (), (), "rng")
> >  
> > -AARCH64_OPT_EXTENSION("lse", LSE, (), (), (), "atomics")
> > +AARCH64_OPT_FMV_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
> >  
> > -/* +nofp16 disables an implicit F16FML, even though an implicit F16FML
> > -   does not imply F16.  See F16FML for more details.  */
> > -AARCH64_OPT_EXTENSION("fp16", F16, (FP), (), (F16FML), "fphp asimdhp")
> > +AARCH64_FMV_FEATURE("flagm2", FLAGM2, (FLAGM))
> > +
> > +AARCH64_FMV_FEATURE("fp16fml", FP16FML, (F16FML))
> > +
> > +AARCH64_OPT_FMV_EXTENSION("dotprod", DOTPROD, (SIMD), (), (), "asimddp")
> >  
> > -AARCH64_OPT_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
> > +AARCH64_OPT_FMV_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
> >  
> >  /* An explicit +rdma implies +simd, but +rdma+nosimd still enables scalar
> >     RDMA instructions.  */
> >  AARCH64_OPT_EXTENSION("rdma", RDMA, (), (SIMD), (), "asimdrdm")
> >  
> > -AARCH64_OPT_EXTENSION("dotprod", DOTPROD, (SIMD), (), (), "asimddp")
> > > +AARCH64_FMV_FEATURE("rdm", RDM, (RDMA))
> > +
> > +AARCH64_OPT_FMV_EXTENSION("lse", LSE, (), (), (), "atomics")
> > +
> > +AARCH64_FMV_FEATURE("fp", FP, (FP))
> > +
> > +AARCH64_FMV_FEATURE("simd", SIMD, (SIMD))
> > +
> > +AARCH64_OPT_FMV_EXTENSION("crc", CRC, (), (), (), "crc32")
> >  
> > -AARCH64_OPT_EXTENSION("aes", AES, (SIMD), (), (), "aes")
> > +AARCH64_FMV_FEATURE("sha1", SHA1, ())
> >  
> > -AARCH64_OPT_EXTENSION("sha2", SHA2, (SIMD), (), (), "sha1 sha2")
> > +AARCH64_OPT_FMV_EXTENSION("sha2", SHA2, (SIMD), (), (), "sha1 sha2")
> > +
> > +AARCH64_FMV_FEATURE("sha3", SHA3, (SHA3))
> > +
> > +AARCH64_OPT_FMV_EXTENSION("aes", AES, (SIMD), (), (), "aes")
> > +
> > +AARCH64_FMV_FEATURE("pmull", PMULL, ())
> >  
> >  /* +nocrypto disables AES, SHA2 and SM4, and anything that depends on them
> >     (such as SHA3 and the SVE2 crypto extensions).  */
> >  AARCH64_OPT_EXTENSION("crypto", CRYPTO, (AES, SHA2), (), (AES, SHA2, SM4),
> >  		      "aes pmull sha1 sha2")
> >  
> > +/* Listing sha3 after crypto means we pass "+aes+sha3" to the assembler
> > +   instead of "+sha3+crypto".  */
> >  AARCH64_OPT_EXTENSION("sha3", SHA3, (SHA2), (), (), "sha3 sha512")
> >  
> > -AARCH64_OPT_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
> > +/* +nofp16 disables an implicit F16FML, even though an implicit F16FML
> > +   does not imply F16.  See F16FML for more details.  */
> > +AARCH64_OPT_EXTENSION("fp16", F16, (FP), (), (F16FML), "fphp asimdhp")
> > +
> > +AARCH64_FMV_FEATURE("fp16", FP16, (F16))
> >  
> >  /* An explicit +fp16fml implies +fp16, but a dependence on it does not.
> >     Thus -march=armv8.4-a implies F16FML but not F16.  -march=armv8.4-a+fp16
> > @@ -104,51 +160,117 @@ AARCH64_OPT_EXTENSION("sm4", SM4, (SIMD), (), (), "sm3 sm4")
> >     -march=armv8.4-a+nofp16+fp16 enables F16 but not F16FML.  */
> >  AARCH64_OPT_EXTENSION("fp16fml", F16FML, (), (F16), (), "asimdfhm")
> >  
> > -AARCH64_OPT_EXTENSION("sve", SVE, (SIMD, F16), (), (), "sve")
> > +AARCH64_FMV_FEATURE("dit", DIT, ())
> >  
> > -AARCH64_OPT_EXTENSION("profile", PROFILE, (), (), (), "")
> > +AARCH64_FMV_FEATURE("dpb", DPB, ())
> >  
> > -AARCH64_OPT_EXTENSION("rng", RNG, (), (), (), "rng")
> > +AARCH64_FMV_FEATURE("dpb2", DPB2, ())
> >  
> > -AARCH64_OPT_EXTENSION("memtag", MEMTAG, (), (), (), "")
> > +AARCH64_FMV_FEATURE("jscvt", JSCVT, ())
> >  
> > -AARCH64_OPT_EXTENSION("sb", SB, (), (), (), "sb")
> > +AARCH64_FMV_FEATURE("fcma", FCMA, (SIMD))
> >  
> > -AARCH64_OPT_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
> > +AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
> >  
> > -AARCH64_OPT_EXTENSION("predres", PREDRES, (), (), (), "")
> > +AARCH64_FMV_FEATURE("rcpc2", RCPC2, (RCPC))
> >  
> > -AARCH64_OPT_EXTENSION("sve2", SVE2, (SVE), (), (), "sve2")
> > +AARCH64_FMV_FEATURE("rcpc3", RCPC3, (RCPC))
> >  
> > -AARCH64_OPT_EXTENSION("sve2-sm4", SVE2_SM4, (SVE2, SM4), (), (), "svesm4")
> > +AARCH64_FMV_FEATURE("frintts", FRINTTS, ())
> > +
> > +AARCH64_FMV_FEATURE("dgh", DGH, ())
> > +
> > +AARCH64_OPT_FMV_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
> > +
> > +/* An explicit +bf16 implies +simd, but +bf16+nosimd still enables scalar BF16
> > +   instructions.  */
> > +AARCH64_OPT_FMV_EXTENSION("bf16", BF16, (FP), (SIMD), (), "bf16")
> > +
> > +AARCH64_FMV_FEATURE("ebf16", EBF16, (BF16))
> > +
> > +AARCH64_FMV_FEATURE("rpres", RPRES, ())
> > +
> > +AARCH64_OPT_FMV_EXTENSION("sve", SVE, (SIMD, F16), (), (), "sve")
> > +
> > +AARCH64_FMV_FEATURE("sve-bf16", SVE_BF16, (SVE, BF16))
> > +
> > +AARCH64_FMV_FEATURE("sve-ebf16", SVE_EBF16, (SVE, BF16))
> > +
> > +AARCH64_FMV_FEATURE("sve-i8mm", SVE_I8MM, (SVE, I8MM))
> > +
> > +AARCH64_OPT_EXTENSION("f32mm", F32MM, (SVE), (), (), "f32mm")
> > +
> > +AARCH64_FMV_FEATURE("f32mm", SVE_F32MM, (F32MM))
> > +
> > +AARCH64_OPT_EXTENSION("f64mm", F64MM, (SVE), (), (), "f64mm")
> > +
> > +AARCH64_FMV_FEATURE("f64mm", SVE_F64MM, (F64MM))
> > +
> > +AARCH64_OPT_FMV_EXTENSION("sve2", SVE2, (SVE), (), (), "sve2")
> >  
> >  AARCH64_OPT_EXTENSION("sve2-aes", SVE2_AES, (SVE2, AES), (), (), "sveaes")
> >  
> > -AARCH64_OPT_EXTENSION("sve2-sha3", SVE2_SHA3, (SVE2, SHA3), (), (), "svesha3")
> > +AARCH64_FMV_FEATURE("sve2-aes", SVE_AES, (SVE2, AES))
> > +
> > +AARCH64_FMV_FEATURE("sve2-pmull128", SVE_PMULL128, (SVE2))
> >  
> >  AARCH64_OPT_EXTENSION("sve2-bitperm", SVE2_BITPERM, (SVE2), (), (),
> >  		      "svebitperm")
> >  
> > -AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
> > +AARCH64_FMV_FEATURE("sve2-bitperm", SVE_BITPERM, (SVE2_BITPERM))
> >  
> > -AARCH64_OPT_EXTENSION("i8mm", I8MM, (SIMD), (), (), "i8mm")
> > +AARCH64_OPT_EXTENSION("sve2-sha3", SVE2_SHA3, (SVE2, SHA3), (), (), "svesha3")
> >  
> > -AARCH64_OPT_EXTENSION("f32mm", F32MM, (SVE), (), (), "f32mm")
> > +AARCH64_FMV_FEATURE("sve2-sha3", SVE_SHA3, (SVE2_SHA3))
> >  
> > -AARCH64_OPT_EXTENSION("f64mm", F64MM, (SVE), (), (), "f64mm")
> > +AARCH64_OPT_EXTENSION("sve2-sm4", SVE2_SM4, (SVE2, SM4), (), (), "svesm4")
> >  
> > -/* An explicit +bf16 implies +simd, but +bf16+nosimd still enables scalar BF16
> > -   instructions.  */
> > -AARCH64_OPT_EXTENSION("bf16", BF16, (FP), (SIMD), (), "bf16")
> > +AARCH64_FMV_FEATURE("sve2-sm4", SVE_SM4, (SVE2_SM4))
> > +
> > +AARCH64_FMV_FEATURE("sme", SME, ())
> >  
> > -AARCH64_OPT_EXTENSION("flagm", FLAGM, (), (), (), "flagm")
> > +AARCH64_OPT_FMV_EXTENSION("memtag", MEMTAG, (), (), (), "")
> > +
> > +AARCH64_FMV_FEATURE("memtag2", MEMTAG2, (MEMTAG))
> > +
> > +AARCH64_FMV_FEATURE("memtag3", MEMTAG3, (MEMTAG))
> > +
> > +AARCH64_OPT_FMV_EXTENSION("sb", SB, (), (), (), "sb")
> > +
> > +AARCH64_OPT_FMV_EXTENSION("predres", PREDRES, (), (), (), "")
> > +
> > +AARCH64_OPT_FMV_EXTENSION("ssbs", SSBS, (), (), (), "ssbs")
> > +
> > +AARCH64_FMV_FEATURE("ssbs2", SSBS2, (SSBS))
> > +
> > +AARCH64_FMV_FEATURE("bti", BTI, ())
> > +
> > +AARCH64_OPT_EXTENSION("profile", PROFILE, (), (), (), "")
> > +
> > +AARCH64_OPT_EXTENSION("tme", TME, (), (), (), "")
> >  
> >  AARCH64_OPT_EXTENSION("pauth", PAUTH, (), (), (), "paca pacg")
> >  
> >  AARCH64_OPT_EXTENSION("ls64", LS64, (), (), (), "")
> >  
> > +AARCH64_FMV_FEATURE("ls64", LS64, ())
> > +
> > +AARCH64_FMV_FEATURE("ls64_v", LS64_V, ())
> > +
> > +AARCH64_FMV_FEATURE("ls64_accdata", LS64_ACCDATA, (LS64))
> > +
> > +AARCH64_FMV_FEATURE("wfxt", WFXT, ())
> > +
> > +AARCH64_FMV_FEATURE("sme-f64f64", SME_F64, ())
> > +
> > +AARCH64_FMV_FEATURE("sme-i64i64", SME_I64, ())
> > +
> > +AARCH64_FMV_FEATURE("sme2", SME2, ())
> > +
> >  AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
> >  
> >  AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
> >  
> > +#undef AARCH64_OPT_FMV_EXTENSION
> >  #undef AARCH64_OPT_EXTENSION
> > +#undef AARCH64_FMV_FEATURE
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 800a8b0e11005416fb4e4b1222717629b16f3745..8721c0a923c53af2c2413ed90ccb05fa698c1f85 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -84,6 +84,7 @@
> >  #include "aarch64-feature-deps.h"
> >  #include "config/arm/aarch-common.h"
> >  #include "config/arm/aarch-common-protos.h"
> > +#include "common/config/aarch64/cpuinfo.h"
> >  #include "ssa.h"
> >  
> >  /* This file should be included last.  */
> > @@ -19525,6 +19526,8 @@ aarch64_process_target_attr (tree args)
> >    return true;
> >  }
> >  
> > +static bool aarch64_process_target_version_attr (tree args);
> > +
> >  /* Implement TARGET_OPTION_VALID_ATTRIBUTE_P.  This is used to
> >     process attribute ((target ("..."))).  */
> >  
> > @@ -19580,6 +19583,19 @@ aarch64_option_valid_attribute_p (tree fndecl, tree, tree args, int)
> >  			      TREE_TARGET_OPTION (target_option_current_node));
> >  
> >    ret = aarch64_process_target_attr (args);
> > +  if (ret)
> > +    {
> > +      tree version_attr = lookup_attribute ("target_version",
> > +					    DECL_ATTRIBUTES (fndecl));
> > +      if (version_attr != NULL_TREE)
> > +	{
> > +	  /* Reapply any target_version attribute after target attribute.
> > +	     This should be equivalent to applying the target_version once
> > +	     after processing all target attributes.  */
> > +	  tree version_args = TREE_VALUE (version_attr);
> > +	  ret = aarch64_process_target_version_attr (version_args);
> > +	}
> > +    }
> >  
> >    /* Set up any additional state.  */
> >    if (ret)
> > @@ -19610,6 +19626,821 @@ aarch64_option_valid_attribute_p (tree fndecl, tree, tree args, int)
> >    return ret;
> >  }
> >  
> > +typedef unsigned long long aarch64_fmv_feature_mask;
> > +
> > +typedef struct
> > +{
> > +  const char *name;
> > +  aarch64_fmv_feature_mask feature_mask;
> > +  aarch64_feature_flags opt_flags;
> > +} aarch64_fmv_feature_datum;
> > +
> > +#define AARCH64_FMV_FEATURE(NAME, FEAT_NAME, C) \
> > +  {NAME, 1ULL << FEAT_##FEAT_NAME, ::feature_deps::fmv_deps_##FEAT_NAME},
> > +
> > +/* FMV features are listed in priority order, to make it easier to sort target
> > +   strings.  */
> > +static aarch64_fmv_feature_datum aarch64_fmv_feature_data[] = {
> > +#include "config/aarch64/aarch64-option-extensions.def"
> > +};
> > +
> > +
> > +/* Parse a non-default fmv feature string, as found in a target_version or
> > +   target_clones attribute.  */
> 
> The comment says non-default, but the function does handle "default".
> 
> It would be good to describe the arguments too.  E.g. something like:
> 
> /* Parse function multi-versioning feature string STR, as found in a
>    target_version or target_clones attribute.  Add the selected FMV
>    features to *FEATURE_MASK and the associated -march ISA extensions
>    to *ISA_FLAGS.  If parsing fails due to an invalid or duplicate
>    feature name, store that feature name in *INVALID_EXTENSION.  */

Updated (with slightly different wording).
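
For anyone following the thread without the tree to hand, here is a minimal
sketch of how a single feature list can feed both the enum and the table of
{name, feature mask, ISA flags} rows that the parser reads.  It is not the
patch's code: EXAMPLE_FMV_FEATURES, example_fmv_datum and example_lookup are
hypothetical names, the three entries and their flag values are made up, and
the list is an inline macro here rather than a .def file pulled in with
#include.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

/* Hypothetical three-entry feature list.  In the patch the list lives in
   aarch64-option-extensions.def and is expanded by including that file.  */
#define EXAMPLE_FMV_FEATURES(X) \
  X ("rng", RNG, 0x1)           \
  X ("flagm", FLAGM, 0x2)       \
  X ("dotprod", DOTPROD, 0x4 | 0x8)

/* First expansion: one enumerator per feature, in list order.  */
enum example_feature
{
#define X(NAME, FEAT_NAME, OPT_FLAGS) EX_FEAT_##FEAT_NAME,
  EXAMPLE_FMV_FEATURES (X)
#undef X
  EX_FEAT_MAX
};

struct example_fmv_datum
{
  const char *name;
  uint64_t feature_mask;   /* One bit per enumerator.  */
  uint64_t opt_flags;      /* Stand-in for aarch64_feature_flags.  */
};

/* Second expansion: one table row per feature, in the same order.  */
static const example_fmv_datum example_fmv_data[] = {
#define X(NAME, FEAT_NAME, OPT_FLAGS) \
  {NAME, uint64_t (1) << EX_FEAT_##FEAT_NAME, OPT_FLAGS},
  EXAMPLE_FMV_FEATURES (X)
#undef X
};

/* Return the row for NAME, or a null pointer if NAME is not listed.  */
static const example_fmv_datum *
example_lookup (const char *name)
{
  assert (name != nullptr);
  for (const example_fmv_datum &d : example_fmv_data)
    if (std::strcmp (d.name, name) == 0)
      return &d;
  return nullptr;
}
```

Calling example_lookup ("dotprod") returns the row whose feature_mask has
only bit EX_FEAT_DOTPROD set.  Because both expansions come from the same
list, reordering the list changes the enum values and the table order
together.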

> > +
> > +static enum aarch_parse_opt_result
> > +aarch64_parse_fmv_features (const char *str, aarch64_feature_flags *isa_flags,
> > +			    aarch64_fmv_feature_mask *feature_mask,
> > +			    std::string *invalid_extension)
> > +{
> > +  if (strcmp (str, "default") == 0)
> > +    return AARCH_PARSE_OK;
> > +
> > +  while (str != NULL && *str != 0)
> > +    {
> > +      const char *ext;
> > +      size_t len;
> > +
> > +      ext = strchr (str, '+');
> > +
> > +      if (ext != NULL)
> > +	len = ext - str;
> > +      else
> > +	len = strlen (str);
> > +
> > +      if (len == 0)
> > +	return AARCH_PARSE_MISSING_ARG;
> > +
> > +      static const int num_features = ARRAY_SIZE (aarch64_fmv_feature_data);
> > +      int i;
> > +      for (i = 0; i < num_features; i++)
> > +	{
> > +	  if (strlen (aarch64_fmv_feature_data[i].name) == len
> > +	      && strncmp (aarch64_fmv_feature_data[i].name, str, len) == 0)
> > +	    {
> > +	      if (isa_flags)
> > +		*isa_flags |= aarch64_fmv_feature_data[i].opt_flags;
> > +	      if (feature_mask)
> > +		{
> > +		  auto old_feature_mask = *feature_mask;
> > +		  *feature_mask |= aarch64_fmv_feature_data[i].feature_mask;
> > +		  if (*feature_mask == old_feature_mask)
> > +		    {
> > +		      /* Duplicate feature.  */
> > +		      if (invalid_extension)
> > +			*invalid_extension = std::string (str, len);
> > +		      return AARCH_PARSE_DUPLICATE_FEATURE;
> > +		    }
> > +		}
> > +	      break;
> > +	    }
> > +	}
> > +
> > +      if (i == num_features)
> > +	{
> > +	  /* Feature not found in list.  */
> > +	  if (invalid_extension)
> > +	    *invalid_extension = std::string (str, len);
> > +	  return AARCH_PARSE_INVALID_FEATURE;
> > +	}
> > +
> > +      str = ext;
> > +    }
> 
> Does this work for "feat1+feat2"?  It looks like str would be set to
> "+feat2" for the second iteration, and then the strchr would likewise
> return "+feat2", giving an empty string.

This was broken - thanks for spotting.  Fixed in the next version.
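
To make the problem concrete, here is a minimal standalone sketch of a
'+'-separated parser that steps past the separator before the next iteration.
It is an illustration under stated assumptions, not the code in the next
version: the example_* names and the three-entry table are hypothetical, the
ISA flags argument is left out, and treating an empty name (including a
trailing '+') as a missing argument is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

enum example_parse_result
{
  EX_PARSE_OK,
  EX_PARSE_MISSING_ARG,
  EX_PARSE_INVALID_FEATURE,
  EX_PARSE_DUPLICATE_FEATURE
};

struct example_feature
{
  const char *name;
  uint64_t feature_mask;
};

/* Hypothetical stand-in for aarch64_fmv_feature_data.  */
static const example_feature example_features[] = {
  {"rng", 1u << 0},
  {"flagm", 1u << 1},
  {"dotprod", 1u << 2},
};

/* Parse STR ("default" or "feat1+feat2+...") and OR the selected feature
   bits into *FEATURE_MASK.  On an invalid or duplicate name, store that
   name in *INVALID_EXTENSION if it is non-null.  */
static example_parse_result
example_parse_fmv_features (const char *str, uint64_t *feature_mask,
                            std::string *invalid_extension)
{
  assert (str != nullptr && feature_mask != nullptr);
  if (std::strcmp (str, "default") == 0)
    return EX_PARSE_OK;

  while (true)
    {
      const char *ext = std::strchr (str, '+');
      std::size_t len = ext ? std::size_t (ext - str) : std::strlen (str);
      if (len == 0)
        return EX_PARSE_MISSING_ARG;

      const example_feature *found = nullptr;
      for (const example_feature &f : example_features)
        if (std::strlen (f.name) == len
            && std::strncmp (f.name, str, len) == 0)
          found = &f;

      if (!found)
        {
          if (invalid_extension)
            *invalid_extension = std::string (str, len);
          return EX_PARSE_INVALID_FEATURE;
        }

      /* Same duplicate test as the quoted patch: the mask does not change
         when the bit was already set.  */
      uint64_t old_feature_mask = *feature_mask;
      *feature_mask |= found->feature_mask;
      if (*feature_mask == old_feature_mask)
        {
          if (invalid_extension)
            *invalid_extension = std::string (str, len);
          return EX_PARSE_DUPLICATE_FEATURE;
        }

      if (!ext)
        return EX_PARSE_OK;

      /* Step past the '+', so the next iteration starts at the next name
         rather than at the separator.  */
      str = ext + 1;
    }
}
```

With this loop, "rng+dotprod" sets two bits and returns EX_PARSE_OK, while
"rng++flagm" returns EX_PARSE_MISSING_ARG because the name between the two
separators is empty.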
 
> > +
> > +  return AARCH_PARSE_OK;
> > +}
> > +
> > +/* Parse the tree in ARGS that contains the target_version attribute
> > +   information and update the global target options space.  */
> > +
> > +static bool
> > +aarch64_process_target_version_attr (tree args)
> > +{
> > +  if (TREE_CODE (args) == TREE_LIST)
> > +    {
> > +      if (TREE_CHAIN (args))
> > +	{
> > +	  error ("attribute %<target_version%> has multiple values");
> > +	  return false;
> > +	}
> > +      args = TREE_VALUE (args);
> > +    }
> > +
> > +  if (!args || TREE_CODE (args) != STRING_CST)
> > +    {
> > +      error ("attribute %<target_version%> argument not a string");
> > +      return false;
> > +    }
> > +
> > +  const char *str = TREE_STRING_POINTER (args);
> > +
> > +  enum aarch_parse_opt_result parse_res;
> > +  auto isa_flags = aarch64_asm_isa_flags;
> > +
> > +
> > +  std::string invalid_extension;
> > +  parse_res = aarch64_parse_fmv_features (str, &isa_flags, NULL,
> > +					  &invalid_extension);
> > +
> > +  if (parse_res == AARCH_PARSE_OK)
> > +    {
> > +      aarch64_set_asm_isa_flags (isa_flags);
> > +      return true;
> > +    }
> > +
> > +  switch (parse_res)
> > +    {
> > +      case AARCH_PARSE_MISSING_ARG:
> > +	error ("missing value in %<target_version%> attribute");
> > +	break;
> > +
> > +      case AARCH_PARSE_INVALID_FEATURE:
> > +	error ("invalid feature modifier %qs of value %qs in "
> > +	       "%<target_version%> attribute", invalid_extension.c_str (),
> > +	       str);
> > +	break;
> > +
> > +      case AARCH_PARSE_DUPLICATE_FEATURE:
> > +	error ("duplicate feature modifier %qs of value %qs in "
> > +	       "%<target_version%> attribute", invalid_extension.c_str (),
> > +	       str);
> > +	break;
> > +
> > +      default:
> > +	gcc_unreachable ();
> > +    }
> 
> Formatting nit: the convention is for cases to line up with the "{"
> of the switch, so the switch body between { and } above should be
> indented by 2 fewer columns.

Fixed.

> > +
> > +  return false;
> > +}
> > +
> > +/* Implement TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P.  This is used to
> > +   process attribute ((target ("..."))).  */
> 
> attribute ((target_version ("...")))  ?

Fixed.

> > +
> > +static bool
> > +aarch64_option_valid_version_attribute_p (tree fndecl, tree, tree args, int)
> > +{
> > +  struct cl_target_option cur_target;
> > +  bool ret;
> > +  tree new_target;
> > +  tree existing_target = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
> > +
> > +  /* Save the current target options to restore at the end.  */
> > +  cl_target_option_save (&cur_target, &global_options, &global_options_set);
> > +
> > +  /* If fndecl already has some target attributes applied to it, unpack
> > +     them so that we add this attribute on top of them, rather than
> > +     overwriting them.  */
> > +  if (existing_target)
> > +    {
> > +      struct cl_target_option *existing_options
> > +	= TREE_TARGET_OPTION (existing_target);
> > +
> > +      if (existing_options)
> > +	cl_target_option_restore (&global_options, &global_options_set,
> > +				  existing_options);
> > +    }
> > +  else
> > +    cl_target_option_restore (&global_options, &global_options_set,
> > +			      TREE_TARGET_OPTION (target_option_current_node));
> > +
> > +  ret = aarch64_process_target_version_attr (args);
> > +
> > +  /* Set up any additional state.  */
> > +  if (ret)
> > +    {
> > +      aarch64_override_options_internal (&global_options);
> > +      new_target = build_target_option_node (&global_options,
> > +					     &global_options_set);
> > +    }
> > +  else
> > +    new_target = NULL;
> > +
> > +  if (fndecl && ret)
> > +    {
> > +      DECL_FUNCTION_SPECIFIC_TARGET (fndecl) = new_target;
> > +    }
> > +
> > +  cl_target_option_restore (&global_options, &global_options_set, &cur_target);
> > +
> > +  return ret;
> > +}
> > +
> > +/* This parses the attribute arguments to target_version in DECL and the
> > +   feature mask required to select those targets.  No adjustments are made to
> > +   add or remove redundant feature requirements.  */
> > +
> > +static aarch64_fmv_feature_mask
> > +get_feature_mask_for_version (tree decl)
> > +{
> > +  tree version_attr = lookup_attribute ("target_version",
> > +					DECL_ATTRIBUTES (decl));
> > +  if (version_attr == NULL)
> > +    return 0;
> > +
> > +  const char *version_string = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE
> > +						    (version_attr)));
> > +  enum aarch_parse_opt_result parse_res;
> > +  aarch64_fmv_feature_mask feature_mask = 0ULL;
> > +
> > +  parse_res = aarch64_parse_fmv_features (version_string, NULL, &feature_mask,
> > +					  NULL);
> > +
> > +  /* We should have detected any errors before getting here.  */
> > +  gcc_assert (parse_res == AARCH_PARSE_OK);
> > +
> > +  return feature_mask;
> > +}
> > +
> > +/* Compare priorities of two feature masks. Return:
> > +     1: mask1 is higher priority
> > +    -1: mask2 is higher priority
> > +     0: masks are equal.  */
> > +
> > +static int
> > +compare_feature_masks (aarch64_fmv_feature_mask mask1,
> > +		       aarch64_fmv_feature_mask mask2)
> > +{
> > +  int pop1 = popcount_hwi(mask1);
> > +  int pop2 = popcount_hwi(mask2);
> 
> Nit: should be a space before "(mask1" and "(mask2".

Fixed.
 
> > +  if (pop1 > pop2)
> > +    return 1;
> > +  if (pop2 > pop1)
> > +    return -1;
> > +
> > +  auto diff_mask = mask1 ^ mask2;
> > +  if (diff_mask == 0ULL)
> > +    return 0;
> > +  for (int i = FEAT_MAX - 1; i > 0; i--)
> > +    {
> > +      auto bit_mask = aarch64_fmv_feature_data[i].feature_mask;
> > +      if (diff_mask & bit_mask)
> > +	return (mask1 & bit_mask) ? 1 : -1;
> > +    }
> > +  gcc_unreachable();
> > +}
> 
> Still not sure that this is the right criteria to use, but I suppose
> we can adjust it post-commit to match any changes in the spec.
> 
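
To make the criterion under discussion concrete, here is a standalone sketch
of the ordering the patch implements: more features wins, and ties are broken
by the highest differing feature.  The type feature_mask_t and the assumption
that bit position i corresponds directly to feature index i are simplifications
for the example; the patch looks the bit up in aarch64_fmv_feature_data.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for aarch64_fmv_feature_mask; the 64-bit width is an assumption.
typedef uint64_t feature_mask_t;

// Portable population count (the patch uses popcount_hwi).
static int
popcount64 (feature_mask_t x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}

// Return 1 if mask1 has higher priority, -1 if mask2 does, 0 if equal.
static int
compare_masks_sketch (feature_mask_t mask1, feature_mask_t mask2)
{
  int pop1 = popcount64 (mask1);
  int pop2 = popcount64 (mask2);
  if (pop1 != pop2)
    return pop1 > pop2 ? 1 : -1;

  feature_mask_t diff = mask1 ^ mask2;
  if (diff == 0)
    return 0;

  // Same number of features: the mask owning the highest differing bit wins.
  for (int i = 63; i >= 0; i--)
    {
      feature_mask_t bit = (feature_mask_t) 1 << i;
      if (diff & bit)
        return (mask1 & bit) ? 1 : -1;
    }
  assert (false && "diff is nonzero, so some bit must differ");
  return 0;
}
```
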
> > +
> > +int
> > +aarch64_compare_version_priority (tree decl1, tree decl2)
> > +{
> > +  auto mask1 = get_feature_mask_for_version (decl1);
> > +  auto mask2 = get_feature_mask_for_version (decl2);
> > +
> > +  return compare_feature_masks (mask1, mask2);
> > +}
> > +
> > +/* Build the struct __ifunc_arg_t type:
> > +
> > +   struct __ifunc_arg_t
> > +   {
> > +     unsigned long _size; // Size of the struct, so it can grow.
> > +     unsigned long _hwcap;
> > +     unsigned long _hwcap2;
> > +   }
> > + */
> 
> This isn't ILP32-friendly, but I agree we need to stick to the types
> that glibc uses.
> 
> > +
> > +static tree
> > +build_ifunc_arg_type ()
> > +{
> > +  tree ifunc_arg_type = lang_hooks.types.make_type (RECORD_TYPE);
> > +  tree field1 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> > +			    get_identifier ("_size"),
> > +			    long_unsigned_type_node);
> > +  tree field2 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> > +			    get_identifier ("_hwcap"),
> > +			    long_unsigned_type_node);
> > +  tree field3 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> > +			    get_identifier ("_hwcap2"),
> > +			    long_unsigned_type_node);
> > +
> > +  DECL_FIELD_CONTEXT (field1) = ifunc_arg_type;
> > +  DECL_FIELD_CONTEXT (field2) = ifunc_arg_type;
> > +  DECL_FIELD_CONTEXT (field3) = ifunc_arg_type;
> > +
> > +  TYPE_FIELDS (ifunc_arg_type) = field1;
> > +  DECL_CHAIN (field1) = field2;
> > +  DECL_CHAIN (field2) = field3;
> > +
> > +  layout_type (ifunc_arg_type);
> > +
> > +  tree const_type = build_qualified_type (ifunc_arg_type, TYPE_QUAL_CONST);
> > +  tree pointer_type = build_pointer_type (const_type);
> > +
> > +  return pointer_type;
> > +}
> > +
> > +/* Make the resolver function decl to dispatch the versions of
> > +   a multi-versioned function,  DEFAULT_DECL.  IFUNC_ALIAS_DECL is
> > +   ifunc alias that will point to the created resolver.  Create an
> > +   empty basic block in the resolver and store the pointer in
> > +   EMPTY_BB.  Return the decl of the resolver function.  */
> > +
> > +static tree
> > +make_resolver_func (const tree default_decl,
> > +		    const tree ifunc_alias_decl,
> > +		    basic_block *empty_bb)
> > +{
> > +  tree decl, type, t;
> > +
> > +  /* Create resolver function name based on default_decl.  */
> > +  tree decl_name = clone_function_name (default_decl, "resolver");
> > +  const char *resolver_name = IDENTIFIER_POINTER (decl_name);
> > +
> > +  /* The resolver function should have signature
> > +     (void *) resolver (uint64_t, const __ifunc_arg_t *) */
> > +  type = build_function_type_list (ptr_type_node,
> > +				   uint64_type_node,
> > +				   build_ifunc_arg_type(),
> > +				   NULL_TREE);
> > +
> > +  decl = build_fn_decl (resolver_name, type);
> > +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
> > +
> > +  DECL_NAME (decl) = decl_name;
> > +  TREE_USED (decl) = 1;
> > +  DECL_ARTIFICIAL (decl) = 1;
> > +  DECL_IGNORED_P (decl) = 1;
> > +  TREE_PUBLIC (decl) = 0;
> > +  DECL_UNINLINABLE (decl) = 1;
> > +
> > +  /* Resolver is not external, body is generated.  */
> > +  DECL_EXTERNAL (decl) = 0;
> > +  DECL_EXTERNAL (ifunc_alias_decl) = 0;
> > +
> > +  DECL_CONTEXT (decl) = NULL_TREE;
> > +  DECL_INITIAL (decl) = make_node (BLOCK);
> > +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
> > +
> > +  if (DECL_COMDAT_GROUP (default_decl)
> > +      || TREE_PUBLIC (default_decl))
> > +    {
> > +      /* In this case, each translation unit with a call to this
> > +	 versioned function will put out a resolver.  Ensure it
> > +	 is comdat to keep just one copy.  */
> > +      DECL_COMDAT (decl) = 1;
> > +      make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
> > +    }
> > +  else
> > +    TREE_PUBLIC (ifunc_alias_decl) = 0;
> > +
> > +  /* Build result decl and add to function_decl. */
> > +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
> > +  DECL_CONTEXT (t) = decl;
> > +  DECL_ARTIFICIAL (t) = 1;
> > +  DECL_IGNORED_P (t) = 1;
> > +  DECL_RESULT (decl) = t;
> > +
> > +  /* Build parameter decls and add to function_decl. */
> > +  tree arg1 = build_decl (UNKNOWN_LOCATION, PARM_DECL,
> > +			  get_identifier ("hwcap"),
> > +			  uint64_type_node);
> > +  tree arg2 = build_decl (UNKNOWN_LOCATION, PARM_DECL,
> > +			  get_identifier ("arg"),
> > +			  build_ifunc_arg_type());
> > +  DECL_CONTEXT (arg1) = decl;
> > +  DECL_CONTEXT (arg2) = decl;
> > +  DECL_ARTIFICIAL (arg1) = 1;
> > +  DECL_ARTIFICIAL (arg2) = 1;
> > +  DECL_IGNORED_P (arg1) = 1;
> > +  DECL_IGNORED_P (arg2) = 1;
> > +  DECL_ARG_TYPE (arg1) = uint64_type_node;
> > +  DECL_ARG_TYPE (arg2) = build_ifunc_arg_type();
> 
> Nit: space before second "(".

Fixed, along with the earlier instance of this mistake.

> > +  DECL_ARGUMENTS (decl) = arg1;
> > +  TREE_CHAIN (arg1) = arg2;
> > +
> > +  gimplify_function_tree (decl);
> > +  push_cfun (DECL_STRUCT_FUNCTION (decl));
> > +  *empty_bb = init_lowered_empty_function (decl, false,
> > +					   profile_count::uninitialized ());
> > +
> > +  cgraph_node::add_new_function (decl, true);
> > +  symtab->call_cgraph_insertion_hooks (cgraph_node::get_create (decl));
> > +
> > +  pop_cfun ();
> > +
> > +  gcc_assert (ifunc_alias_decl != NULL);
> > +  /* Mark ifunc_alias_decl as "ifunc" with resolver as resolver_name.  */
> > +  DECL_ATTRIBUTES (ifunc_alias_decl)
> > +    = make_attribute ("ifunc", resolver_name,
> > +		      DECL_ATTRIBUTES (ifunc_alias_decl));
> > +
> > +  /* Create the alias for dispatch to resolver here.  */
> > +  cgraph_node::create_same_body_alias (ifunc_alias_decl, decl);
> > +  return decl;
> > +}
> > +
> > +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL
> > +   to return a pointer to VERSION_DECL if all feature bits specified in
> > +   FEATURE_MASK are not set in MASK_VAR.  This function will be called during
> > +   version dispatch to decide which function version to execute.  It returns
> > +   the basic block at the end, to which more conditions can be added.  */
> > +static basic_block
> > +add_condition_to_bb (tree function_decl, tree version_decl,
> > +		     aarch64_fmv_feature_mask feature_mask,
> > +		     tree mask_var, basic_block new_bb)
> > +{
> > +  gimple *return_stmt;
> > +  tree convert_expr, result_var;
> > +  gimple *convert_stmt;
> > +  gimple *if_else_stmt;
> > +
> > +  basic_block bb1, bb2, bb3;
> > +  edge e12, e23;
> > +
> > +  gimple_seq gseq;
> > +
> > +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
> > +
> > +  gcc_assert (new_bb != NULL);
> > +  gseq = bb_seq (new_bb);
> > +
> > +
> > +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node,
> > +			 build_fold_addr_expr (version_decl));
> > +  result_var = create_tmp_var (ptr_type_node);
> > +  convert_stmt = gimple_build_assign (result_var, convert_expr);
> > +  return_stmt = gimple_build_return (result_var);
> > +
> > +
> 
> Nit: just one blank line (before and after the block).  Some other instances
> in the patch too.

Fixed all new occurrences of "\n\n\n".
 
> > +  if (feature_mask == 0ULL)
> > +    {
> > +      /* Default version.  */
> > +      gimple_seq_add_stmt (&gseq, convert_stmt);
> > +      gimple_seq_add_stmt (&gseq, return_stmt);
> > +      set_bb_seq (new_bb, gseq);
> > +      gimple_set_bb (convert_stmt, new_bb);
> > +      gimple_set_bb (return_stmt, new_bb);
> > +      pop_cfun ();
> > +      return new_bb;
> > +    }
> > +
> > +  tree and_expr_var = create_tmp_var (long_long_unsigned_type_node);
> > +  tree and_expr = build2 (BIT_AND_EXPR,
> > +			  long_long_unsigned_type_node,
> > +			  mask_var,
> > +			  build_int_cst (long_long_unsigned_type_node,
> > +					 feature_mask));
> > +  gimple *and_stmt = gimple_build_assign (and_expr_var, and_expr);
> > +  gimple_set_block (and_stmt, DECL_INITIAL (function_decl));
> > +  gimple_set_bb (and_stmt, new_bb);
> > +  gimple_seq_add_stmt (&gseq, and_stmt);
> > +
> > +  tree zero_llu = build_int_cst (long_long_unsigned_type_node, 0);
> > +  if_else_stmt = gimple_build_cond (EQ_EXPR, and_expr_var, zero_llu,
> > +				    NULL_TREE, NULL_TREE);
> > +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl));
> > +  gimple_set_bb (if_else_stmt, new_bb);
> > +  gimple_seq_add_stmt (&gseq, if_else_stmt);
> > +
> > +  gimple_seq_add_stmt (&gseq, convert_stmt);
> > +  gimple_seq_add_stmt (&gseq, return_stmt);
> > +  set_bb_seq (new_bb, gseq);
> > +
> > +  bb1 = new_bb;
> > +  e12 = split_block (bb1, if_else_stmt);
> > +  bb2 = e12->dest;
> > +  e12->flags &= ~EDGE_FALLTHRU;
> > +  e12->flags |= EDGE_TRUE_VALUE;
> > +
> > +  e23 = split_block (bb2, return_stmt);
> > +
> > +  gimple_set_bb (convert_stmt, bb2);
> > +  gimple_set_bb (return_stmt, bb2);
> > +
> > +  bb3 = e23->dest;
> > +  make_edge (bb1, bb3, EDGE_FALSE_VALUE);
> > +
> > +  remove_edge (e23);
> > +  make_edge (bb2, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
> > +
> > +  pop_cfun ();
> > +
> > +  return bb3;
> > +}
> > +
> > +/* Used when sorting the decls into dispatch order.  */
> > +static int compare_feature_version_info (const void *p1, const void *p2)
> 
> Formatting nit: new line after "static int".
> 
> > +{
> > +  struct _function_version_info
> > +    {
> > +      tree version_decl;
> > +      aarch64_fmv_feature_mask feature_mask;
> > +    };
> 
> Think we should move this struct out of the function so that it can
> be shared by dispatch_function_versions.  Alternatively, the comparison
> function could be a lambda within dispatch_function_versions.

Rewritten as a lambda, and reordered within dispatch_function_versions so that
processing the list of function versions happens after all the preliminary
codegen.
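
As a rough illustration of the lambda approach (a standalone sketch, not the
rewritten code itself): a captureless lambda converts to a plain function
pointer, so it can be passed straight to qsort.  The struct name
version_info_sketch, the int id standing in for the version decl, and the
tie-break by numeric mask value are assumptions made for the example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the per-version record; "id" replaces the tree
// decl so that the example is self-contained.
struct version_info_sketch
{
  int id;
  uint64_t feature_mask;
};

// Sort VERSIONS into descending dispatch priority.
static void
sort_versions_sketch (version_info_sketch *versions, size_t count)
{
  assert (versions != nullptr || count == 0);

  // Captureless lambda, so it converts implicitly to the function pointer
  // type that qsort expects.
  auto compare = [] (const void *p1, const void *p2) -> int
    {
      const auto *v1 = static_cast<const version_info_sketch *> (p1);
      const auto *v2 = static_cast<const version_info_sketch *> (p2);
      size_t pop1 = std::bitset<64> (v1->feature_mask).count ();
      size_t pop2 = std::bitset<64> (v2->feature_mask).count ();
      // Negative result means "v1 sorts first", i.e. v1 has higher priority.
      if (pop1 != pop2)
        return pop1 > pop2 ? -1 : 1;
      if (v1->feature_mask != v2->feature_mask)
        return v1->feature_mask > v2->feature_mask ? -1 : 1;
      return 0;
    };

  qsort (versions, count, sizeof (version_info_sketch), compare);
}
```

Keeping the record type and the comparison in the same function avoids the
duplicated struct definition that the review comment pointed out.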

> It's best to avoid names starting with "_", since those are reserved
> for the implementation.
> 
> > +  const _function_version_info v1 = *(const _function_version_info *)p1;
> > +  const _function_version_info v2 = *(const _function_version_info *)p2;
> > +  return - compare_feature_masks (v1.feature_mask, v2.feature_mask);
> > +}
> > +
> > +static int
> > +dispatch_function_versions (tree dispatch_decl,
> > +			    void *fndecls_p,
> > +			    basic_block *empty_bb)
> 
> Missing function comment.

Added (same as i386).
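
For readers following the control flow, the logic that the generated resolver
is meant to perform can be modelled in ordinary C++ as below.  This is only a
model under stated assumptions: the real resolver is built as GIMPLE, and the
names fmv_version_sketch and resolve_sketch are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one function version.
struct fmv_version_sketch
{
  const char *name;
  uint64_t feature_mask;
};

// VERSIONS must be sorted in descending priority, with the default version
// (feature mask 0) last.  Returns the first version whose required features
// are all present in CPU_FEATURES.
static const fmv_version_sketch *
resolve_sketch (uint64_t cpu_features,
                const std::vector<fmv_version_sketch> &versions)
{
  assert (!versions.empty () && versions.back ().feature_mask == 0);

  // Mirror the patch: invert the detected features once, then a version is
  // usable when none of its required bits are set in the inverted mask.
  uint64_t missing = ~cpu_features;
  for (const auto &v : versions)
    if ((missing & v.feature_mask) == 0)
      return &v;
  return nullptr;  // Not reached when a default version is present.
}
```

Because the default version has an empty mask, its test always succeeds, which
is why it has to come last in the dispatch order.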
 
> > +{
> > +  gimple *ifunc_cpu_init_stmt;
> > +  gimple_seq gseq;
> > +  vec<tree> *fndecls;
> > +  unsigned int num_versions = 0;
> > +  unsigned int actual_versions = 0;
> > +  unsigned int i;
> > +
> > +  struct _function_version_info
> > +    {
> > +      tree version_decl;
> > +      aarch64_fmv_feature_mask feature_mask;
> > +    } *function_version_info;
> > +
> > +  gcc_assert (dispatch_decl != NULL
> > +	      && fndecls_p != NULL
> > +	      && empty_bb != NULL);
> > +
> > +  /*fndecls_p is actually a vector.  */
> > +  fndecls = static_cast<vec<tree> *> (fndecls_p);
> > +
> > +  /* At least one more version other than the default.  */
> > +  num_versions = fndecls->length ();
> > +  gcc_assert (num_versions >= 2);
> > +
> > +  function_version_info = (struct _function_version_info *)
> > +    XNEWVEC (struct _function_version_info, (num_versions));
> > +
> > +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
> > +
> > +  gseq = bb_seq (*empty_bb);
> > +  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
> > +     constructors, so explicitly call __init_cpu_features_resolver here.  */
> > +  tree init_fn_type = build_function_type_list (void_type_node,
> > +						long_unsigned_type_node,
> > +						build_ifunc_arg_type(),
> > +						NULL);
> > +  tree init_fn_id = get_identifier ("__init_cpu_features_resolver");
> > +  tree init_fn_decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL,
> > +				  init_fn_id, init_fn_type);
> > +  tree arg1 = DECL_ARGUMENTS (dispatch_decl);
> > +  tree arg2 = TREE_CHAIN (arg1);
> > +  ifunc_cpu_init_stmt = gimple_build_call (init_fn_decl, 2, arg1, arg2);
> > +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
> > +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
> > +
> > +  /* Build the struct type for __aarch64_cpu_features.  */
> > +  tree global_type = lang_hooks.types.make_type (RECORD_TYPE);
> > +  tree field1 = build_decl (UNKNOWN_LOCATION, FIELD_DECL,
> > +			    get_identifier ("features"),
> > +			    long_long_unsigned_type_node);
> > +  DECL_FIELD_CONTEXT (field1) = global_type;
> > +  TYPE_FIELDS (global_type) = field1;
> > +  layout_type (global_type);
> > +
> > +  tree global_var = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> > +				get_identifier ("__aarch64_cpu_features"),
> > +				global_type);
> > +  DECL_EXTERNAL (global_var) = 1;
> > +  tree mask_var = create_tmp_var (long_long_unsigned_type_node);
> > +
> > +  tree component_expr = build3 (COMPONENT_REF, long_long_unsigned_type_node,
> > +				global_var, field1, NULL_TREE);
> > +  gimple *component_stmt = gimple_build_assign (mask_var, component_expr);
> > +  gimple_set_block (component_stmt, DECL_INITIAL (dispatch_decl));
> > +  gimple_set_bb (component_stmt, *empty_bb);
> > +  gimple_seq_add_stmt (&gseq, component_stmt);
> > +
> > +  tree not_expr = build1 (BIT_NOT_EXPR, long_long_unsigned_type_node, mask_var);
> > +  gimple *not_stmt = gimple_build_assign (mask_var, not_expr);
> > +  gimple_set_block (not_stmt, DECL_INITIAL (dispatch_decl));
> > +  gimple_set_bb (not_stmt, *empty_bb);
> > +  gimple_seq_add_stmt (&gseq, not_stmt);
> > +
> > +  set_bb_seq (*empty_bb, gseq);
> > +
> > +  pop_cfun ();
> > +
> > +  for (tree version_decl : *fndecls)
> > +    {
> > +      aarch64_fmv_feature_mask feature_mask;
> > +      /* Get attribute string, parse it and find the right features.  */
> > +      feature_mask = get_feature_mask_for_version (version_decl);
> > +      function_version_info [actual_versions].version_decl = version_decl;
> > +      function_version_info [actual_versions].feature_mask = feature_mask;
> > +      actual_versions++;
> > +    }
> > +
> > +  /* Sort the versions according to descending order of dispatch priority.  */
> > +  qsort (function_version_info, actual_versions,
> > +	 sizeof (struct _function_version_info), compare_feature_version_info);
> > +
> > +  for (i = 0; i < actual_versions; ++i)
> > +    *empty_bb = add_condition_to_bb (dispatch_decl,
> > +				     function_version_info[i].version_decl,
> > +				     function_version_info[i].feature_mask,
> > +				     mask_var,
> > +				     *empty_bb);
> > +
> > +  free (function_version_info);
> > +  return 0;
> > +}
> > +
> > +
> > +tree
> > +aarch64_generate_version_dispatcher_body (void *node_p)
> 
> Missing function comment.  Since the function implements a defined interface,
> the comment can just be:
> 
> /* Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY.  */

Done.
 
> > +{
> > +  tree resolver_decl;
> > +  basic_block empty_bb;
> > +  tree default_ver_decl;
> > +  struct cgraph_node *versn;
> > +  struct cgraph_node *node;
> > +
> > +  struct cgraph_function_version_info *node_version_info = NULL;
> > +  struct cgraph_function_version_info *versn_info = NULL;
> > +
> > +  node = (cgraph_node *)node_p;
> > +
> > +  node_version_info = node->function_version ();
> > +  gcc_assert (node->dispatcher_function
> > +	      && node_version_info != NULL);
> > +
> > +  if (node_version_info->dispatcher_resolver)
> > +    return node_version_info->dispatcher_resolver;
> > +
> > +  /* The first version in the chain corresponds to the default version.  */
> > +  default_ver_decl = node_version_info->next->this_node->decl;
> > +
> > +  /* node is going to be an alias, so remove the finalized bit.  */
> > +  node->definition = false;
> > +
> > +  resolver_decl = make_resolver_func (default_ver_decl,
> > +				      node->decl, &empty_bb);
> > +
> > +  node_version_info->dispatcher_resolver = resolver_decl;
> > +
> > +  push_cfun (DECL_STRUCT_FUNCTION (resolver_decl));
> > +
> > +  auto_vec<tree, 2> fn_ver_vec;
> > +
> > +  for (versn_info = node_version_info->next; versn_info;
> > +       versn_info = versn_info->next)
> > +    {
> > +      versn = versn_info->this_node;
> > +      /* Check for virtual functions here again, as by this time it should
> > +	 have been determined if this function needs a vtable index or
> > +	 not.  This happens for methods in derived classes that override
> > +	 virtual methods in base classes but are not explicitly marked as
> > +	 virtual.  */
> > +      if (DECL_VINDEX (versn->decl))
> > +	sorry ("virtual function multiversioning not supported");
> > +
> > +      fn_ver_vec.safe_push (versn->decl);
> > +    }
> > +
> > +  dispatch_function_versions (resolver_decl, &fn_ver_vec, &empty_bb);
> > +  cgraph_edge::rebuild_edges ();
> > +  pop_cfun ();
> > +  return resolver_decl;
> > +}
> > +
> > +/* Make a dispatcher declaration for the multi-versioned function DECL.
> > +   Calls to DECL function will be replaced with calls to the dispatcher
> > +   by the front-end.  Returns the decl of the dispatcher function.  */
> > +
> > +tree
> > +aarch64_get_function_versions_dispatcher (void *decl)
> > +{
> > +  tree fn = (tree) decl;
> > +  struct cgraph_node *node = NULL;
> > +  struct cgraph_node *default_node = NULL;
> > +  struct cgraph_function_version_info *node_v = NULL;
> > +  struct cgraph_function_version_info *first_v = NULL;
> > +
> > +  tree dispatch_decl = NULL;
> > +
> > +  struct cgraph_function_version_info *default_version_info = NULL;
> > +
> > +  gcc_assert (fn != NULL && DECL_FUNCTION_VERSIONED (fn));
> > +
> > +  node = cgraph_node::get (fn);
> > +  gcc_assert (node != NULL);
> > +
> > +  node_v = node->function_version ();
> > +  gcc_assert (node_v != NULL);
> > +
> > +  if (node_v->dispatcher_resolver != NULL)
> > +    return node_v->dispatcher_resolver;
> > +
> > +  /* Find the default version and make it the first node.  */
> > +  first_v = node_v;
> > +  /* Go to the beginning of the chain.  */
> > +  while (first_v->prev != NULL)
> > +    first_v = first_v->prev;
> > +  default_version_info = first_v;
> > +  while (default_version_info != NULL)
> > +    {
> > +      if (get_feature_mask_for_version
> > +	    (default_version_info->this_node->decl) == 0ULL)
> > +	break;
> > +      default_version_info = default_version_info->next;
> > +    }
> > +
> > +  /* If there is no default node, just return NULL.  */
> > +  if (default_version_info == NULL)
> > +    return NULL;
> > +
> > +  /* Make default info the first node.  */
> > +  if (first_v != default_version_info)
> > +    {
> > +      default_version_info->prev->next = default_version_info->next;
> > +      if (default_version_info->next)
> > +	default_version_info->next->prev = default_version_info->prev;
> > +      first_v->prev = default_version_info;
> > +      default_version_info->next = first_v;
> > +      default_version_info->prev = NULL;
> > +    }
> > +
> > +  default_node = default_version_info->this_node;
> > +
> > +  if (targetm.has_ifunc_p ())
> > +    {
> > +      struct cgraph_function_version_info *it_v = NULL;
> > +      struct cgraph_node *dispatcher_node = NULL;
> > +      struct cgraph_function_version_info *dispatcher_version_info = NULL;
> > +
> > +      /* Right now, the dispatching is done via ifunc.  */
> > +      dispatch_decl = make_dispatcher_decl (default_node->decl);
> > +      TREE_NOTHROW (dispatch_decl) = TREE_NOTHROW (fn);
> > +
> > +      dispatcher_node = cgraph_node::get_create (dispatch_decl);
> > +      gcc_assert (dispatcher_node != NULL);
> > +      dispatcher_node->dispatcher_function = 1;
> > +      dispatcher_version_info
> > +	= dispatcher_node->insert_new_function_version ();
> > +      dispatcher_version_info->next = default_version_info;
> > +      dispatcher_node->definition = 1;
> > +
> > +      /* Set the dispatcher for all the versions.  */
> > +      it_v = default_version_info;
> > +      while (it_v != NULL)
> > +	{
> > +	  it_v->dispatcher_resolver = dispatch_decl;
> > +	  it_v = it_v->next;
> > +	}
> > +    }
> > +  else
> > +    {
> > +      error_at (DECL_SOURCE_LOCATION (default_node->decl),
> > +		"multiversioning needs %<ifunc%> which is not supported "
> > +		"on this target");
> > +    }
> > +
> > +  return dispatch_decl;
> > +}
> > +
> > +bool
> > +aarch64_common_function_versions (tree fn1, tree fn2)
> 
> Missing comment here too.  Same for other functions later.

Added.
 
> > +{
> > +  if (TREE_CODE (fn1) != FUNCTION_DECL
> > +      || TREE_CODE (fn2) != FUNCTION_DECL)
> > +    return false;
> > +
> > +  return (aarch64_compare_version_priority (fn1, fn2) != 0);
> > +}
> > +
> > +
> > +tree
> > +aarch64_mangle_decl_assembler_name (tree decl, tree id)
> > +{
> > +  /* For function version, add the target suffix to the assembler name.  */
> > +  if (TREE_CODE (decl) == FUNCTION_DECL
> > +      && DECL_FUNCTION_VERSIONED (decl))
> > +    {
> > +      aarch64_fmv_feature_mask feature_mask = get_feature_mask_for_version (decl);
> > +
> > +      /* No suffix for the default version.  */
> > +      if (feature_mask == 0ULL)
> > +	return id;
> > +
> > +      char suffix[2048];
> > +      int pos = 0;
> > +      const char *base = IDENTIFIER_POINTER (id);
> > +
> > +      for (int i = 1; i < FEAT_MAX; i++)
> 
> Why does this start at 1 rather than 0?  Think it deserves a comment.

It starts at 1 because that array used to have a "default" entry at the start.
Now it's just a bug - thanks for spotting.  Fixed in the next version.

> > +	{
> > +	  if (feature_mask & aarch64_fmv_feature_data[i].feature_mask)
> > +	    {
> > +	      suffix[pos] = 'M';
> > +	      strcpy (&suffix[pos+1], aarch64_fmv_feature_data[i].name);
> > +	      pos += strlen(aarch64_fmv_feature_data[i].name) + 1;
> > +	    }
> > +	}
> > +      suffix[pos] = '\0';
> > +
> > +      char *ret = XNEWVEC (char, strlen (base) + strlen (suffix) + 3);
> > +      sprintf (ret, "%s._%s", base, suffix);
> 
> It isn't obvious that the limit of 2048 is or will stay safe.  Probably
> best to build the suffix using a std::string instead.

It would be safe for now, because we have fewer than 64 features, each of which
contributes fewer than 32 characters.  Regardless, it's ugly, confusing code,
and I have now significantly improved it by using std::string instead.

(The only reason I wrote it this way in the first place was because that's how
x86 did it, and I hadn't yet encountered usage of std::string elsewhere in
gcc.)
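
As an illustration of the std::string approach (a standalone sketch, not the
code from the next version), the suffix can be built without any fixed-size
buffer, and iterating the table from index 0 avoids the off-by-one discussed
above.  The table contents and bit assignments in feature_table_sketch are
hypothetical; only the "._" prefix and the 'M' marker are taken from the
patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an entry of aarch64_fmv_feature_data.
struct fmv_feature_sketch
{
  const char *name;
  uint64_t feature_mask;
};

// Hypothetical table; the bit assignments are invented for the example.
static const fmv_feature_sketch feature_table_sketch[] = {
  { "rng", 1ULL << 0 },
  { "dotprod", 1ULL << 1 },
  { "sve", 1ULL << 2 },
};

// Return BASE unchanged for the default version, otherwise BASE followed by
// "._" and one "M<name>" per feature present in FEATURE_MASK.
static std::string
mangle_sketch (const std::string &base, uint64_t feature_mask)
{
  assert (!base.empty ());
  if (feature_mask == 0)
    return base;

  std::string name = base + "._";
  // Range-based loop covers every entry, including the first one.
  for (const auto &f : feature_table_sketch)
    if (feature_mask & f.feature_mask)
      {
        name += 'M';
        name += f.name;
      }
  return name;
}
```

The string grows as needed, so the result stays correct however many features
or however long the names become.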
 
> Thanks,
> Richard
> 
> > +
> > +      if (DECL_ASSEMBLER_NAME_SET_P (decl))
> > +	SET_DECL_RTL (decl, NULL);
> > +
> > +      id = get_identifier (ret);
> > +    }
> > +  return id;
> > +}
> > +
> > +
> >  /* Helper for aarch64_can_inline_p.  In the case where CALLER and CALLEE are
> >     tri-bool options (yes, no, don't care) and the default value is
> >     DEF, determine whether to reject inlining.  */
> > @@ -28457,6 +29288,13 @@ aarch64_libgcc_floating_mode_supported_p
> >  #undef TARGET_OPTION_VALID_ATTRIBUTE_P
> >  #define TARGET_OPTION_VALID_ATTRIBUTE_P aarch64_option_valid_attribute_p
> >  
> > +#undef TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P
> > +#define TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P \
> > +  aarch64_option_valid_version_attribute_p
> > +
> > +#undef TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE
> > +#define TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE "target_version"
> > +
> >  #undef TARGET_SET_CURRENT_FUNCTION
> >  #define TARGET_SET_CURRENT_FUNCTION aarch64_set_current_function
> >  
> > @@ -28787,6 +29625,24 @@ aarch64_libgcc_floating_mode_supported_p
> >  #undef TARGET_CONST_ANCHOR
> >  #define TARGET_CONST_ANCHOR 0x1000000
> >  
> > +#undef TARGET_OPTION_FUNCTION_VERSIONS
> > +#define TARGET_OPTION_FUNCTION_VERSIONS aarch64_common_function_versions
> > +
> > +#undef TARGET_COMPARE_VERSION_PRIORITY
> > +#define TARGET_COMPARE_VERSION_PRIORITY aarch64_compare_version_priority
> > +
> > +#undef TARGET_GENERATE_VERSION_DISPATCHER_BODY
> > +#define TARGET_GENERATE_VERSION_DISPATCHER_BODY \
> > +  aarch64_generate_version_dispatcher_body
> > +
> > +#undef TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
> > +#define TARGET_GET_FUNCTION_VERSIONS_DISPATCHER \
> > +  aarch64_get_function_versions_dispatcher
> > +
> > +#undef TARGET_MANGLE_DECL_ASSEMBLER_NAME
> > +#define TARGET_MANGLE_DECL_ASSEMBLER_NAME aarch64_mangle_decl_assembler_name
> > +
> > +
> >  struct gcc_target targetm = TARGET_INITIALIZER;
> >  
> >  #include "gt-aarch64.h"
> > diff --git a/gcc/config/arm/aarch-common.h b/gcc/config/arm/aarch-common.h
> > index c6a67f0d05cc75d85d019e1cc910c37173884c03..70f01fd3da6919dd98cfe92bfc4c54b7d2cba72c 100644
> > --- a/gcc/config/arm/aarch-common.h
> > +++ b/gcc/config/arm/aarch-common.h
> > @@ -23,7 +23,7 @@
> >  #define GCC_AARCH_COMMON_H
> >  
> >  /* Enum describing the various ways that the
> > -   aarch*_parse_{arch,tune,cpu,extension} functions can fail.
> > +   aarch*_parse_{arch,tune,cpu,extension,fmv_extension} functions can fail.
> >     This way their callers can choose what kind of error to give.  */
> >  
> >  enum aarch_parse_opt_result
> > @@ -31,7 +31,8 @@ enum aarch_parse_opt_result
> >    AARCH_PARSE_OK,			/* Parsing was successful.  */
> >    AARCH_PARSE_MISSING_ARG,		/* Missing argument.  */
> >    AARCH_PARSE_INVALID_FEATURE,		/* Invalid feature modifier.  */
> > -  AARCH_PARSE_INVALID_ARG		/* Invalid arch, tune, cpu arg.  */
> > +  AARCH_PARSE_INVALID_ARG,		/* Invalid arch, tune, cpu arg.  */
> > +  AARCH_PARSE_DUPLICATE_FEATURE		/* Duplicate feature modifier.  */
> >  };
> >  
> >  /* Function types -msign-return-address should sign.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
> > index 8499f87c39b173491a89626af56f4e193b1d12b5..8b7d7d2d8a00f6d5a6a35ffca28be7f1ff4cb9c7 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_0.c
> > @@ -7,6 +7,6 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto} } } */
> > +/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto} } } */
> >  
> >  /* Test a normal looking procinfo.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c
> > index 551669091c7010379a4c5247a27c517c4e67ef98..234a1ce1d7b4714e64c95c15488784d73c0552f2 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_13.c
> > @@ -7,6 +7,6 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto} } } */
> > +/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto} } } */
> >  
> >  /* Test one with mixed order of feature bits.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> > index 2f963bb2312711691f6f1c5989a100b88671ad52..bd3ea96a785de507578729a621ec4ae7bad8a516 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_16.c
> > @@ -7,6 +7,6 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\+sve2} } } */
> > +/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+sve2} } } */
> >  
> >  /* Test a normal looking procinfo.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
> > index c68a697aa3e97ef52fd7e90233c5bb4ac8dbddd9..33e6319b46dcebc717e8a415484093e980660fb5 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_17.c
> > @@ -7,6 +7,6 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+dotprod\+crypto\+sve2} } } */
> > +/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+sve2} } } */
> >  
> >  /* Test a normal looking procinfo.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
> > index b5f0a3005f50cbf01edbcb8aefcc3c34aa11207f..abae7a7d1453f79f879ff5e24f7c67e819db1dbb 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_18.c
> > @@ -7,7 +7,7 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv8.6-a\+crc\+fp16\+aes\+sha3\+rng} } } */
> > +/* { dg-final { scan-assembler {\.arch armv8.6-a\+rng\+crc\+aes\+sha3\+fp16} } } */
> >  
> >  /* Test one where the boundary of buffer size would overwrite the last
> >     character read when stitching the fgets-calls together.  With the
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c
> > index 980d3f79dfb03b0d8eb68f691bf2dedf80aed87d..a5b4b4d3442c6522a8cdadf4eebd3b5460e37213 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_19.c
> > @@ -7,7 +7,7 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+profile\+memtag\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\+nopauth\n} } } */
> > +/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+memtag\+profile\+nopauth\n} } } */
> >  
> >  /* Test one that if the kernel doesn't report the availability of a mandatory
> >     feature that it has turned it off for whatever reason.  As such compilers
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c
> > index 117df2b0b6cd5751d9f5175b4343aad9825a6c43..e12aa543d02924f268729f96fe1f17181287f097 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_20.c
> > @@ -7,7 +7,7 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+profile\+memtag\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\n} } } */
> > +/* { dg-final { scan-assembler {\.arch armv9-a\+crc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+memtag\+profile\n} } } */
> >  
> >  /* Check whether features that don't have a midr name during detection are
> >     correctly ignored.  These features shouldn't affect the native detection.
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
> > index efbd02cbdc0638db85e776f1e79043709c11df21..920e1d65711cbcb77b07441597180c0159ccabf9 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_21.c
> > @@ -7,7 +7,7 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+lse\+rcpc\+rdma\+dotprod\+fp16fml\+sb\+ssbs\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\+flagm\n} } } */
> > +/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+dotprod\+rdma\+lse\+crc\+fp16fml\+rcpc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\n} } } */
> >  
> >  /* Check that an Armv8-A core doesn't fall apart on extensions without midr
> >     values.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
> > index d431d4938265d024891b464ac3d069607b21d8e7..416a29b514ab7599a7092e26e3716ec8a50cc895 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_22.c
> > @@ -7,7 +7,7 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv8-a\+crc\+lse\+rcpc\+rdma\+dotprod\+fp16fml\+sb\+ssbs\+sve2-sm4\+sve2-aes\+sve2-sha3\+sve2-bitperm\+i8mm\+bf16\+flagm\+pauth\n} } } */
> > +/* { dg-final { scan-assembler {\.arch armv8-a\+flagm\+dotprod\+rdma\+lse\+crc\+fp16fml\+rcpc\+i8mm\+bf16\+sve2-aes\+sve2-bitperm\+sve2-sha3\+sve2-sm4\+sb\+ssbs\+pauth\n} } } */
> >  
> >  /* Check that an Armv8-A core doesn't fall apart on extensions without midr
> >     values and that it enables optional features.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c
> > index 7608e8845a662219488effcdb8277006dcf457a9..907249c5c1e6a440731533407df0ff7caadcbf74 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_6.c
> > @@ -7,7 +7,7 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv8-a\+fp16\+crypto} } } */
> > +/* { dg-final { scan-assembler {\.arch armv8-a\+crypto\+fp16} } } */
> >  
> > -/* Test one where the feature bits for crypto and fp16 are given in
> > -   same order as declared in options file.  */
> > +/* Test one where the crypto and fp16 options are specified in different
> > +   order from what is in the options file.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c
> > index 72b14b4f6ed0d50a4fc8a35931fbd232b09d2b61..b68a07a7c16b7a3cc9a896cca152d78e5cf9ea2f 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_7.c
> > @@ -7,7 +7,7 @@ int main()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv8-a\+fp16\+crypto} } } */
> > +/* { dg-final { scan-assembler {\.arch armv8-a\+crypto\+fp16} } } */
> >  
> > -/* Test one where the crypto and fp16 options are specified in different
> > -   order from what is in the options file.  */
> > +/* Test one where the feature bits for crypto and fp16 are given in
> > +   same order as declared in options file.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/options_set_17.c b/gcc/testsuite/gcc.target/aarch64/options_set_17.c
> > index c490e1f47a0a7a3adcbb7e96a3974d5651a023e8..4c53edd5cb92f83b3d34454c85062ff3f67b50ee 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/options_set_17.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/options_set_17.c
> > @@ -6,6 +6,6 @@ int main ()
> >    return 0;
> >  }
> >  
> > -/* { dg-final { scan-assembler {\.arch armv8\.2-a\+crc\+dotprod} } } */
> > +/* { dg-final { scan-assembler {\.arch armv8\.2-a\+dotprod\+crc} } } */
> >  
> >   /* dotprod needs to be emitted pre armv8.4.  */
> > diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
> > index 0888ca4ed058430f524b99cb0e204bd996fa0e55..78664d5a4287be0369a4b02e1b8ab4a885869352 100644
> > --- a/libgcc/config/aarch64/cpuinfo.c
> > +++ b/libgcc/config/aarch64/cpuinfo.c
> > @@ -22,6 +22,8 @@
> >     see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> >     <http://www.gnu.org/licenses/>.  */
> >  
> > +#include "common/config/aarch64/cpuinfo.h"
> > +
> >  #if defined(__has_include)
> >  #if __has_include(<sys/auxv.h>)
> >  #include <sys/auxv.h>
> > @@ -39,73 +41,6 @@ typedef struct __ifunc_arg_t {
> >  #if __has_include(<asm/hwcap.h>)
> >  #include <asm/hwcap.h>
> >  
> > -/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
> > -enum CPUFeatures {
> > -  FEAT_RNG,
> > -  FEAT_FLAGM,
> > -  FEAT_FLAGM2,
> > -  FEAT_FP16FML,
> > -  FEAT_DOTPROD,
> > -  FEAT_SM4,
> > -  FEAT_RDM,
> > -  FEAT_LSE,
> > -  FEAT_FP,
> > -  FEAT_SIMD,
> > -  FEAT_CRC,
> > -  FEAT_SHA1,
> > -  FEAT_SHA2,
> > -  FEAT_SHA3,
> > -  FEAT_AES,
> > -  FEAT_PMULL,
> > -  FEAT_FP16,
> > -  FEAT_DIT,
> > -  FEAT_DPB,
> > -  FEAT_DPB2,
> > -  FEAT_JSCVT,
> > -  FEAT_FCMA,
> > -  FEAT_RCPC,
> > -  FEAT_RCPC2,
> > -  FEAT_FRINTTS,
> > -  FEAT_DGH,
> > -  FEAT_I8MM,
> > -  FEAT_BF16,
> > -  FEAT_EBF16,
> > -  FEAT_RPRES,
> > -  FEAT_SVE,
> > -  FEAT_SVE_BF16,
> > -  FEAT_SVE_EBF16,
> > -  FEAT_SVE_I8MM,
> > -  FEAT_SVE_F32MM,
> > -  FEAT_SVE_F64MM,
> > -  FEAT_SVE2,
> > -  FEAT_SVE_AES,
> > -  FEAT_SVE_PMULL128,
> > -  FEAT_SVE_BITPERM,
> > -  FEAT_SVE_SHA3,
> > -  FEAT_SVE_SM4,
> > -  FEAT_SME,
> > -  FEAT_MEMTAG,
> > -  FEAT_MEMTAG2,
> > -  FEAT_MEMTAG3,
> > -  FEAT_SB,
> > -  FEAT_PREDRES,
> > -  FEAT_SSBS,
> > -  FEAT_SSBS2,
> > -  FEAT_BTI,
> > -  FEAT_LS64,
> > -  FEAT_LS64_V,
> > -  FEAT_LS64_ACCDATA,
> > -  FEAT_WFXT,
> > -  FEAT_SME_F64,
> > -  FEAT_SME_I64,
> > -  FEAT_SME2,
> > -  FEAT_RCPC3,
> > -  FEAT_MAX,
> > -  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
> > -		    in __aarch64_cpu_features.  */
> > -  FEAT_INIT      /* Used as flag of features initialization completion.  */
> > -};
> > -
> >  /* Architecture features used in Function Multi Versioning.  */
> >  struct {
> >    unsigned long long features;
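
For readers following the enum that this hunk moves into the shared header: the sketch below shows one way such enumerators could be used as bit positions in a 64-bit features word. It is only an illustration under stated assumptions, not the code in cpuinfo.c. The names sk_set, sk_has, sk_initialized and the SK_ enumerators are hypothetical. Only the value FEAT_EXT = 62, the FEAT_INIT slot that follows it, and the `unsigned long long features` field come from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the feature indices.  The positions mirror the
   quoted enum (RNG = 0, FLAGM = 1, DOTPROD = 4, EXT = 62, INIT = 63);
   the SK_ names themselves are illustrative only.  */
enum sketch_feature {
  SK_FEAT_RNG,          /* 0 */
  SK_FEAT_FLAGM,        /* 1 */
  SK_FEAT_DOTPROD = 4,
  SK_FEAT_EXT = 62,     /* reserved: an additional features field exists */
  SK_FEAT_INIT          /* 63: feature detection has completed */
};

/* Assumption: each enumerator names one bit of a 64-bit mask.  */
static uint64_t
sk_set (uint64_t mask, enum sketch_feature f)
{
  assert ((unsigned) f < 64);
  return mask | (UINT64_C (1) << f);
}

/* Return true if bit F is set in MASK.  */
static bool
sk_has (uint64_t mask, enum sketch_feature f)
{
  return (mask >> f) & 1;
}

/* The top bit doubles as an "already initialised" flag.  */
static bool
sk_initialized (uint64_t mask)
{
  return sk_has (mask, SK_FEAT_INIT);
}
```

Under this layout a resolver would first check sk_initialized and only then test individual feature bits, which is why the enum needs to keep the same numbering in the compiler and in libgcc.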



Thread overview: 16+ messages
2023-11-17  2:49 [PATCH v2 0/5] target_version and aarch64 function multiversioning Andrew Carlotti
2023-11-17  2:51 ` [PATCH v2[1/5] aarch64: Add cpu feature detection to libgcc Andrew Carlotti
2023-11-20 15:46   ` Richard Sandiford
2023-12-04 10:31     ` Andrew Carlotti
2023-11-17  2:53 ` [PATCH v2 2/5] c-family: Simplify attribute exclusion handling Andrew Carlotti
2023-11-19 21:45   ` Jeff Law
2023-11-17  2:54 ` [PATCH v2 3/5] ada: Improve " Andrew Carlotti
2023-11-17 10:45   ` Marc Poulhiès
2023-11-17 11:15     ` Andrew Carlotti
2023-11-20  8:26       ` Marc Poulhiès
2023-11-17  2:55 ` [PATCH v2 4/5] Add support for target_version attribute Andrew Carlotti
2023-11-29 17:53   ` Richard Sandiford
2023-12-04 11:14     ` Andrew Carlotti
2023-11-17  2:56 ` [PATCH v2 5/5] aarch64: Add function multiversioning support Andrew Carlotti
2023-11-24 16:22   ` Richard Sandiford
2023-12-04 13:23     ` Andrew Carlotti
