[GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
@ 2020-01-10 18:47 Stam Markianos-Wright
  2020-01-13 10:33 ` Kyrill Tkachov
  0 siblings, 1 reply; 6+ messages in thread
From: Stam Markianos-Wright @ 2020-01-10 18:47 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Earnshaw, Richard Sandiford, Ramana Radhakrishnan,
	Kyrylo Tkachov, nickc

[-- Attachment #1: Type: text/plain, Size: 3285 bytes --]

Hi all,

This is a respin of patch:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html

which has now been split into two (similar to the Aarch64 version).

This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
It also adds a new machine_mode (BFmode) for this type and accompanying Vector
modes V4BFmode and V8BFmode.

The second patch in this series uses existing target hooks to restrict type use.

Regression testing on arm-none-eabi passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for test suite effective_target update.

Ok for trunk?

Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a 



gcc/ChangeLog:

2020-01-10  Stam Markianos-Wright  <stam.markianos-wright@arm.com>

	* config.gcc: Add arm_bf16.h.
	* config/arm/arm-builtins.c (arm_mangle_builtin_type):  Fix comment.
	(arm_simd_builtin_std_type): Add BFmode.
	(arm_init_simd_builtin_types): Define element types for vector types.
	(arm_init_bf16_types):  New function.
	(arm_init_builtins): Add arm_init_bf16_types function call.
	* config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes.
	* config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
	* config/arm/arm.c (aapcs_vfp_sub_candidate):  Add BFmode.
	(arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
	(arm_vector_mode_supported_p): Add V4BF, V8BF.
	(arm_mangle_type):
	* config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
          VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node,
          arm_bf16_ptr_type_node.
	* config/arm/arm.md: New enabled_for_bfmode_scalar,
          enabled_for_bfmode_vector attributes. Add BFmode to movhf expand.
          pattern and define_split between ARM registers.
	* config/arm/arm_bf16.h: New file.
	* config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
	* config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New.
          (VQXMOV): Add V8BF.
	* config/arm/neon.md: Add BF vector types to NEON move patterns.
	* config/arm/vfp.md: Add BFmode to movhf patterns.

gcc/testsuite/ChangeLog:

2020-01-10  Stam Markianos-Wright  <stam.markianos-wright@arm.com>

	* g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
	* g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
	* gcc.target/arm/bfloat16_scalar_1_1.c: New test.
	* gcc.target/arm/bfloat16_scalar_1_2.c: New test.
	* gcc.target/arm/bfloat16_scalar_2_1.c: New test.
	* gcc.target/arm/bfloat16_scalar_2_2.c: New test.
	* gcc.target/arm/bfloat16_scalar_3_1.c: New test.
	* gcc.target/arm/bfloat16_scalar_3_2.c: New test.
	* gcc.target/arm/bfloat16_scalar_4.c: New test.
	* gcc.target/arm/bfloat16_simd_1_1.c: New test.
	* gcc.target/arm/bfloat16_simd_1_2.c: New test.
	* gcc.target/arm/bfloat16_simd_2_1.c: New test.
	* gcc.target/arm/bfloat16_simd_2_2.c: New test.
	* gcc.target/arm/bfloat16_simd_3_1.c: New test.
	* gcc.target/arm/bfloat16_simd_3_2.c: New test.




[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: BFmode1of2-A32.patch --]
[-- Type: text/x-patch; name="BFmode1of2-A32.patch", Size: 57421 bytes --]

diff --git a/gcc/config.gcc b/gcc/config.gcc
index c3d6464f3e6adaa1db818a61de00cff8e00ae08e..6a7a4725fe5e99fba16b40b18cfebb84984d06b8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -344,7 +344,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index df84560588a842ce3c69c589367625f6098cb5bb..7f279cca6688c6f11948159666ee647ae533c61d 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -315,12 +315,14 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define v8qi_UP  E_V8QImode
 #define v4hi_UP  E_V4HImode
 #define v4hf_UP  E_V4HFmode
+#define v4bf_UP  E_V4BFmode
 #define v2si_UP  E_V2SImode
 #define v2sf_UP  E_V2SFmode
 #define di_UP    E_DImode
 #define v16qi_UP E_V16QImode
 #define v8hi_UP  E_V8HImode
 #define v8hf_UP  E_V8HFmode
+#define v8bf_UP  E_V8BFmode
 #define v4si_UP  E_V4SImode
 #define v4sf_UP  E_V4SFmode
 #define v2di_UP  E_V2DImode
@@ -328,9 +330,10 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define ei_UP	 E_EImode
 #define oi_UP	 E_OImode
 #define hf_UP	 E_HFmode
+#define bf_UP    E_BFmode
 #define si_UP	 E_SImode
 #define void_UP	 E_VOIDmode
-
+#define sf_UP	 E_SFmode
 #define UP(X) X##_UP
 
 typedef struct {
@@ -806,6 +809,11 @@ static struct arm_simd_type_info arm_simd_types [] = {
 
 /* The user-visible __fp16 type.  */
 tree arm_fp16_type_node = NULL_TREE;
+
+/* Back-end node type for brain float (bfloat) types.  */
+tree arm_bf16_type_node = NULL_TREE;
+tree arm_bf16_ptr_type_node = NULL_TREE;
+
 static tree arm_simd_intOI_type_node = NULL_TREE;
 static tree arm_simd_intEI_type_node = NULL_TREE;
 static tree arm_simd_intCI_type_node = NULL_TREE;
@@ -856,7 +864,7 @@ const char *
 arm_mangle_builtin_type (const_tree type)
 {
   const char *mangle;
-  /* Walk through all the AArch64 builtins types tables to filter out the
+  /* Walk through all the Arm builtins types tables to filter out the
      incoming type.  */
   if ((mangle = arm_mangle_builtin_vector_type (type))
       || (mangle = arm_mangle_builtin_scalar_type (type)))
@@ -897,6 +905,8 @@ arm_simd_builtin_std_type (machine_mode mode,
       return float_type_node;
     case E_DFmode:
       return double_type_node;
+    case E_BFmode:
+      return arm_bf16_type_node;
     default:
       gcc_unreachable ();
     }
@@ -1002,6 +1012,10 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Float32x2_t].eltype = float_type_node;
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
+  /* Init Bfloat vector types with underlying __bf16 scalar type.  */
+  arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
+  arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
+
   for (i = 0; i < nelts; i++)
     {
       tree eltype = arm_simd_types[i].eltype;
@@ -1187,6 +1201,19 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum *d,
   arm_builtin_decls[fcode] = fndecl;
 }
 
+/* Initialize the backend REAL_TYPE type supporting bfloat types.  */
+static void
+arm_init_bf16_types (void)
+{
+  arm_bf16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (arm_bf16_type_node) = 16;
+  SET_TYPE_MODE (arm_bf16_type_node, BFmode);
+  layout_type (arm_bf16_type_node);
+
+  lang_hooks.types.register_builtin_type (arm_bf16_type_node, "__bf16");
+  arm_bf16_ptr_type_node = build_pointer_type (arm_bf16_type_node);
+}
+
 /* Set up ACLE builtins, even builtins for instructions that are not
    in the current target ISA to allow the user to compile particular modules
    with different target specific options that differ from the command line
@@ -1955,6 +1982,8 @@ arm_init_builtins (void)
      arm_init_neon_builtins which uses it.  */
   arm_init_fp16_builtins ();
 
+  arm_init_bf16_types ();
+
   if (TARGET_MAYBE_HARD_FLOAT)
     {
       arm_init_neon_builtins ();
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index 21a77031d155acf4988e07b8ec3e8f2ba4a00149..ea92ef35723f979c8bb1f6bfb4fbeb6cd1e4b6e9 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -78,6 +78,11 @@ VECTOR_MODES (FLOAT, 8);      /*            V4HF V2SF */
 VECTOR_MODES (FLOAT, 16);     /*       V8HF V4SF V2DF */
 VECTOR_MODE (FLOAT, HF, 2);   /*                 V2HF */
 
+FLOAT_MODE (BF, 2, 0);
+ADJUST_FLOAT_FORMAT (BF, &arm_bfloat_half_format);
+VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
+VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
+
 /* Fraction and accumulator vector modes.  */
 VECTOR_MODES (FRACT, 4);      /* V4QQ  V2HQ */
 VECTOR_MODES (UFRACT, 4);     /* V4UQQ V2UHQ */
diff --git a/gcc/config/arm/arm-simd-builtin-types.def b/gcc/config/arm/arm-simd-builtin-types.def
index 5b57bc2313ccb2c38595b45d4f1c4c5d0368ac4a..ea3c9f97b71f03ac28d83266bcdaddcd0d42678b 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -48,3 +48,5 @@
   ENTRY (Float16x8_t, V8HF, none, 128, float16, 19)
   ENTRY (Float32x4_t, V4SF, none, 128, float32, 19)
 
+  ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
+  ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8bf393e620f2db24f506d35d06d45877c801fbb5..0c1530645ee3c27e76e64d39f056c5e87952708f 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -81,6 +81,11 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
    the backend.  Defined in arm-builtins.c.  */
 extern tree arm_fp16_type_node;
 
+/* This type is the user-visible __bf16.  We need it in a few places in
+   the backend.  Defined in arm-builtins.c.  */
+extern tree arm_bf16_type_node;
+extern tree arm_bf16_ptr_type_node;
+
 \f
 #undef  CPP_SPEC
 #define CPP_SPEC "%(subtarget_cpp_spec)					\
@@ -1019,12 +1024,14 @@ extern int arm_arch_bf16;
 /* Modes valid for Neon D registers.  */
 #define VALID_NEON_DREG_MODE(MODE) \
   ((MODE) == V2SImode || (MODE) == V4HImode || (MODE) == V8QImode \
-   || (MODE) == V4HFmode || (MODE) == V2SFmode || (MODE) == DImode)
+   || (MODE) == V4HFmode || (MODE) == V2SFmode || (MODE) == DImode \
+   || (MODE) == V4BFmode)
 
 /* Modes valid for Neon Q registers.  */
 #define VALID_NEON_QREG_MODE(MODE) \
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
-   || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode)
+   || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode \
+   || (MODE) == V8BFmode)
 
 /* Structure modes valid for Neon registers.  */
 #define VALID_NEON_STRUCT_MODE(MODE) \
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 07231d722b978b5c99eb5a27d8ad8ece3d6c80fd..9bd228b543315f8acedc4925430825ac282e04cd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -6065,7 +6065,7 @@ aapcs_vfp_sub_candidate (const_tree type, machine_mode *modep)
     {
     case REAL_TYPE:
       mode = TYPE_MODE (type);
-      if (mode != DFmode && mode != SFmode && mode != HFmode)
+      if (mode != DFmode && mode != SFmode && mode != HFmode && mode != BFmode)
 	return -1;
 
       if (*modep == VOIDmode)
@@ -24539,17 +24539,11 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 
   if (TARGET_HARD_FLOAT && IS_VFP_REGNUM (regno))
     {
-      if (mode == SFmode || mode == SImode)
-	return VFP_REGNO_OK_FOR_SINGLE (regno);
-
       if (mode == DFmode)
 	return VFP_REGNO_OK_FOR_DOUBLE (regno);
 
-      if (mode == HFmode)
-	return VFP_REGNO_OK_FOR_SINGLE (regno);
-
-      /* VFP registers can hold HImode values.  */
-      if (mode == HImode)
+      if (mode == HFmode || mode == BFmode || mode == HImode
+	  || mode == SFmode || mode == SImode)
 	return VFP_REGNO_OK_FOR_SINGLE (regno);
 
       if (TARGET_NEON)
@@ -28109,7 +28103,8 @@ arm_vector_mode_supported_p (machine_mode mode)
   /* Neon also supports V2SImode, etc. listed in the clause below.  */
   if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
       || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
-      || mode == V2DImode || mode == V8HFmode))
+      || mode == V2DImode || mode == V8HFmode || mode == V4BFmode
+      || mode == V8BFmode))
     return true;
 
   if ((TARGET_NEON || TARGET_IWMMXT)
@@ -29013,9 +29008,14 @@ arm_mangle_type (const_tree type)
       && lang_hooks.types_compatible_p (CONST_CAST_TREE (type), va_list_type))
     return "St9__va_list";
 
-  /* Half-precision float.  */
+  /* Half-precision floating point types.  */
   if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
-    return "Dh";
+    {
+      if (TYPE_MODE (type) == BFmode)
+	return "u6__bf16";
+      else
+	return "Dh";
+    }
 
   /* Try mangling as a Neon type, TYPE_NAME is non-NULL if this is a
      builtin type.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f89a2d412df8afe621241958b29a8a7d58dce284..6ebeba2f8e078fb9c8facfcd8e81085c6978630c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6181,8 +6181,8 @@
 )
 
 (define_split
-  [(set (match_operand:ANY64 0 "arm_general_register_operand" "")
-	(match_operand:ANY64 1 "arm_general_register_operand" ""))]
+  [(set (match_operand:ANY64_BF 0 "arm_general_register_operand" "")
+	(match_operand:ANY64_BF 1 "arm_general_register_operand" ""))]
   "TARGET_EITHER && reload_completed"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2) (match_dup 3))]
@@ -7130,52 +7130,52 @@
    (set_attr "length" "2,4,4,2,4,2,2,4,4")]
 )
 
-;; HFmode moves
-(define_expand "movhf"
-  [(set (match_operand:HF 0 "general_operand")
-	(match_operand:HF 1 "general_operand"))]
+;; HFmode and BFmode moves.
+(define_expand "mov<mode>"
+  [(set (match_operand:HFBF 0 "general_operand")
+	(match_operand:HFBF 1 "general_operand"))]
   "TARGET_EITHER"
   "
-  gcc_checking_assert (aligned_operand (operands[0], HFmode));
-  gcc_checking_assert (aligned_operand (operands[1], HFmode));
+  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
+  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
-        operands[1] = force_reg (HFmode, operands[1]);
+	operands[1] = force_reg (<MODE>mode, operands[1]);
     }
   else /* TARGET_THUMB1 */
     {
       if (can_create_pseudo_p ())
         {
            if (!REG_P (operands[0]))
-	     operands[1] = force_reg (HFmode, operands[1]);
+	     operands[1] = force_reg (<MODE>mode, operands[1]);
         }
     }
   "
 )
 
-(define_insn "*arm32_movhf"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "=r,m,r,r")
-	(match_operand:HF 1 "general_operand"	   " m,r,r,F"))]
+(define_insn "*arm32_mov<mode>"
+  [(set (match_operand:HFBF 0 "nonimmediate_operand" "=r,m,r,r")
+	(match_operand:HFBF 1 "general_operand"	   " m,r,r,F"))]
   "TARGET_32BIT && !TARGET_HARD_FLOAT
-   && (	  s_register_operand (operands[0], HFmode)
-       || s_register_operand (operands[1], HFmode))"
+   && (	  s_register_operand (operands[0], <MODE>mode)
+       || s_register_operand (operands[1], <MODE>mode))"
   "*
   switch (which_alternative)
     {
     case 0:	/* ARM register from memory */
-      return \"ldrh%?\\t%0, %1\\t%@ __fp16\";
+      return \"ldrh%?\\t%0, %1\\t%@ __<fporbf>\";
     case 1:	/* memory from ARM register */
-      return \"strh%?\\t%1, %0\\t%@ __fp16\";
+      return \"strh%?\\t%1, %0\\t%@ __<fporbf>\";
     case 2:	/* ARM register from ARM register */
-      return \"mov%?\\t%0, %1\\t%@ __fp16\";
+      return \"mov%?\\t%0, %1\\t%@ __<fporbf>\";
     case 3:	/* ARM register from constant */
       {
 	long bits;
 	rtx ops[4];
 
 	bits = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (operands[1]),
-			       HFmode);
+			       <MODE>mode);
 	ops[0] = operands[0];
 	ops[1] = GEN_INT (bits);
 	ops[2] = GEN_INT (bits & 0xff00);
diff --git a/gcc/config/arm/arm_bf16.h b/gcc/config/arm/arm_bf16.h
new file mode 100644
index 0000000000000000000000000000000000000000..decf23f38346c033f9d7502ce82e11ce81b9bc3a
--- /dev/null
+++ b/gcc/config/arm/arm_bf16.h
@@ -0,0 +1,41 @@
+/* Arm BF16 intrinsics include file.
+
+   Copyright (C) 2019-2020 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+#ifndef _GCC_ARM_BF16_H
+#define _GCC_ARM_BF16_H 1
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef __bf16 bfloat16_t;
+typedef float float32_t;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index db8db53614ad1d5fc591579180fdbd7e6152229f..3c78f435009ab027f92693d00ab5b40960d5419d 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -39,6 +39,7 @@ extern "C" {
 #endif
 
 #include <arm_fp16.h>
+#include <arm_bf16.h>
 #include <stdint.h>
 
 typedef __simd64_int8_t int8x8_t;
@@ -83,6 +84,9 @@ typedef __simd128_uint64_t uint64x2_t;
 
 typedef float float32_t;
 
+typedef __simd128_bfloat16_t bfloat16x8_t;
+typedef __simd64_bfloat16_t bfloat16x4_t;
+
 /* The Poly types are user visible and live in their own world,
    keep them that way.  */
 typedef __builtin_neon_poly8 poly8_t;
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 7da8b74abc0fae6ac1dcc7fef45a2e75cf936414..33e29509f00a89fa23d0546687c0e4643f0b32d2 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -28,6 +28,10 @@
 ;; registers.
 (define_mode_iterator ANY64 [DI DF V8QI V4HI V4HF V2SI V2SF])
 
+;; Additional definition of ANY64 that also includes the special V4BF mode.
+;; BFmode is allowed only on define_split between ARM registers.
+(define_mode_iterator ANY64_BF [DI DF V8QI V4HI V4BF V4HF V2SI V2SF])
+
 (define_mode_iterator ANY128 [V2DI V2DF V16QI V8HI V4SI V4SF])
 
 ;; A list of integer modes that are up to one word long
@@ -80,6 +84,10 @@
 ;; Double-width vector modes plus 64-bit elements.
 (define_mode_iterator VDX [V8QI V4HI V4HF V2SI V2SF DI])
 
+;; Double-width vector modes plus 64-bit elements,
+;; with V4BFmode added, suitable for moves.
+(define_mode_iterator VDXMOV [V8QI V4HI V4HF V4BF V2SI V2SF DI])
+
 ;; Double-width vector modes, with V4HF - for vldN_lane and vstN_lane.
 (define_mode_iterator VD_LANE [V8QI V4HI V4HF V2SI V2SF])
 
@@ -101,8 +109,8 @@
 ;; Quad-width vector modes without floating-point elements.
 (define_mode_iterator VQI [V16QI V8HI V4SI])
 
-;; Quad-width vector modes, with TImode added, for moves.
-(define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI])
+;; Quad-width vector modes, with TImode and V8BFmode added, suitable for moves.
+(define_mode_iterator VQXMOV [V16QI V8HI V8HF V8BF V4SI V4SF V2DI TI])
 
 ;; Opaque structure types wider than TImode.
 (define_mode_iterator VSTRUCT [EI OI CI XI])
@@ -201,6 +209,12 @@
 ;; Vector modes for 16-bit floating-point support.
 (define_mode_iterator VH [V8HF V4HF])
 
+;; 16-bit floating-point vector modes suitable for moving (includes BFmode).
+(define_mode_iterator VHFBF [V8HF V4HF V4BF V8BF])
+
+;; 16-bit floating-point scalar modes suitable for moving (includes BFmode).
+(define_mode_iterator HFBF [HF BF])
+
 ;; Iterators used for fixed-point support.
 (define_mode_iterator FIXED [QQ HQ SQ UQQ UHQ USQ HA SA UHA USA])
 
@@ -485,6 +499,9 @@
 ;; vtbl<n> suffix for NEON vector modes.
 (define_mode_attr VTAB_n [(TI "2") (EI "3") (OI "4")])
 
+;; fp16 or bf16 marker for 16-bit float modes.
+(define_mode_attr fporbf [(HF "fp16") (BF "bf16")])
+
 ;; (Opposite) mode to convert to/from for NEON mode conversions.
 (define_mode_attr V_CVTTO [(V2SI "V2SF") (V2SF "V2SI")
                (V4SI "V4SF") (V4SF "V4SI")])
@@ -804,6 +821,7 @@
 		     (V4HF "") (V8HF "_q")
 		     (V2SF "") (V4SF "_q")
 		     (V4HF "") (V8HF "_q")
+		     (V4BF "") (V8BF "_q")
 		     (DI "")   (V2DI "_q")
 		     (DF "")   (V2DF "_q")
 		     (HF "")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index dace9470c4131458f43d8a2c012ba042fca86ee6..6087ca6f2badde6a492bb515a2cb5846f3d4ad8e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -34,9 +34,9 @@
   [(set_attr "type" "neon_store1_1reg")])
 
 (define_insn "*neon_mov<mode>"
-  [(set (match_operand:VDX 0 "nonimmediate_operand"
+  [(set (match_operand:VDXMOV 0 "nonimmediate_operand"
 	  "=w,Un,w, w, w,  ?r,?w,?r, ?Us,*r")
-	(match_operand:VDX 1 "general_operand"
+	(match_operand:VDXMOV 1 "general_operand"
 	  " w,w, Dm,Dn,Uni, w, r, Usi,r,*r"))]
   "TARGET_NEON
    && (register_operand (operands[0], <MODE>mode)
@@ -161,8 +161,8 @@
 })
 
 (define_expand "mov<mode>"
-  [(set (match_operand:VH 0 "s_register_operand")
-	(match_operand:VH 1 "s_register_operand"))]
+  [(set (match_operand:VHFBF 0 "s_register_operand")
+	(match_operand:VHFBF 1 "s_register_operand"))]
   "TARGET_NEON"
 {
   gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 28f2b77373818624defedaa5d20ac133985c1f3b..6fe64c34a03d319968b335374d8db98c207f3301 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -363,32 +363,32 @@
    (set_attr "arch"           "t2,any,any,any,a,t2,any,any,any,any,any,any")]
 )
 
-;; HFmode moves
+;; HFmode and BFmode moves
 
-(define_insn "*movhf_vfp_fp16"
-  [(set (match_operand:HF 0 "nonimmediate_operand"
-			  "= r,m,t,r,t,r,t,t,Um,r")
-	(match_operand:HF 1 "general_operand"
-			  "  m,r,t,r,r,t,Dv,Um,t,F"))]
+(define_insn "*mov<mode>_vfp_<mode>16"
+  [(set (match_operand:HFBF 0 "nonimmediate_operand"
+			  "= ?r,?m,t,r,t,r,t, t, Um,r")
+	(match_operand:HFBF 1 "general_operand"
+			  "  m,r,t,r,r,t,Dv,Um,t, F"))]
   "TARGET_32BIT
    && TARGET_VFP_FP16INST
-   && (s_register_operand (operands[0], HFmode)
-       || s_register_operand (operands[1], HFmode))"
+   && (s_register_operand (operands[0], <MODE>mode)
+       || s_register_operand (operands[1], <MODE>mode))"
  {
   switch (which_alternative)
     {
     case 0: /* ARM register from memory.  */
-      return \"ldrh%?\\t%0, %1\\t%@ __fp16\";
+      return \"ldrh%?\\t%0, %1\\t%@ __<fporbf>\";
     case 1: /* Memory from ARM register.  */
-      return \"strh%?\\t%1, %0\\t%@ __fp16\";
+      return \"strh%?\\t%1, %0\\t%@ __<fporbf>\";
     case 2: /* S register from S register.  */
-      return \"vmov\\t%0, %1\t%@ __fp16\";
+      return \"vmov\\t%0, %1\t%@ __<fporbf>\";
     case 3: /* ARM register from ARM register.  */
-      return \"mov%?\\t%0, %1\\t%@ __fp16\";
+      return \"mov%?\\t%0, %1\\t%@ __<fporbf>\";
     case 4: /* S register from ARM register.  */
     case 5: /* ARM register from S register.  */
     case 6: /* S register from immediate.  */
-      return \"vmov.f16\\t%0, %1\t%@ __fp16\";
+      return \"vmov.f16\\t%0, %1\t%@ __<fporbf>\";
     case 7: /* S register from memory.  */
       return \"vld1.16\\t{%z0}, %A1\";
     case 8: /* Memory from S register.  */
@@ -399,7 +399,7 @@
 	rtx ops[4];
 
 	bits = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (operands[1]),
-			       HFmode);
+			       <MODE>mode);
 	ops[0] = operands[0];
 	ops[1] = GEN_INT (bits);
 	ops[2] = GEN_INT (bits & 0xff00);
@@ -442,14 +442,14 @@
       (const_int 8))])]
 )
 
-(define_insn "*movhf_vfp_neon"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "= t,Um,r,m,t,r,t,r,r")
-	(match_operand:HF 1 "general_operand"	   " Um, t,m,r,t,r,r,t,F"))]
+(define_insn "*mov<mode>_vfp_neon"
+  [(set (match_operand:HFBF 0 "nonimmediate_operand" "= t,Um,?r,?m,t,r,t,r,r")
+	(match_operand:HFBF 1 "general_operand"	     " Um, t, m, r,t,r,r,t,F"))]
   "TARGET_32BIT
    && TARGET_HARD_FLOAT && TARGET_NEON_FP16
    && !TARGET_VFP_FP16INST
-   && (   s_register_operand (operands[0], HFmode)
-       || s_register_operand (operands[1], HFmode))"
+   && (   s_register_operand (operands[0], <MODE>mode)
+       || s_register_operand (operands[1], <MODE>mode))"
   "*
   switch (which_alternative)
     {
@@ -458,13 +458,13 @@
     case 1:     /* memory from S register */
       return \"vst1.16\\t{%z1}, %A0\";
     case 2:     /* ARM register from memory */
-      return \"ldrh\\t%0, %1\\t%@ __fp16\";
+      return \"ldrh\\t%0, %1\\t%@ __<fporbf>\";
     case 3:     /* memory from ARM register */
-      return \"strh\\t%1, %0\\t%@ __fp16\";
+      return \"strh\\t%1, %0\\t%@ __<fporbf>\";
     case 4:	/* S register from S register */
       return \"vmov.f32\\t%0, %1\";
     case 5:	/* ARM register from ARM register */
-      return \"mov\\t%0, %1\\t%@ __fp16\";
+      return \"mov\\t%0, %1\\t%@ __<fporbf>\";
     case 6:	/* S register from ARM register */
       return \"vmov\\t%0, %1\";
     case 7:	/* ARM register from S register */
@@ -475,7 +475,7 @@
 	rtx ops[4];
 
 	bits = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (operands[1]),
-			       HFmode);
+			       <MODE>mode);
 	ops[0] = operands[0];
 	ops[1] = GEN_INT (bits);
 	ops[2] = GEN_INT (bits & 0xff00);
@@ -498,26 +498,26 @@
 )
 
 ;; FP16 without element load/store instructions.
-(define_insn "*movhf_vfp"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "=r,m,t,r,t,r,r")
-	(match_operand:HF 1 "general_operand"	   " m,r,t,r,r,t,F"))]
+(define_insn "*mov<mode>_vfp"
+  [(set (match_operand:HFBF 0 "nonimmediate_operand" "=r,m,t,r,t,r,r")
+	(match_operand:HFBF 1 "general_operand"	   " m,r,t,r,r,t,F"))]
   "TARGET_32BIT
    && TARGET_HARD_FLOAT
    && !TARGET_NEON_FP16
    && !TARGET_VFP_FP16INST
-   && (   s_register_operand (operands[0], HFmode)
-       || s_register_operand (operands[1], HFmode))"
+   && (   s_register_operand (operands[0], <MODE>mode)
+       || s_register_operand (operands[1], <MODE>mode))"
   "*
   switch (which_alternative)
     {
     case 0:     /* ARM register from memory */
-      return \"ldrh\\t%0, %1\\t%@ __fp16\";
+      return \"ldrh\\t%0, %1\\t%@ __<fporbf>\";
     case 1:     /* memory from ARM register */
-      return \"strh\\t%1, %0\\t%@ __fp16\";
+      return \"strh\\t%1, %0\\t%@ __<fporbf>\";
     case 2:	/* S register from S register */
       return \"vmov.f32\\t%0, %1\";
     case 3:	/* ARM register from ARM register */
-      return \"mov\\t%0, %1\\t%@ __fp16\";
+      return \"mov\\t%0, %1\\t%@ __<fporbf>\";
     case 4:	/* S register from ARM register */
       return \"vmov\\t%0, %1\";
     case 5:	/* ARM register from S register */
@@ -528,7 +528,7 @@
 	rtx ops[4];
 
 	bits = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (operands[1]),
-			       HFmode);
+			       <MODE>mode);
 	ops[0] = operands[0];
 	ops[1] = GEN_INT (bits);
 	ops[2] = GEN_INT (bits & 0xff00);
diff --git a/gcc/testsuite/g++.dg/abi/mangle-neon.C b/gcc/testsuite/g++.dg/abi/mangle-neon.C
index 9fabf4df00e450cf072af6cd2cba71fc72684d5d..57a9db269222bea6f81e30ec29bcc6837ea7fbd6 100644
--- a/gcc/testsuite/g++.dg/abi/mangle-neon.C
+++ b/gcc/testsuite/g++.dg/abi/mangle-neon.C
@@ -31,6 +31,9 @@ void f18 (int8x16_t, int8x16_t) {}
 void f19 (poly8_t a) {}
 void f20 (poly16_t a) {}
 
+void f21 (bfloat16x4_t a) {}
+void f22 (bfloat16x8_t a) {}
+
 // { dg-final { scan-assembler "_Z2f015__simd64_int8_t:" } }
 // { dg-final { scan-assembler "_Z2f116__simd64_int16_t:" } }
 // { dg-final { scan-assembler "_Z2f216__simd64_int32_t:" } }
@@ -52,3 +55,5 @@ void f20 (poly16_t a) {}
 // { dg-final { scan-assembler "_Z3f1816__simd128_int8_tS_:" } }
 // { dg-final { scan-assembler "_Z3f19a:" } }
 // { dg-final { scan-assembler "_Z3f20s:" } }
+// { dg-final { scan-assembler "_Z3f2120__simd64_bfloat16_t:" } }
+// { dg-final { scan-assembler "_Z3f2220__simd128_bfloat16_t:" } }
diff --git a/gcc/testsuite/g++.dg/ext/arm-bf16/bf16-mangle-1.C b/gcc/testsuite/g++.dg/ext/arm-bf16/bf16-mangle-1.C
new file mode 100644
index 0000000000000000000000000000000000000000..f634ed1a4404806d5922ae198e9eccd354093f55
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/arm-bf16/bf16-mangle-1.C
@@ -0,0 +1,13 @@
+/* { dg-do compile { target arm*-*-* } } */
+
+/* Test mangling */
+
+/* { dg-final { scan-assembler "\t.global\t_Z1fPu6__bf16" } } */
+void f (__bf16 *x) { }
+
+/* { dg-final { scan-assembler "\t.global\t_Z1gPu6__bf16S_" } } */
+void g (__bf16 *x, __bf16 *y) { }
+
+/* { dg-final { scan-assembler "\t.global\t_ZN1SIu6__bf16u6__bf16E1iE" } } */
+template <typename T, typename U> struct S { static int i; };
+template <> int S<__bf16, __bf16>::i = 3;
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..efcc56105dc4532a1b2d3eaa4ee3b264b928a06c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_1.c
@@ -0,0 +1,118 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon }  */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_bf16.h>
+
+/*
+**stacktest1:
+**	...
+**	vst1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	vld1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**bfloat_mov_ww:
+**	...
+**	vmov.f32	s1, s15
+**	...
+**	bx	lr
+*/
+void bfloat_mov_ww (void)
+{
+  register bfloat16_t x asm ("s15");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_rw:
+**	...
+**	vmov	s1, r4
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rw (void)
+{
+  register bfloat16_t x asm ("r4");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_wr:
+**	...
+**	vmov	r4, s1
+**	...
+**	bx	lr
+*/
+void bfloat_mov_wr (void)
+{
+  register bfloat16_t x asm ("s1");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rr:
+**	...
+**	mov	r4, r5	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rr (void)
+{
+  register bfloat16_t x asm ("r5");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rm:
+**	...
+**	strh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rm (void)
+{
+  register bfloat16_t x asm ("r4");
+  volatile bfloat16_t y;
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" : : : "memory");
+}
+
+/*
+**bfloat_mov_mr:
+**	...
+**	ldrh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_mr (void)
+{
+  volatile bfloat16_t x;
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : : : "memory");
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..8293cafcc147c958d6adebcf058d76e00f8c29c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_1_2.c
@@ -0,0 +1,119 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-additional-options "-march=armv8.2-a+bf16 -mfloat-abi=softfp -mfpu=auto" } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_bf16.h>
+
+/*
+**stacktest1:
+**	...
+**	strh	r[0-9]+, \[r[0-9]+\]	@ __bf16
+**	ldrh	r[0-9]+, \[sp, #[0-9]+\]	@ __bf16
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**bfloat_mov_ww:
+**	...
+**	vmov.f32	s1, s15
+**	...
+**	bx	lr
+*/
+void bfloat_mov_ww (void)
+{
+  register bfloat16_t x asm ("s15");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_rw:
+**	...
+**	vmov	s1, r4
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rw (void)
+{
+  register bfloat16_t x asm ("r4");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_wr:
+**	...
+**	vmov	r4, s1
+**	...
+**	bx	lr
+*/
+void bfloat_mov_wr (void)
+{
+  register bfloat16_t x asm ("s1");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rr:
+**	...
+**	mov	r4, r5	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rr (void)
+{
+  register bfloat16_t x asm ("r5");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rm:
+**	...
+**	strh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rm (void)
+{
+  register bfloat16_t x asm ("r4");
+  volatile bfloat16_t y;
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" : : : "memory");
+}
+
+/*
+**bfloat_mov_mr:
+**	...
+**	ldrh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_mr (void)
+{
+  volatile bfloat16_t x;
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : : : "memory");
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e84f837e1627f031c9798fa8cb08c589029c373b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_1.c
@@ -0,0 +1,124 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=hard -mfpu=neon-fp-armv8" } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_bf16.h>
+
+#pragma GCC push_options
+#pragma GCC target ("+bf16")
+
+/*
+**stacktest1:
+**	...
+**	vst1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	vld1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**bfloat_mov_ww:
+**	...
+**	vmov.f32	s1, s15
+**	...
+**	bx	lr
+*/
+void bfloat_mov_ww (void)
+{
+  register bfloat16_t x asm ("s15");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_rw:
+**	...
+**	vmov	s1, r4
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rw (void)
+{
+  register bfloat16_t x asm ("r4");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_wr:
+**	...
+**	vmov	r4, s1
+**	...
+**	bx	lr
+*/
+void bfloat_mov_wr (void)
+{
+  register bfloat16_t x asm ("s1");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rr:
+**	...
+**	mov	r4, r5	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rr (void)
+{
+  register bfloat16_t x asm ("r5");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rm:
+**	...
+**	strh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rm (void)
+{
+  register bfloat16_t x asm ("r4");
+  volatile bfloat16_t y;
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" : : : "memory");
+}
+
+/*
+**bfloat_mov_mr:
+**	...
+**	ldrh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_mr (void)
+{
+  volatile bfloat16_t x;
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : : : "memory");
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+#pragma GCC pop_options
+
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..93ec059819ad169400648903b86ed1ccc6e521e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_2_2.c
@@ -0,0 +1,124 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=softfp -mfpu=neon-fp-armv8" } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_bf16.h>
+
+#pragma GCC push_options
+#pragma GCC target ("+bf16")
+
+/*
+**stacktest1:
+**	...
+**	strh	r[0-9]+, \[r[0-9]+\]	@ __bf16
+**	ldrh	r[0-9]+, \[sp, #[0-9]+\]	@ __bf16
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**bfloat_mov_ww:
+**	...
+**	vmov.f32	s1, s15
+**	...
+**	bx	lr
+*/
+void bfloat_mov_ww (void)
+{
+  register bfloat16_t x asm ("s15");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_rw:
+**	...
+**	vmov	s1, r4
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rw (void)
+{
+  register bfloat16_t x asm ("r4");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_wr:
+**	...
+**	vmov	r4, s1
+**	...
+**	bx	lr
+*/
+void bfloat_mov_wr (void)
+{
+  register bfloat16_t x asm ("s1");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rr:
+**	...
+**	mov	r4, r5	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rr (void)
+{
+  register bfloat16_t x asm ("r5");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rm:
+**	...
+**	strh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rm (void)
+{
+  register bfloat16_t x asm ("r4");
+  volatile bfloat16_t y;
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" : : : "memory");
+}
+
+/*
+**bfloat_mov_mr:
+**	...
+**	ldrh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_mr (void)
+{
+  volatile bfloat16_t x;
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : : : "memory");
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+#pragma GCC pop_options
+
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..a1a7069032211a115e10d3d7adbc559b5af05e51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_1.c
@@ -0,0 +1,119 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=hard -mfpu=neon-fp-armv8" } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_bf16.h>
+
+/*
+**stacktest1:
+**	...
+**	vst1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	vld1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**bfloat_mov_ww:
+**	...
+**	vmov.f32	s1, s15
+**	...
+**	bx	lr
+*/
+void bfloat_mov_ww (void)
+{
+  register bfloat16_t x asm ("s15");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_rw:
+**	...
+**	vmov	s1, r4
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rw (void)
+{
+  register bfloat16_t x asm ("r4");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_wr:
+**	...
+**	vmov	r4, s1
+**	...
+**	bx	lr
+*/
+void bfloat_mov_wr (void)
+{
+  register bfloat16_t x asm ("s1");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rr:
+**	...
+**	mov	r4, r5	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rr (void)
+{
+  register bfloat16_t x asm ("r5");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rm:
+**	...
+**	strh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rm (void)
+{
+  register bfloat16_t x asm ("r4");
+  volatile bfloat16_t y;
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" : : : "memory");
+}
+
+/*
+**bfloat_mov_mr:
+**	...
+**	ldrh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_mr (void)
+{
+  volatile bfloat16_t x;
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : : : "memory");
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..f49072613f05735237ec803eb431cdf135fd06e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_3_2.c
@@ -0,0 +1,119 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=softfp -mfpu=neon-fp-armv8" } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_bf16.h>
+
+/*
+**stacktest1:
+**	...
+**	strh	r[0-9]+, \[r[0-9]+\]	@ __bf16
+**	ldrh	r[0-9]+, \[sp, #[0-9]+\]	@ __bf16
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**bfloat_mov_ww:
+**	...
+**	vmov.f32	s1, s15
+**	...
+**	bx	lr
+*/
+void bfloat_mov_ww (void)
+{
+  register bfloat16_t x asm ("s15");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_rw:
+**	...
+**	vmov	s1, r4
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rw (void)
+{
+  register bfloat16_t x asm ("r4");
+  register bfloat16_t y asm ("s1");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "t" (y));
+}
+
+/*
+**bfloat_mov_wr:
+**	...
+**	vmov	r4, s1
+**	...
+**	bx	lr
+*/
+void bfloat_mov_wr (void)
+{
+  register bfloat16_t x asm ("s1");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=t" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rr:
+**	...
+**	mov	r4, r5	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rr (void)
+{
+  register bfloat16_t x asm ("r5");
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
+/*
+**bfloat_mov_rm:
+**	...
+**	strh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_rm (void)
+{
+  register bfloat16_t x asm ("r4");
+  volatile bfloat16_t y;
+  asm volatile ("#foo" : "=r" (x));
+  y = x;
+  asm volatile ("#foo" : : : "memory");
+}
+
+/*
+**bfloat_mov_mr:
+**	...
+**	ldrh	r4, \[.*\]	@ __bf16
+**	...
+**	bx	lr
+*/
+void bfloat_mov_mr (void)
+{
+  volatile bfloat16_t x;
+  register bfloat16_t y asm ("r4");
+  asm volatile ("#foo" : : : "memory");
+  y = x;
+  asm volatile ("#foo" :: "r" (y));
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_4.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..9623941d01fc1db32c32f5dd6c4504f20d0b0ddb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_4.c
@@ -0,0 +1,16 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon }  */
+/* { dg-additional-options "-std=c99 -pedantic-errors -O3 --save-temps" } */
+
+#include <arm_bf16.h>
+
+_Complex bfloat16_t stacktest1 (_Complex bfloat16_t __a)
+{
+  volatile _Complex bfloat16_t b = __a;
+  return b;
+}
+
+/* { dg-error {ISO C does not support plain 'complex' meaning 'double complex'} "" { target *-*-* } 8 } */
+/* { dg-error {expected '=', ',', ';', 'asm' or '__attribute__' before 'stacktest1'} "" { target *-*-* } 8 } */
+
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_1_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_1_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..cad7d54d8e3dab2a12e099ef34d1948f37c416f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_1_1.c
@@ -0,0 +1,91 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon }  */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+/*
+**stacktest1:
+**	...
+**	vst1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	vld1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**stacktest2:
+**	...
+**	vstr	d[0-9]+, \[sp\]
+**	vldr	d[0-9]+, \[sp\]
+**	...
+**	bx	lr
+*/
+bfloat16x4_t stacktest2 (bfloat16x4_t __a)
+{
+  volatile bfloat16x4_t b = __a;
+  return b;
+}
+
+/*
+**stacktest3:
+**	...
+**	vst1.64	{d[0-9]+-d[0-9]+}, \[sp:[0-9]+\]
+**	vld1.64	{d[0-9]+-d[0-9]+}, \[sp:[0-9]+\]
+**	...
+**	bx	lr
+*/
+bfloat16x8_t stacktest3 (bfloat16x8_t __a)
+{
+  volatile bfloat16x8_t b = __a;
+  return b;
+}
+
+/*  Test compilation of __attribute__ vectors of 8, 16, 32, etc. BFloats.  */
+typedef bfloat16_t v8bf __attribute__((vector_size(16)));
+typedef bfloat16_t v16bf __attribute__((vector_size(32)));
+typedef bfloat16_t v32bf __attribute__((vector_size(64)));
+typedef bfloat16_t v64bf __attribute__((vector_size(128)));
+typedef bfloat16_t v128bf __attribute__((vector_size(256)));
+
+v8bf stacktest4 (v8bf __a)
+{
+  volatile v8bf b = __a;
+  return b;
+}
+
+v16bf stacktest5 (v16bf __a)
+{
+  volatile v16bf b = __a;
+  return b;
+}
+
+v32bf stacktest6 (v32bf __a)
+{
+  volatile v32bf b = __a;
+  return b;
+}
+
+v64bf stacktest7 (v64bf __a)
+{
+  volatile v64bf b = __a;
+  return b;
+}
+
+v128bf stacktest8 (v128bf __a)
+{
+  volatile v128bf b = __a;
+  return b;
+}
+
+/* Test use of constant values to assign values to vectors.  */
+
+typedef bfloat16_t v2bf __attribute__((vector_size(4)));
+v2bf c2 (void) { return (v2bf) 0x12345678; }
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_1_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_1_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..4ffcc54de5e3a0519338a2d42a13b18c3a8f39b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_1_2.c
@@ -0,0 +1,93 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-additional-options "-march=armv8.2-a+bf16 -mfloat-abi=softfp -mfpu=auto" } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+/*
+**stacktest1:
+**	...
+**	strh	r[0-9]+, \[r[0-9]+\]	@ __bf16
+**	ldrh	r[0-9]+, \[sp, #[0-9]+\]	@ __bf16
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**stacktest2:
+**	...
+**	strd	r[0-9]+, \[sp\]
+**	ldrd	r[0-9]+, \[sp\]
+**	...
+**	bx	lr
+*/
+bfloat16x4_t stacktest2 (bfloat16x4_t __a)
+{
+  volatile bfloat16x4_t b = __a;
+  return b;
+}
+
+/*
+**stacktest3:
+**	...
+**	stm	sp, {r[0-9]+-r[0-9]+}
+**	ldmia	sp, {r[0-9]+-r[0-9]+}
+**	...
+**	bx	lr
+*/
+bfloat16x8_t stacktest3 (bfloat16x8_t __a)
+{
+  volatile bfloat16x8_t b = __a;
+  return b;
+}
+
+/*  Test compilation of __attribute__ vectors of 8, 16, 32, etc. BFloats.  */
+typedef bfloat16_t v8bf __attribute__((vector_size(16)));
+typedef bfloat16_t v16bf __attribute__((vector_size(32)));
+typedef bfloat16_t v32bf __attribute__((vector_size(64)));
+typedef bfloat16_t v64bf __attribute__((vector_size(128)));
+typedef bfloat16_t v128bf __attribute__((vector_size(256)));
+
+v8bf stacktest4 (v8bf __a)
+{
+  volatile v8bf b = __a;
+  return b;
+}
+
+v16bf stacktest5 (v16bf __a)
+{
+  volatile v16bf b = __a;
+  return b;
+}
+
+v32bf stacktest6 (v32bf __a)
+{
+  volatile v32bf b = __a;
+  return b;
+}
+
+v64bf stacktest7 (v64bf __a)
+{
+  volatile v64bf b = __a;
+  return b;
+}
+
+v128bf stacktest8 (v128bf __a)
+{
+  volatile v128bf b = __a;
+  return b;
+}
+
+/* Test use of constant values to assign values to vectors.  */
+
+typedef bfloat16_t v2bf __attribute__((vector_size(4)));
+v2bf c2 (void) { return (v2bf) 0x12345678; }
+
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..05ee4d878ec091e16f6ae4ed5bdf8ad117dab772
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_1.c
@@ -0,0 +1,97 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=hard -mfpu=neon-fp-armv8" } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+#pragma GCC push_options
+#pragma GCC target ("+bf16")
+
+/*
+**stacktest1:
+**	...
+**	vst1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	vld1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**stacktest2:
+**	...
+**	vstr	d[0-9]+, \[sp\]
+**	vldr	d[0-9]+, \[sp\]
+**	...
+**	bx	lr
+*/
+bfloat16x4_t stacktest2 (bfloat16x4_t __a)
+{
+  volatile bfloat16x4_t b = __a;
+  return b;
+}
+
+/*
+**stacktest3:
+**	...
+**	vst1.64	{d[0-9]+-d[0-9]+}, \[sp:[0-9]+\]
+**	vld1.64	{d[0-9]+-d[0-9]+}, \[sp:[0-9]+\]
+**	...
+**	bx	lr
+*/
+bfloat16x8_t stacktest3 (bfloat16x8_t __a)
+{
+  volatile bfloat16x8_t b = __a;
+  return b;
+}
+
+/*  Test compilation of __attribute__ vectors of 8, 16, 32, etc. BFloats.  */
+typedef bfloat16_t v8bf __attribute__((vector_size(16)));
+typedef bfloat16_t v16bf __attribute__((vector_size(32)));
+typedef bfloat16_t v32bf __attribute__((vector_size(64)));
+typedef bfloat16_t v64bf __attribute__((vector_size(128)));
+typedef bfloat16_t v128bf __attribute__((vector_size(256)));
+
+v8bf stacktest4 (v8bf __a)
+{
+  volatile v8bf b = __a;
+  return b;
+}
+
+v16bf stacktest5 (v16bf __a)
+{
+  volatile v16bf b = __a;
+  return b;
+}
+
+v32bf stacktest6 (v32bf __a)
+{
+  volatile v32bf b = __a;
+  return b;
+}
+
+v64bf stacktest7 (v64bf __a)
+{
+  volatile v64bf b = __a;
+  return b;
+}
+
+v128bf stacktest8 (v128bf __a)
+{
+  volatile v128bf b = __a;
+  return b;
+}
+
+/* Test use of constant values to assign values to vectors.  */
+
+typedef bfloat16_t v2bf __attribute__((vector_size(4)));
+v2bf c2 (void) { return (v2bf) 0x12345678; }
+
+#pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..15fba316d356c6da1f0667bd8115193df4e38ada
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_2_2.c
@@ -0,0 +1,97 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=softfp -mfpu=neon-fp-armv8" } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+#pragma GCC push_options
+#pragma GCC target ("+bf16")
+
+/*
+**stacktest1:
+**	...
+**	strh	r[0-9]+, \[r[0-9]+\]	@ __bf16
+**	ldrh	r[0-9]+, \[sp, #[0-9]+\]	@ __bf16
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**stacktest2:
+**	...
+**	strd	r[0-9]+, \[sp\]
+**	ldrd	r[0-9]+, \[sp\]
+**	...
+**	bx	lr
+*/
+bfloat16x4_t stacktest2 (bfloat16x4_t __a)
+{
+  volatile bfloat16x4_t b = __a;
+  return b;
+}
+
+/*
+**stacktest3:
+**	...
+**	stm	sp, {r[0-9]+-r[0-9]+}
+**	ldmia	sp, {r[0-9]+-r[0-9]+}
+**	...
+**	bx	lr
+*/
+bfloat16x8_t stacktest3 (bfloat16x8_t __a)
+{
+  volatile bfloat16x8_t b = __a;
+  return b;
+}
+
+/*  Test compilation of __attribute__ vectors of 8, 16, 32, etc. BFloats.  */
+typedef bfloat16_t v8bf __attribute__((vector_size(16)));
+typedef bfloat16_t v16bf __attribute__((vector_size(32)));
+typedef bfloat16_t v32bf __attribute__((vector_size(64)));
+typedef bfloat16_t v64bf __attribute__((vector_size(128)));
+typedef bfloat16_t v128bf __attribute__((vector_size(256)));
+
+v8bf stacktest4 (v8bf __a)
+{
+  volatile v8bf b = __a;
+  return b;
+}
+
+v16bf stacktest5 (v16bf __a)
+{
+  volatile v16bf b = __a;
+  return b;
+}
+
+v32bf stacktest6 (v32bf __a)
+{
+  volatile v32bf b = __a;
+  return b;
+}
+
+v64bf stacktest7 (v64bf __a)
+{
+  volatile v64bf b = __a;
+  return b;
+}
+
+v128bf stacktest8 (v128bf __a)
+{
+  volatile v128bf b = __a;
+  return b;
+}
+
+/* Test use of constant values to assign values to vectors.  */
+
+typedef bfloat16_t v2bf __attribute__((vector_size(4)));
+v2bf c2 (void) { return (v2bf) 0x12345678; }
+
+#pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_1.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..b9b7606d0352307b2b741a84bd8d901cf2437007
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_1.c
@@ -0,0 +1,93 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=hard -mfpu=neon-fp-armv8" } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+/*
+**stacktest1:
+**	...
+**	vst1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	vld1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**stacktest2:
+**	...
+**	vstr	d[0-9]+, \[sp\]
+**	vldr	d[0-9]+, \[sp\]
+**	...
+**	bx	lr
+*/
+bfloat16x4_t stacktest2 (bfloat16x4_t __a)
+{
+  volatile bfloat16x4_t b = __a;
+  return b;
+}
+
+/*
+**stacktest3:
+**	...
+**	vst1.64	{d[0-9]+-d[0-9]+}, \[sp:[0-9]+\]
+**	vld1.64	{d[0-9]+-d[0-9]+}, \[sp:[0-9]+\]
+**	...
+**	bx	lr
+*/
+bfloat16x8_t stacktest3 (bfloat16x8_t __a)
+{
+  volatile bfloat16x8_t b = __a;
+  return b;
+}
+
+/*  Test compilation of __attribute__ vectors of 8, 16, 32, etc. BFloats.  */
+typedef bfloat16_t v8bf __attribute__((vector_size(16)));
+typedef bfloat16_t v16bf __attribute__((vector_size(32)));
+typedef bfloat16_t v32bf __attribute__((vector_size(64)));
+typedef bfloat16_t v64bf __attribute__((vector_size(128)));
+typedef bfloat16_t v128bf __attribute__((vector_size(256)));
+
+v8bf stacktest4 (v8bf __a)
+{
+  volatile v8bf b = __a;
+  return b;
+}
+
+v16bf stacktest5 (v16bf __a)
+{
+  volatile v16bf b = __a;
+  return b;
+}
+
+v32bf stacktest6 (v32bf __a)
+{
+  volatile v32bf b = __a;
+  return b;
+}
+
+v64bf stacktest7 (v64bf __a)
+{
+  volatile v64bf b = __a;
+  return b;
+}
+
+v128bf stacktest8 (v128bf __a)
+{
+  volatile v128bf b = __a;
+  return b;
+}
+
+/* Test use of constant values to assign values to vectors.  */
+
+typedef bfloat16_t v2bf __attribute__((vector_size(4)));
+v2bf c2 (void) { return (v2bf) 0x12345678; }
+
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_2.c b/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..ab1fe101af4ab3ad68dba9848b7d5b875ebf426c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_simd_3_2.c
@@ -0,0 +1,94 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-additional-options "-march=armv8.2-a -mfloat-abi=softfp -mfpu=neon-fp-armv8" } */
+/* { dg-additional-options "-O3 --save-temps -std=gnu90" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+/*
+**stacktest1:
+**	...
+**	strh	r[0-9]+, \[r[0-9]+\]	@ __bf16
+**	ldrh	r[0-9]+, \[sp, #[0-9]+\]	@ __bf16
+**	...
+**	bx	lr
+*/
+bfloat16_t stacktest1 (bfloat16_t __a)
+{
+  volatile bfloat16_t b = __a;
+  return b;
+}
+
+/*
+**stacktest2:
+**	...
+**	strd	r[0-9]+, \[sp\]
+**	ldrd	r[0-9]+, \[sp\]
+**	...
+**	bx	lr
+*/
+bfloat16x4_t stacktest2 (bfloat16x4_t __a)
+{
+  volatile bfloat16x4_t b = __a;
+  return b;
+}
+
+/*
+**stacktest3:
+**	...
+**	stm	sp, {r[0-9]+-r[0-9]+}
+**	ldmia	sp, {r[0-9]+-r[0-9]+}
+**	...
+**	bx	lr
+*/
+bfloat16x8_t stacktest3 (bfloat16x8_t __a)
+{
+  volatile bfloat16x8_t b = __a;
+  return b;
+}
+
+/*  Test compilation of __attribute__ vectors of 8, 16, 32, etc. BFloats.  */
+typedef bfloat16_t v8bf __attribute__((vector_size(16)));
+typedef bfloat16_t v16bf __attribute__((vector_size(32)));
+typedef bfloat16_t v32bf __attribute__((vector_size(64)));
+typedef bfloat16_t v64bf __attribute__((vector_size(128)));
+typedef bfloat16_t v128bf __attribute__((vector_size(256)));
+
+v8bf stacktest4 (v8bf __a)
+{
+  volatile v8bf b = __a;
+  return b;
+}
+
+v16bf stacktest5 (v16bf __a)
+{
+  volatile v16bf b = __a;
+  return b;
+}
+
+v32bf stacktest6 (v32bf __a)
+{
+  volatile v32bf b = __a;
+  return b;
+}
+
+v64bf stacktest7 (v64bf __a)
+{
+  volatile v64bf b = __a;
+  return b;
+}
+
+v128bf stacktest8 (v128bf __a)
+{
+  volatile v128bf b = __a;
+  return b;
+}
+
+/* Test use of constant values to assign values to vectors.  */
+
+typedef bfloat16_t v2bf __attribute__((vector_size(4)));
+v2bf c2 (void) { return (v2bf) 0x12345678; }
+
+


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
  2020-01-10 18:47 [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2] Stam Markianos-Wright
@ 2020-01-13 10:33 ` Kyrill Tkachov
  2020-01-16 16:01   ` Stam Markianos-Wright
  0 siblings, 1 reply; 6+ messages in thread
From: Kyrill Tkachov @ 2020-01-13 10:33 UTC (permalink / raw)
  To: Stam Markianos-Wright, gcc-patches
  Cc: Richard Earnshaw, Richard Sandiford, Ramana Radhakrishnan, nickc

Hi Stam,

On 1/10/20 6:45 PM, Stam Markianos-Wright wrote:
> Hi all,
>
> This is a respin of patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>
> which has now been split into two (similar to the Aarch64 version).
>
> This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
> It also adds a new machine_mode (BFmode) for this type and 
> accompanying Vector
> modes V4BFmode and V8BFmode.
>
> The second patch in this series uses existing target hooks to restrict 
> type use.
>
> Regression testing on arm-none-eabi passed successfully.
>
> This patch depends on:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>
> for test suite effective_target update.
>
> Ok for trunk?

This is ok, thanks.

You can commit it once the git conversion goes through :)

Kyrill


>
> Cheers,
> Stam
>
>
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Details on ARM Bfloat can be found here:
> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a 
>
>
>
>
> gcc/ChangeLog:
>
> 2020-01-10Â  Stam Markianos-Wright <stam.markianos-wright@arm.com>
>
> Â Â Â Â Â Â Â  * config.gcc: Add arm_bf16.h.
> Â Â Â Â Â Â Â  * config/arm/arm-builtins.c (arm_mangle_builtin_type):Â  Fix 
> comment.
> Â Â Â Â Â Â Â  (arm_simd_builtin_std_type): Add BFmode.
> Â Â Â Â Â Â Â  (arm_init_simd_builtin_types): Define element types for vector 
> types.
> Â Â Â Â Â Â Â  (arm_init_bf16_types):Â  New function.
> Â Â Â Â Â Â Â  (arm_init_builtins): Add arm_init_bf16_types function call.
> Â Â Â Â Â Â Â  * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector 
> modes.
> Â Â Â Â Â Â Â  * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
> Â Â Â Â Â Â Â  * config/arm/arm.c (aapcs_vfp_sub_candidate):Â  Add BFmode.
> Â Â Â Â Â Â Â  (arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
> Â Â Â Â Â Â Â  (arm_vector_mode_supported_p): Add V4BF, V8BF.
> Â Â Â Â Â Â Â  (arm_mangle_type):
> Â Â Â Â Â Â Â  * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
> Â Â Â Â Â Â Â Â Â  VALID_NEON_QREG_MODE respectively. Add export 
> arm_bf16_type_node,
> Â Â Â Â Â Â Â Â Â  arm_bf16_ptr_type_node.
> Â Â Â Â Â Â Â  * config/arm/arm.md: New enabled_for_bfmode_scalar,
> Â Â Â Â Â Â Â Â Â  enabled_for_bfmode_vector attributes. Add BFmode to movhf 
> expand.
> Â Â Â Â Â Â Â Â Â  pattern and define_split between ARM registers.
> Â Â Â Â Â Â Â  * config/arm/arm_bf16.h: New file.
> Â Â Â Â Â Â Â  * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
> Â Â Â Â Â Â Â  * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, 
> fporbf): New.
> Â Â Â Â Â Â Â Â Â  (VQXMOV): Add V8BF.
> Â Â Â Â Â Â Â  * config/arm/neon.md: Add BF vector types to NEON move patterns.
> Â Â Â Â Â Â Â  * config/arm/vfp.md: Add BFmode to movhf patterns.
>
> gcc/testsuite/ChangeLog:
>
> 2020-01-10Â  Stam Markianos-Wright <stam.markianos-wright@arm.com>
>
> Â Â Â Â Â Â Â  * g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
> Â Â Â Â Â Â Â  * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_scalar_1_1.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_scalar_1_2.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_scalar_2_1.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_scalar_2_2.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_scalar_3_1.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_scalar_3_2.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_scalar_4.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_simd_1_1.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_simd_1_2.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_simd_2_1.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_simd_2_2.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_simd_3_1.c: New test.
> Â Â Â Â Â Â Â  * gcc.target/arm/bfloat16_simd_3_2.c: New test.
>
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
  2020-01-13 10:33 ` Kyrill Tkachov
@ 2020-01-16 16:01   ` Stam Markianos-Wright
  2020-01-20 13:17     ` Christophe Lyon
  0 siblings, 1 reply; 6+ messages in thread
From: Stam Markianos-Wright @ 2020-01-16 16:01 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches
  Cc: Richard Earnshaw, Richard Sandiford, Ramana Radhakrishnan, nickc



On 1/13/20 10:05 AM, Kyrill Tkachov wrote:
> Hi Stam,
> 
> On 1/10/20 6:45 PM, Stam Markianos-Wright wrote:
>> Hi all,
>>
>> This is a respin of patch:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>>
>> which has now been split into two (similar to the Aarch64 version).
>>
>> This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
>> It also adds a new machine_mode (BFmode) for this type and accompanying Vector
>> modes V4BFmode and V8BFmode.
>>
>> The second patch in this series uses existing target hooks to restrict type use.
>>
>> Regression testing on arm-none-eabi passed successfully.
>>
>> This patch depends on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>
>> for test suite effective_target update.
>>
>> Ok for trunk?
> 
> This is ok, thanks.
> 
> You can commit it once the git conversion goes through :)

Committed as r10-6020-g2e87b2f4121fe1d39edb76f4e492dfe327be6a1b

Thank you!
Stam
> 
> Kyrill
> 
> 
>>
>> Cheers,
>> Stam
>>
>>
>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>
>> Details on ARM Bfloat can be found here:
>> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a 
>>
>>
>>
>>
>> gcc/ChangeLog:
>>
>> 2020-01-10  Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>
>>         * config.gcc: Add arm_bf16.h.
>>         * config/arm/arm-builtins.c (arm_mangle_builtin_type):  Fix comment.
>>         (arm_simd_builtin_std_type): Add BFmode.
>>         (arm_init_simd_builtin_types): Define element types for vector types.
>>         (arm_init_bf16_types):  New function.
>>         (arm_init_builtins): Add arm_init_bf16_types function call.
>>         * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes.
>>         * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
>>         * config/arm/arm.c (aapcs_vfp_sub_candidate):  Add BFmode.
>>         (arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
>>         (arm_vector_mode_supported_p): Add V4BF, V8BF.
>>         (arm_mangle_type):
>>         * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
>>           VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node,
>>           arm_bf16_ptr_type_node.
>>         * config/arm/arm.md: New enabled_for_bfmode_scalar,
>>           enabled_for_bfmode_vector attributes. Add BFmode to movhf expand.
>>           pattern and define_split between ARM registers.
>>         * config/arm/arm_bf16.h: New file.
>>         * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
>>         * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New.
>>           (VQXMOV): Add V8BF.
>>         * config/arm/neon.md: Add BF vector types to NEON move patterns.
>>         * config/arm/vfp.md: Add BFmode to movhf patterns.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2020-01-10  Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>
>>         * g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
>>         * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
>>         * gcc.target/arm/bfloat16_scalar_1_1.c: New test.
>>         * gcc.target/arm/bfloat16_scalar_1_2.c: New test.
>>         * gcc.target/arm/bfloat16_scalar_2_1.c: New test.
>>         * gcc.target/arm/bfloat16_scalar_2_2.c: New test.
>>         * gcc.target/arm/bfloat16_scalar_3_1.c: New test.
>>         * gcc.target/arm/bfloat16_scalar_3_2.c: New test.
>>         * gcc.target/arm/bfloat16_scalar_4.c: New test.
>>         * gcc.target/arm/bfloat16_simd_1_1.c: New test.
>>         * gcc.target/arm/bfloat16_simd_1_2.c: New test.
>>         * gcc.target/arm/bfloat16_simd_2_1.c: New test.
>>         * gcc.target/arm/bfloat16_simd_2_2.c: New test.
>>         * gcc.target/arm/bfloat16_simd_3_1.c: New test.
>>         * gcc.target/arm/bfloat16_simd_3_2.c: New test.
>>
>>
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
  2020-01-16 16:01   ` Stam Markianos-Wright
@ 2020-01-20 13:17     ` Christophe Lyon
  2020-01-20 15:22       ` Stam Markianos-Wright
  0 siblings, 1 reply; 6+ messages in thread
From: Christophe Lyon @ 2020-01-20 13:17 UTC (permalink / raw)
  To: Stam Markianos-Wright
  Cc: Kyrill Tkachov, gcc-patches, Richard Earnshaw, Richard Sandiford,
	Ramana Radhakrishnan, nickc

Hi,


On Thu, 16 Jan 2020 at 16:59, Stam Markianos-Wright
<Stam.Markianos-Wright@arm.com> wrote:
>
>
>
> On 1/13/20 10:05 AM, Kyrill Tkachov wrote:
> > Hi Stam,
> >
> > On 1/10/20 6:45 PM, Stam Markianos-Wright wrote:
> >> Hi all,
> >>
> >> This is a respin of patch:
> >>
> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
> >>
> >> which has now been split into two (similar to the Aarch64 version).
> >>
> >> This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
> >> It also adds a new machine_mode (BFmode) for this type and accompanying Vector
> >> modes V4BFmode and V8BFmode.
> >>
> >> The second patch in this series uses existing target hooks to restrict type use.
> >>
> >> Regression testing on arm-none-eabi passed successfully.
> >>
> >> This patch depends on:
> >>
> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
> >>
> >> for test suite effective_target update.
> >>
> >> Ok for trunk?
> >
> > This is ok, thanks.
> >
> > You can commit it once the git conversion goes through :)
>
> Committed as r10-6020-g2e87b2f4121fe1d39edb76f4e492dfe327be6a1b
>

This since commit, I've noticed many ICEs like:
Executing on host:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/xgcc
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/
/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never  -fdiagnostics-urls=never    -O0
-mfp16-format=ieee       -lm  -o ./arm-fp16-ops-1.exe    (timeout =
800)
spawn -ignore SIGHUP
/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/xgcc
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/
/gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -fdiagnostics-urls=never -O0
-mfp16-format=ieee -lm -o ./arm-fp16-ops-1.exe
during RTL pass: expand
In file included from /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:3,
                 from /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c:5:
/gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h: In function 'main':
/gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:31:12: internal compiler
error: in convert_mode_scalar, at expr.c:328
/gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:31:3: note: in expansion
of macro 'CHECK'
0x8cb089 convert_mode_scalar
        /gcc/expr.c:325
0x8cb089 convert_move(rtx_def*, rtx_def*, int)
        /gcc/expr.c:297
0x8cb32f convert_modes(machine_mode, machine_mode, rtx_def*, int)
        /gcc/expr.c:737
0xb8b2a0 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*,
rtx_def*, int, optab_methods)
        /gcc/optabs.c:1895
0x8bdebc expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
expand_modifier)
        /gcc/expr.c:9847
0x77e52a expand_gimple_stmt_1
        /gcc/cfgexpand.c:3784
0x77e52a expand_gimple_stmt
        /gcc/cfgexpand.c:3844
0x78068d expand_gimple_basic_block
        /gcc/cfgexpand.c:5884
0x78279c execute
        /gcc/cfgexpand.c:6539

This example is for gcc.dg/torture/arm-fp16-ops-1.c target arm-none-eabi.

You said you saw no regressions, am I missing something?
(this is still true as of todays' daily-bump
bec238768255acf0fe5b0993d05cf99f6331b79e)

Thanks,

Christophe



> Thank you!
> Stam
> >
> > Kyrill
> >
> >
> >>
> >> Cheers,
> >> Stam
> >>
> >>
> >> ACLE documents are at https://developer.arm.com/docs/101028/latest
> >> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
> >>
> >> Details on ARM Bfloat can be found here:
> >> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
> >>
> >>
> >>
> >>
> >> gcc/ChangeLog:
> >>
> >> 2020-01-10  Stam Markianos-Wright <stam.markianos-wright@arm.com>
> >>
> >>         * config.gcc: Add arm_bf16.h.
> >>         * config/arm/arm-builtins.c (arm_mangle_builtin_type):  Fix comment.
> >>         (arm_simd_builtin_std_type): Add BFmode.
> >>         (arm_init_simd_builtin_types): Define element types for vector types.
> >>         (arm_init_bf16_types):  New function.
> >>         (arm_init_builtins): Add arm_init_bf16_types function call.
> >>         * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes.
> >>         * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
> >>         * config/arm/arm.c (aapcs_vfp_sub_candidate):  Add BFmode.
> >>         (arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
> >>         (arm_vector_mode_supported_p): Add V4BF, V8BF.
> >>         (arm_mangle_type):
> >>         * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
> >>           VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node,
> >>           arm_bf16_ptr_type_node.
> >>         * config/arm/arm.md: New enabled_for_bfmode_scalar,
> >>           enabled_for_bfmode_vector attributes. Add BFmode to movhf expand.
> >>           pattern and define_split between ARM registers.
> >>         * config/arm/arm_bf16.h: New file.
> >>         * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
> >>         * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New.
> >>           (VQXMOV): Add V8BF.
> >>         * config/arm/neon.md: Add BF vector types to NEON move patterns.
> >>         * config/arm/vfp.md: Add BFmode to movhf patterns.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2020-01-10  Stam Markianos-Wright <stam.markianos-wright@arm.com>
> >>
> >>         * g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
> >>         * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
> >>         * gcc.target/arm/bfloat16_scalar_1_1.c: New test.
> >>         * gcc.target/arm/bfloat16_scalar_1_2.c: New test.
> >>         * gcc.target/arm/bfloat16_scalar_2_1.c: New test.
> >>         * gcc.target/arm/bfloat16_scalar_2_2.c: New test.
> >>         * gcc.target/arm/bfloat16_scalar_3_1.c: New test.
> >>         * gcc.target/arm/bfloat16_scalar_3_2.c: New test.
> >>         * gcc.target/arm/bfloat16_scalar_4.c: New test.
> >>         * gcc.target/arm/bfloat16_simd_1_1.c: New test.
> >>         * gcc.target/arm/bfloat16_simd_1_2.c: New test.
> >>         * gcc.target/arm/bfloat16_simd_2_1.c: New test.
> >>         * gcc.target/arm/bfloat16_simd_2_2.c: New test.
> >>         * gcc.target/arm/bfloat16_simd_3_1.c: New test.
> >>         * gcc.target/arm/bfloat16_simd_3_2.c: New test.
> >>
> >>
> >>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
  2020-01-20 13:17     ` Christophe Lyon
@ 2020-01-20 15:22       ` Stam Markianos-Wright
  2020-01-20 15:59         ` Christophe Lyon
  0 siblings, 1 reply; 6+ messages in thread
From: Stam Markianos-Wright @ 2020-01-20 15:22 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: Kyrill Tkachov, gcc-patches, Richard Earnshaw, Richard Sandiford,
	Ramana Radhakrishnan, nickc



On 1/20/20 1:07 PM, Christophe Lyon wrote:
> Hi,
> 
> 
> On Thu, 16 Jan 2020 at 16:59, Stam Markianos-Wright
> <Stam.Markianos-Wright@arm.com> wrote:
>>
>>
>>
>> On 1/13/20 10:05 AM, Kyrill Tkachov wrote:
>>> Hi Stam,
>>>
>>> On 1/10/20 6:45 PM, Stam Markianos-Wright wrote:
>>>> Hi all,
>>>>
>>>> This is a respin of patch:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>>>>
>>>> which has now been split into two (similar to the Aarch64 version).
>>>>
>>>> This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
>>>> It also adds a new machine_mode (BFmode) for this type and accompanying Vector
>>>> modes V4BFmode and V8BFmode.
>>>>
>>>> The second patch in this series uses existing target hooks to restrict type use.
>>>>
>>>> Regression testing on arm-none-eabi passed successfully.
>>>>
>>>> This patch depends on:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>>>
>>>> for test suite effective_target update.
>>>>
>>>> Ok for trunk?
>>>
>>> This is ok, thanks.
>>>
>>> You can commit it once the git conversion goes through :)
>>
>> Committed as r10-6020-g2e87b2f4121fe1d39edb76f4e492dfe327be6a1b
>>
> 
> This since commit, I've noticed many ICEs like:
> Executing on host:
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/xgcc
> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/
> /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c
> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> -fdiagnostics-color=never  -fdiagnostics-urls=never    -O0
> -mfp16-format=ieee       -lm  -o ./arm-fp16-ops-1.exe    (timeout =
> 800)
> spawn -ignore SIGHUP
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/xgcc
> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/
> /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c
> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> -fdiagnostics-color=never -fdiagnostics-urls=never -O0
> -mfp16-format=ieee -lm -o ./arm-fp16-ops-1.exe
> during RTL pass: expand
> In file included from /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:3,
>                   from /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c:5:
> /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h: In function 'main':
> /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:31:12: internal compiler
> error: in convert_mode_scalar, at expr.c:328
> /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:31:3: note: in expansion
> of macro 'CHECK'
> 0x8cb089 convert_mode_scalar
>          /gcc/expr.c:325
> 0x8cb089 convert_move(rtx_def*, rtx_def*, int)
>          /gcc/expr.c:297
> 0x8cb32f convert_modes(machine_mode, machine_mode, rtx_def*, int)
>          /gcc/expr.c:737
> 0xb8b2a0 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*,
> rtx_def*, int, optab_methods)
>          /gcc/optabs.c:1895
> 0x8bdebc expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> expand_modifier)
>          /gcc/expr.c:9847
> 0x77e52a expand_gimple_stmt_1
>          /gcc/cfgexpand.c:3784
> 0x77e52a expand_gimple_stmt
>          /gcc/cfgexpand.c:3844
> 0x78068d expand_gimple_basic_block
>          /gcc/cfgexpand.c:5884
> 0x78279c execute
>          /gcc/cfgexpand.c:6539
> 
> This example is for gcc.dg/torture/arm-fp16-ops-1.c target arm-none-eabi.
> 
> You said you saw no regressions, am I missing something?
> (this is still true as of todays' daily-bump
> bec238768255acf0fe5b0993d05cf99f6331b79e)
> 
> Thanks,
> 
> Christophe

Hi Christophe!

Yes I think this is a duplicate of 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300 which Martin raised last Friday.

I'm working on this! I made the rookie mistake of doing my reg-testing on a 
non-final version of the patch rather than the _final_ final version - hence not 
picking this up until it was too late... Sorry about that!

I'm working on the fix now :)

Cheers,
Stam


> 
> 
> 
>> Thank you!
>> Stam
>>>
>>> Kyrill
>>>
>>>
>>>>
>>>> Cheers,
>>>> Stam
>>>>
>>>>
>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>>
>>>> Details on ARM Bfloat can be found here:
>>>> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
>>>>
>>>>
>>>>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2020-01-10  Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>>
>>>>          * config.gcc: Add arm_bf16.h.
>>>>          * config/arm/arm-builtins.c (arm_mangle_builtin_type):  Fix comment.
>>>>          (arm_simd_builtin_std_type): Add BFmode.
>>>>          (arm_init_simd_builtin_types): Define element types for vector types.
>>>>          (arm_init_bf16_types):  New function.
>>>>          (arm_init_builtins): Add arm_init_bf16_types function call.
>>>>          * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes.
>>>>          * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
>>>>          * config/arm/arm.c (aapcs_vfp_sub_candidate):  Add BFmode.
>>>>          (arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
>>>>          (arm_vector_mode_supported_p): Add V4BF, V8BF.
>>>>          (arm_mangle_type):
>>>>          * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
>>>>            VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node,
>>>>            arm_bf16_ptr_type_node.
>>>>          * config/arm/arm.md: New enabled_for_bfmode_scalar,
>>>>            enabled_for_bfmode_vector attributes. Add BFmode to movhf expand.
>>>>            pattern and define_split between ARM registers.
>>>>          * config/arm/arm_bf16.h: New file.
>>>>          * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
>>>>          * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New.
>>>>            (VQXMOV): Add V8BF.
>>>>          * config/arm/neon.md: Add BF vector types to NEON move patterns.
>>>>          * config/arm/vfp.md: Add BFmode to movhf patterns.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2020-01-10  Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>>
>>>>          * g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
>>>>          * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
>>>>          * gcc.target/arm/bfloat16_scalar_1_1.c: New test.
>>>>          * gcc.target/arm/bfloat16_scalar_1_2.c: New test.
>>>>          * gcc.target/arm/bfloat16_scalar_2_1.c: New test.
>>>>          * gcc.target/arm/bfloat16_scalar_2_2.c: New test.
>>>>          * gcc.target/arm/bfloat16_scalar_3_1.c: New test.
>>>>          * gcc.target/arm/bfloat16_scalar_3_2.c: New test.
>>>>          * gcc.target/arm/bfloat16_scalar_4.c: New test.
>>>>          * gcc.target/arm/bfloat16_simd_1_1.c: New test.
>>>>          * gcc.target/arm/bfloat16_simd_1_2.c: New test.
>>>>          * gcc.target/arm/bfloat16_simd_2_1.c: New test.
>>>>          * gcc.target/arm/bfloat16_simd_2_2.c: New test.
>>>>          * gcc.target/arm/bfloat16_simd_3_1.c: New test.
>>>>          * gcc.target/arm/bfloat16_simd_3_2.c: New test.
>>>>
>>>>
>>>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]
  2020-01-20 15:22       ` Stam Markianos-Wright
@ 2020-01-20 15:59         ` Christophe Lyon
  0 siblings, 0 replies; 6+ messages in thread
From: Christophe Lyon @ 2020-01-20 15:59 UTC (permalink / raw)
  To: Stam Markianos-Wright
  Cc: Kyrill Tkachov, gcc-patches, Richard Earnshaw, Richard Sandiford,
	Ramana Radhakrishnan, nickc

On Mon, 20 Jan 2020 at 16:02, Stam Markianos-Wright
<Stam.Markianos-Wright@arm.com> wrote:
>
>
>
> On 1/20/20 1:07 PM, Christophe Lyon wrote:
> > Hi,
> >
> >
> > On Thu, 16 Jan 2020 at 16:59, Stam Markianos-Wright
> > <Stam.Markianos-Wright@arm.com> wrote:
> >>
> >>
> >>
> >> On 1/13/20 10:05 AM, Kyrill Tkachov wrote:
> >>> Hi Stam,
> >>>
> >>> On 1/10/20 6:45 PM, Stam Markianos-Wright wrote:
> >>>> Hi all,
> >>>>
> >>>> This is a respin of patch:
> >>>>
> >>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
> >>>>
> >>>> which has now been split into two (similar to the Aarch64 version).
> >>>>
> >>>> This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
> >>>> It also adds a new machine_mode (BFmode) for this type and accompanying Vector
> >>>> modes V4BFmode and V8BFmode.
> >>>>
> >>>> The second patch in this series uses existing target hooks to restrict type use.
> >>>>
> >>>> Regression testing on arm-none-eabi passed successfully.
> >>>>
> >>>> This patch depends on:
> >>>>
> >>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
> >>>>
> >>>> for test suite effective_target update.
> >>>>
> >>>> Ok for trunk?
> >>>
> >>> This is ok, thanks.
> >>>
> >>> You can commit it once the git conversion goes through :)
> >>
> >> Committed as r10-6020-g2e87b2f4121fe1d39edb76f4e492dfe327be6a1b
> >>
> >
> > This since commit, I've noticed many ICEs like:
> > Executing on host:
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/xgcc
> > -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/
> > /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c
> > -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> > -fdiagnostics-color=never  -fdiagnostics-urls=never    -O0
> > -mfp16-format=ieee       -lm  -o ./arm-fp16-ops-1.exe    (timeout =
> > 800)
> > spawn -ignore SIGHUP
> > /aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/xgcc
> > -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/
> > /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c
> > -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> > -fdiagnostics-color=never -fdiagnostics-urls=never -O0
> > -mfp16-format=ieee -lm -o ./arm-fp16-ops-1.exe
> > during RTL pass: expand
> > In file included from /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:3,
> >                   from /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c:5:
> > /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h: In function 'main':
> > /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:31:12: internal compiler
> > error: in convert_mode_scalar, at expr.c:328
> > /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:31:3: note: in expansion
> > of macro 'CHECK'
> > 0x8cb089 convert_mode_scalar
> >          /gcc/expr.c:325
> > 0x8cb089 convert_move(rtx_def*, rtx_def*, int)
> >          /gcc/expr.c:297
> > 0x8cb32f convert_modes(machine_mode, machine_mode, rtx_def*, int)
> >          /gcc/expr.c:737
> > 0xb8b2a0 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*,
> > rtx_def*, int, optab_methods)
> >          /gcc/optabs.c:1895
> > 0x8bdebc expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> > expand_modifier)
> >          /gcc/expr.c:9847
> > 0x77e52a expand_gimple_stmt_1
> >          /gcc/cfgexpand.c:3784
> > 0x77e52a expand_gimple_stmt
> >          /gcc/cfgexpand.c:3844
> > 0x78068d expand_gimple_basic_block
> >          /gcc/cfgexpand.c:5884
> > 0x78279c execute
> >          /gcc/cfgexpand.c:6539
> >
> > This example is for gcc.dg/torture/arm-fp16-ops-1.c target arm-none-eabi.
> >
> > You said you saw no regressions, am I missing something?
> > (this is still true as of todays' daily-bump
> > bec238768255acf0fe5b0993d05cf99f6331b79e)
> >
> > Thanks,
> >
> > Christophe
>
> Hi Christophe!
>
> Yes I think this is a duplicate of
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300 which Martin raised last Friday.
>
> I'm working on this! I made the rookie mistake of doing my reg-testing on a
> non-final version of the patch rather than the _final_ final version - hence not
> picking this up until it was too late... Sorry about that!
>

Good to know it's a duplicate, there has a been a few arm bootstrap
fixes during the week-end, but that was not sufficient.

Thanks


> I'm working on the fix now :)
>
> Cheers,
> Stam
>
>
> >
> >
> >
> >> Thank you!
> >> Stam
> >>>
> >>> Kyrill
> >>>
> >>>
> >>>>
> >>>> Cheers,
> >>>> Stam
> >>>>
> >>>>
> >>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
> >>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
> >>>>
> >>>> Details on ARM Bfloat can be found here:
> >>>> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>> 2020-01-10  Stam Markianos-Wright <stam.markianos-wright@arm.com>
> >>>>
> >>>>          * config.gcc: Add arm_bf16.h.
> >>>>          * config/arm/arm-builtins.c (arm_mangle_builtin_type):  Fix comment.
> >>>>          (arm_simd_builtin_std_type): Add BFmode.
> >>>>          (arm_init_simd_builtin_types): Define element types for vector types.
> >>>>          (arm_init_bf16_types):  New function.
> >>>>          (arm_init_builtins): Add arm_init_bf16_types function call.
> >>>>          * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes.
> >>>>          * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
> >>>>          * config/arm/arm.c (aapcs_vfp_sub_candidate):  Add BFmode.
> >>>>          (arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
> >>>>          (arm_vector_mode_supported_p): Add V4BF, V8BF.
> >>>>          (arm_mangle_type):
> >>>>          * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
> >>>>            VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node,
> >>>>            arm_bf16_ptr_type_node.
> >>>>          * config/arm/arm.md: New enabled_for_bfmode_scalar,
> >>>>            enabled_for_bfmode_vector attributes. Add BFmode to movhf expand.
> >>>>            pattern and define_split between ARM registers.
> >>>>          * config/arm/arm_bf16.h: New file.
> >>>>          * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
> >>>>          * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New.
> >>>>            (VQXMOV): Add V8BF.
> >>>>          * config/arm/neon.md: Add BF vector types to NEON move patterns.
> >>>>          * config/arm/vfp.md: Add BFmode to movhf patterns.
> >>>>
> >>>> gcc/testsuite/ChangeLog:
> >>>>
> >>>> 2020-01-10  Stam Markianos-Wright <stam.markianos-wright@arm.com>
> >>>>
> >>>>          * g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
> >>>>          * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
> >>>>          * gcc.target/arm/bfloat16_scalar_1_1.c: New test.
> >>>>          * gcc.target/arm/bfloat16_scalar_1_2.c: New test.
> >>>>          * gcc.target/arm/bfloat16_scalar_2_1.c: New test.
> >>>>          * gcc.target/arm/bfloat16_scalar_2_2.c: New test.
> >>>>          * gcc.target/arm/bfloat16_scalar_3_1.c: New test.
> >>>>          * gcc.target/arm/bfloat16_scalar_3_2.c: New test.
> >>>>          * gcc.target/arm/bfloat16_scalar_4.c: New test.
> >>>>          * gcc.target/arm/bfloat16_simd_1_1.c: New test.
> >>>>          * gcc.target/arm/bfloat16_simd_1_2.c: New test.
> >>>>          * gcc.target/arm/bfloat16_simd_2_1.c: New test.
> >>>>          * gcc.target/arm/bfloat16_simd_2_2.c: New test.
> >>>>          * gcc.target/arm/bfloat16_simd_3_1.c: New test.
> >>>>          * gcc.target/arm/bfloat16_simd_3_2.c: New test.
> >>>>
> >>>>
> >>>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-01-20 15:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-10 18:47 [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2] Stam Markianos-Wright
2020-01-13 10:33 ` Kyrill Tkachov
2020-01-16 16:01   ` Stam Markianos-Wright
2020-01-20 13:17     ` Christophe Lyon
2020-01-20 15:22       ` Stam Markianos-Wright
2020-01-20 15:59         ` Christophe Lyon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).