public inbox for gcc-patches@gcc.gnu.org
* Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal<b/t> for AArch64 AdvSIMD
       [not found] <1f271712-6c61-1be2-68bc-d61b51790dd3@arm.com>
@ 2019-12-23 18:30 ` Richard Sandiford
  2020-01-31 14:51   ` Delia Burduv
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2019-12-23 18:30 UTC (permalink / raw)
  To: Delia Burduv
  Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Thanks for the patch, looks good.

Delia Burduv <Delia.Burduv@arm.com> writes:
> This patch adds the ARMv8.6 ACLE intrinsics for bfmmla, bfmlalb and bfmlalt as part of the BFloat16 extension.
> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
> The intrinsics are declared in arm_neon.h and the RTL patterns are defined in aarch64-simd.md.
> Two new tests are added to check assembler output.
>
> This patch depends on the two Aarch64 back-end patches. (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01323.html and https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01324.html)
>
> Tested for regression on aarch64-none-elf and aarch64_be-none-elf. I don't have commit rights, so if this is ok can someone please commit it for me?
>
> gcc/ChangeLog:
>
> 2019-10-29  Delia Burduv  <delia.burduv@arm.com>
>
>         * config/aarch64/aarch64-simd-builtins.def
>           (bfmmla): New built-in function.
>           (bfmlalb): New built-in function.
>           (bfmlalt): New built-in function.
>           (bfmlalb_lane): New built-in function.
>           (bfmlalt_lane): New built-in function.
>           (bfmlalb_laneq): New built-in function.
>           (bfmlalt_laneq): New built-in function
>         * config/aarch64/aarch64-simd.md (bfmmla): New pattern.
>           (bfmlal<bt>): New patterns.
>         * config/aarch64/arm_neon.h (vbfmmlaq_f32): New intrinsic.
>           (vbfmlalbq_f32): New intrinsic.
>           (vbfmlaltq_f32): New intrinsic.
>           (vbfmlalbq_lane_f32): New intrinsic.
>           (vbfmlaltq_lane_f32): New intrinsic.
>           (vbfmlalbq_laneq_f32): New intrinsic.
>           (vbfmlaltq_laneq_f32): New intrinsic.
>         * config/aarch64/iterators.md (UNSPEC_BFMMLA): New UNSPEC.
>           (UNSPEC_BFMLALB): New UNSPEC.
>           (UNSPEC_BFMLALT): New UNSPEC.
>           (BF_MLA): New int iterator.
>           (bt): Added UNSPEC_BFMLALB, UNSPEC_BFMLALT.
>         * config/arm/types.md (bf_mmla): New type.
>           (bf_mla): New type.
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-29  Delia Burduv  <delia.burduv@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c: New test.
>         * gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c: New test.
>         * gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c:
>           New test.

Formatting nit: continuation lines should only be indented by a tab,
rather than a tab and two spaces.  (I agree the above looks nicer,
but the policy is not to be flexible over this kind of thing...)

> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
> index f4ca35a59704c761fe2ac2b6d401fff7c8aba80d..5e9f50f090870d0c63916540a48f5ac132d2630d 100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -682,3 +682,14 @@
>    BUILTIN_VSFDF (UNOP, frint32x, 0)
>    BUILTIN_VSFDF (UNOP, frint64z, 0)
>    BUILTIN_VSFDF (UNOP, frint64x, 0)
> +
> +  /* Implemented by aarch64_bfmmlaqv4sf  */
> +  VAR1 (TERNOP, bfmmlaq, 0, v4sf)
> +
> +  /* Implemented by aarch64_bfmlal<bt>{_lane{q}}v4sf  */
> +  VAR1 (TERNOP, bfmlalb, 0, v4sf)
> +  VAR1 (TERNOP, bfmlalt, 0, v4sf)
> +  VAR1 (QUADOP_LANE, bfmlalb_lane, 0, v4sf)
> +  VAR1 (QUADOP_LANE, bfmlalt_lane, 0, v4sf)
> +  VAR1 (QUADOP_LANE, bfmlalb_laneq, 0, v4sf)
> +  VAR1 (QUADOP_LANE, bfmlalt_laneq, 0, v4sf)
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 55660ae248f4fa75d35ba2949cd4b9d5139ff5f5..66a6c4116a1fdd26dd4eec8b0609e28eb2c38fa1 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -7027,3 +7027,57 @@
>    "xtn\t%0.<Vntype>, %1.<Vtype>"
>    [(set_attr "type" "neon_shift_imm_narrow_q")]
>  )
> +
> +;; bfmmla
> +(define_insn "aarch64_bfmmlaqv4sf"
> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
> +        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
> +                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
> +                                 (match_operand:V8BF 3 "register_operand" "w")]
> +                    UNSPEC_BFMMLA)))]
> +  "TARGET_BF16_SIMD"
> +  "bfmmla\\t%0.4s, %2.8h, %3.8h"
> +  [(set_attr "type" "neon_mla_s_q")]

Looks like this should be neon_fp_mla_s_q instead.

> +)
> +
> +;; bfmlal<bt>
> +(define_insn "aarch64_bfmlal<bt>v4sf"
> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
> +        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
> +                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
> +                                  (match_operand:V8BF 3 "register_operand" "w")]
> +                     BF_MLA)))]
> +  "TARGET_BF16_SIMD"
> +  "bfmlal<bt>\\t%0.4s, %2.8h, %3.8h"
> +  [(set_attr "type" "neon_fp_mla_s")]
> +)

Maybe we should have _q here too.  All the vectors are 128-bit vectors,
we just happen to ignore even or odd elements of the BF inputs.

> +(define_insn "aarch64_bfmlal<bt>_lanev4sf"
> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
> +        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
> +                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
> +                                  (match_operand:V4BF 3 "register_operand" "w")
> +                                  (match_operand:SI 4 "const_int_operand" "n")]
> +                     BF_MLA)))]
> +  "TARGET_BF16_SIMD"
> +{
> +  operands[4] = aarch64_endian_lane_rtx (V4BFmode, INTVAL (operands[4]));
> +  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
> +}
> +  [(set_attr "type" "neon_fp_mla_s")]
> +)

IIUC, these should be neon_fp_mla_s_scalar_q, but I might have misunderstood
the naming scheme.

> +(define_insn "aarch64_bfmlal<bt>_laneqv4sf"
> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
> +        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
> +                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
> +                                  (match_operand:V8BF 3 "register_operand" "w")
> +                                  (match_operand:SI 4 "const_int_operand" "n")]
> +                     BF_MLA)))]
> +  "TARGET_BF16_SIMD"
> +{
> +  operands[4] = aarch64_endian_lane_rtx (V8BFmode, INTVAL (operands[4]));
> +  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
> +}
> +  [(set_attr "type" "neon_fp_mla_s")]
> +)

These can be combined into a single pattern by using a mode iterator for
V4BF/V8BF.
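For example, something like this (untested sketch, and the iterator and
attribute names here are just placeholders — pick whatever fits the
existing naming in iterators.md):

    (define_mode_iterator VBF [V4BF V8BF])
    (define_mode_attr isquadop [(V4BF "") (V8BF "q")])

    (define_insn "aarch64_bfmlal<bt>_lane<VBF:isquadop>v4sf"
      [(set (match_operand:V4SF 0 "register_operand" "=w")
            (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
                       (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
                                     (match_operand:VBF 3 "register_operand" "w")
                                     (match_operand:SI 4 "const_int_operand" "n")]
                        BF_MLA)))]
      "TARGET_BF16_SIMD"
    {
      operands[4] = aarch64_endian_lane_rtx (<VBF:MODE>mode, INTVAL (operands[4]));
      return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
    }
      [(set_attr "type" "neon_fp_mla_s_scalar_q")]
    )

The explicit <VBF:...> qualification is needed because the pattern mixes
the mode iterator with the <bt> int iterator.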

> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 6cdbf381f0156ed993f03b847228b36ebbdd14f8..9001c63b0d44e7ad699ace097b9259681b691033 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -34610,6 +34610,70 @@ vrnd64xq_f64 (float64x2_t __a)
>  
>  #include "arm_bf16.h"
>  
> +#pragma GCC push_options
> +#pragma GCC target ("arch=armv8.2-a+bf16")
> +#ifdef __ARM_FEATURE_BF16_VECTOR_ARITHMETIC
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vbfmmlaq_f32 \
> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
> +
> +{

Formatting nits: should be:

vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
{

with no backslash, line break or blank line.

> +  return __builtin_aarch64_bfmmlaqv4sf (__r, __a, __b);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vbfmlalbq_f32 \
> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
> +{
> +  return __builtin_aarch64_bfmlalbv4sf (__r, __a, __b);
> +}
> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vbfmlaltq_f32 \
> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
> +{
> +  return __builtin_aarch64_bfmlaltv4sf (__r, __a, __b);
> +}

Same for these.

> +
> +__extension__ extern __inline float32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vbfmlalbq_lane_f32 \
> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b, const int __index)
> +{

Here it's probably better to format as:

vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
		    const int __index)
{

Same for the rest of the file.

> diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
> index df39522f2ad63a52c910b1a6bcc7aa13aaf5d021..2f5ada97991abc88cc74f4768eb395b2b757ee26 100644
> --- a/gcc/config/arm/types.md
> +++ b/gcc/config/arm/types.md
> @@ -550,6 +550,10 @@
>  ; The classification below is for TME instructions
>  ;
>  ; tme
> +;
> +; The classification below is for BFloat16 widening multiply-add
> +;
> +; bf_mla

This doesn't seem to be used by the new define_insns.

>  
>  (define_attr "type"
>   "adc_imm,\
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..11558be667c65228529ead90628604cba0bbd044
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
> @@ -0,0 +1,73 @@
> +/* { dg-do assemble { target { aarch64*-*-* } } } */
> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> +/* { dg-add-options arm_v8_2a_bf16_neon } */
> +/* { dg-additional-options "-save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
> +
> +#include <arm_neon.h>
> +
> +/*
> +**test_bfmlalb:
> +**      ...
> +**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
> +**      ...
> +*/
> +float32x4_t test_bfmlalb (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
> +{
> +  return vbfmlalbq_f32 (r, a, b);
> +}
> +
> +/*
> +**test_bfmlalt:
> +**      ...
> +**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
> +**      ...
> +*/
> +float32x4_t test_bfmlalt (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
> +{
> +  return vbfmlaltq_f32 (r, a, b);
> +}
> +
> +/*
> +**test_bfmlalb_lane:
> +**      ...
> +**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[0\]
> +**      ...
> +*/
> +float32x4_t test_bfmlalb_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
> +{
> +  return vbfmlalbq_lane_f32 (r, a, b, 0);
> +}
> +
> +/*
> +**test_bfmlalt_lane:
> +**      ...
> +**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[2\]
> +**      ...
> +*/
> +float32x4_t test_bfmlalt_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
> +{
> +  return vbfmlaltq_lane_f32 (r, a, b, 2);
> +}
> +
> +/*
> +**test_bfmlalb_laneq:
> +**      ...
> +**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[4\]
> +**      ...
> +*/
> +float32x4_t test_bfmlalb_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
> +{
> +  return vbfmlalbq_laneq_f32 (r, a, b, 4);
> +}
> +
> +/*
> +**test_bfmlalt_laneq:
> +**      ...
> +**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[7\]
> +**      ...
> +*/
> +float32x4_t test_bfmlalt_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
> +{
> +  return vbfmlaltq_laneq_f32 (r, a, b, 7);
> +}

It might be better to compile these at -O and test for the exact
input and output registers.  E.g.:

**test_bfmlalt_laneq:
**      bfmlalt	v0\.4s, v1\.8h, v2\.h\[7\]
**      ret
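so that the whole test would look something like (untested, just to
illustrate the directive change):

    /* { dg-additional-options "-save-temps -O" } */

    /*
    **test_bfmlalt_laneq:
    **	bfmlalt	v0\.4s, v1\.8h, v2\.h\[7\]
    **	ret
    */
    float32x4_t test_bfmlalt_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
    {
      return vbfmlaltq_laneq_f32 (r, a, b, 7);
    }

With -O the r/a/b arguments stay in v0/v1/v2, so the match can be exact
and the "..." lines go away.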

> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b12cf47d67a33f13967738b48a4984765c0ff2df
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
> @@ -0,0 +1,19 @@
> +/* { dg-do assemble { target { aarch64*-*-* } } } */
> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> +/* { dg-add-options arm_v8_2a_bf16_neon } */
> +/* { dg-additional-options "-save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
> +
> +#include <arm_neon.h>
> +
> +
> +/*
> +**test_bfmmla:
> +**	...
> +**	bfmmla	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
> +**	...
> +*/
> +float32x4_t test_bfmmla (float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
> +{
> +  return vbfmmlaq_f32 (r, x, y);
> +}

Same here.

> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..4a8a9b64c04b39f3cd95101326022f67326921f5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
> @@ -0,0 +1,46 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> +/* { dg-add-options arm_v8_2a_bf16_neon } */
> +
> +#include <arm_neon.h>
> +
> +void
> +f_vbfmlaltq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
> +{
> +  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 34655 } */
> +  vbfmlaltq_lane_f32 (r, a, b, -1);
> +  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 34655 } */
> +  vbfmlaltq_lane_f32 (r, a, b, 4);
> +  return;
> +}
> +
> +void
> +f_vbfmlaltq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
> +{
> +  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 34671 } */
> +  vbfmlaltq_laneq_f32 (r, a, b, -1);
> +  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 34671 } */
> +  vbfmlaltq_laneq_f32 (r, a, b, 8);
> +  return;
> +}
> +
> +void
> +f_vbfmlalbq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
> +{
> +  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 34647 } */
> +  vbfmlalbq_lane_f32 (r, a, b, -1);
> +  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 34647 } */
> +  vbfmlalbq_lane_f32 (r, a, b, 4);
> +  return;
> +}
> +
> +void
> +f_vbfmlalbq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
> +{
> +  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 34663 } */
> +  vbfmlalbq_laneq_f32 (r, a, b, -1);
> +  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 34663 } */
> +  vbfmlalbq_laneq_f32 (r, a, b, 8);
> +  return;
> +}

It'd be better not to hard-code the arm_neon.h line numbers here.
The other tests use "0" -- does that work here too?
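i.e. (assuming line 0 also matches errors reported against arm_neon.h,
as it does in the other lane-index tests):

  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */
  vbfmlaltq_lane_f32 (r, a, b, -1);

That way the test doesn't break every time arm_neon.h grows or shrinks.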

It'd also be good to have a test that checks for an appropriate error if
these intrinsics are used when bf16 is disabled.  We don't need that
for all intrinsics, just one would be enough.  (Sorry if you have that
in another patch, this was the first one I got to.)

Thanks,
Richard


* Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal<b/t> for AArch64 AdvSIMD
  2019-12-23 18:30 ` [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal<b/t> for AArch64 AdvSIMD Richard Sandiford
@ 2020-01-31 14:51   ` Delia Burduv
  2020-01-31 16:23     ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread
From: Delia Burduv @ 2020-01-31 14:51 UTC (permalink / raw)
  To: gcc-patches, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov,
	Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 17225 bytes --]


Thank you, Richard!

Here is the updated patch. The test that checks for errors when bf16 is 
disabled is in the bfcvt patch.

Cheers,
Delia

gcc/ChangeLog:

2019-11-06  Delia Burduv  <delia.burduv@arm.com>

         * config/aarch64/aarch64-simd-builtins.def
         (bfcvtn): New built-in function.
         (bfcvtn_q): New built-in function.
         (bfcvtn2): New built-in function.
         (bfcvt): New built-in function.
         * config/aarch64/aarch64-simd.md
         (aarch64_bfcvtn<q><mode>): New pattern.
         (aarch64_bfcvtn2v8bf): New pattern.
         (aarch64_bfcvtbf): New pattern.
         * config/aarch64/arm_bf16.h (float32_t): New typedef.
         (vcvth_bf16_f32): New intrinsic.
         * config/aarch64/arm_bf16.h (vcvt_bf16_f32): New intrinsic.
         (vcvtq_low_bf16_f32): New intrinsic.
         (vcvtq_high_bf16_f32): New intrinsic.
         * config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
         (UNSPEC_BFCVTN): New UNSPEC.
         (UNSPEC_BFCVTN2): New UNSPEC.
         (UNSPEC_BFCVT): New UNSPEC.
         * config/arm/types.md (bf_cvt): New type.


gcc/testsuite/ChangeLog:

2019-11-06  Delia Burduv  <delia.burduv@arm.com>

         * gcc.target/aarch64/advsimd-intrinsics/bfcvt-compile.c: New test.
         * gcc.target/aarch64/advsimd-intrinsics/bfcvt-nobf16.c: New test.
         * gcc.target/aarch64/advsimd-intrinsics/bfcvt-nosimd.c: New test.
         * gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: New test.


On 12/23/19 6:11 PM, Richard Sandiford wrote:
> Thanks for the patch, looks good.
> 
> Delia Burduv <Delia.Burduv@arm.com> writes:
>> This patch adds the ARMv8.6 ACLE intrinsics for bfmmla, bfmlalb and bfmlalt as part of the BFloat16 extension.
>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>> The intrinsics are declared in arm_neon.h and the RTL patterns are defined in aarch64-simd.md.
>> Two new tests are added to check assembler output.
>>
>> This patch depends on the two Aarch64 back-end patches. (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01323.html and https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01324.html)
>>
>> Tested for regression on aarch64-none-elf and aarch64_be-none-elf. I don't have commit rights, so if this is ok can someone please commit it for me?
>>
>> gcc/ChangeLog:
>>
>> 2019-10-29  Delia Burduv  <delia.burduv@arm.com>
>>
>>          * config/aarch64/aarch64-simd-builtins.def
>>            (bfmmla): New built-in function.
>>            (bfmlalb): New built-in function.
>>            (bfmlalt): New built-in function.
>>            (bfmlalb_lane): New built-in function.
>>            (bfmlalt_lane): New built-in function.
>>            (bfmlalb_laneq): New built-in function.
>>            (bfmlalt_laneq): New built-in function
>>          * config/aarch64/aarch64-simd.md (bfmmla): New pattern.
>>            (bfmlal<bt>): New patterns.
>>          * config/aarch64/arm_neon.h (vbfmmlaq_f32): New intrinsic.
>>            (vbfmlalbq_f32): New intrinsic.
>>            (vbfmlaltq_f32): New intrinsic.
>>            (vbfmlalbq_lane_f32): New intrinsic.
>>            (vbfmlaltq_lane_f32): New intrinsic.
>>            (vbfmlalbq_laneq_f32): New intrinsic.
>>            (vbfmlaltq_laneq_f32): New intrinsic.
>>          * config/aarch64/iterators.md (UNSPEC_BFMMLA): New UNSPEC.
>>            (UNSPEC_BFMLALB): New UNSPEC.
>>            (UNSPEC_BFMLALT): New UNSPEC.
>>            (BF_MLA): New int iterator.
>>            (bt): Added UNSPEC_BFMLALB, UNSPEC_BFMLALT.
>>          * config/arm/types.md (bf_mmla): New type.
>>            (bf_mla): New type.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-29  Delia Burduv  <delia.burduv@arm.com>
>>
>>          * gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c: New test.
>>          * gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c: New test.
>>          * gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c:
>>            New test.
> 
> Formatting nit: continuation lines should only be indented by a tab,
> rather than a tab and two spaces.  (I agree the above looks nicer,
> but the policy is not to be flexible over this kind of thing...)
> 
>> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
>> index f4ca35a59704c761fe2ac2b6d401fff7c8aba80d..5e9f50f090870d0c63916540a48f5ac132d2630d 100644
>> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
>> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
>> @@ -682,3 +682,14 @@
>>     BUILTIN_VSFDF (UNOP, frint32x, 0)
>>     BUILTIN_VSFDF (UNOP, frint64z, 0)
>>     BUILTIN_VSFDF (UNOP, frint64x, 0)
>> +
>> +  /* Implemented by aarch64_bfmmlaqv4sf  */
>> +  VAR1 (TERNOP, bfmmlaq, 0, v4sf)
>> +
>> +  /* Implemented by aarch64_bfmlal<bt>{_lane{q}}v4sf  */
>> +  VAR1 (TERNOP, bfmlalb, 0, v4sf)
>> +  VAR1 (TERNOP, bfmlalt, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalb_lane, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalt_lane, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalb_laneq, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalt_laneq, 0, v4sf)
>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> index 55660ae248f4fa75d35ba2949cd4b9d5139ff5f5..66a6c4116a1fdd26dd4eec8b0609e28eb2c38fa1 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -7027,3 +7027,57 @@
>>     "xtn\t%0.<Vntype>, %1.<Vtype>"
>>     [(set_attr "type" "neon_shift_imm_narrow_q")]
>>   )
>> +
>> +;; bfmmla
>> +(define_insn "aarch64_bfmmlaqv4sf"
>> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
>> +        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
>> +                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
>> +                                 (match_operand:V8BF 3 "register_operand" "w")]
>> +                    UNSPEC_BFMMLA)))]
>> +  "TARGET_BF16_SIMD"
>> +  "bfmmla\\t%0.4s, %2.8h, %3.8h"
>> +  [(set_attr "type" "neon_mla_s_q")]
> 
> Looks like this should be neon_fp_mla_s_q instead.
> 
>> +)
>> +
>> +;; bfmlal<bt>
>> +(define_insn "aarch64_bfmlal<bt>v4sf"
>> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
>> +        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
>> +                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
>> +                                  (match_operand:V8BF 3 "register_operand" "w")]
>> +                     BF_MLA)))]
>> +  "TARGET_BF16_SIMD"
>> +  "bfmlal<bt>\\t%0.4s, %2.8h, %3.8h"
>> +  [(set_attr "type" "neon_fp_mla_s")]
>> +)
> 
> Maybe we should have _q here too.  All the vectors are 128-bit vectors,
> we just happen to ignore even or odd elements of the BF inputs.
> 
>> +(define_insn "aarch64_bfmlal<bt>_lanev4sf"
>> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
>> +        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
>> +                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
>> +                                  (match_operand:V4BF 3 "register_operand" "w")
>> +                                  (match_operand:SI 4 "const_int_operand" "n")]
>> +                     BF_MLA)))]
>> +  "TARGET_BF16_SIMD"
>> +{
>> +  operands[4] = aarch64_endian_lane_rtx (V4BFmode, INTVAL (operands[4]));
>> +  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
>> +}
>> +  [(set_attr "type" "neon_fp_mla_s")]
>> +)
> 
> IIUC, these should be neon_fp_mla_s_scalar_q, but I might have misunderstood
> the naming scheme.
> 
>> +(define_insn "aarch64_bfmlal<bt>_laneqv4sf"
>> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
>> +        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
>> +                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
>> +                                  (match_operand:V8BF 3 "register_operand" "w")
>> +                                  (match_operand:SI 4 "const_int_operand" "n")]
>> +                     BF_MLA)))]
>> +  "TARGET_BF16_SIMD"
>> +{
>> +  operands[4] = aarch64_endian_lane_rtx (V8BFmode, INTVAL (operands[4]));
>> +  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
>> +}
>> +  [(set_attr "type" "neon_fp_mla_s")]
>> +)
> 
> These can be combined into a single pattern by using a mode iterator for
> V4BF/V8BF.
> 
>> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
>> index 6cdbf381f0156ed993f03b847228b36ebbdd14f8..9001c63b0d44e7ad699ace097b9259681b691033 100644
>> --- a/gcc/config/aarch64/arm_neon.h
>> +++ b/gcc/config/aarch64/arm_neon.h
>> @@ -34610,6 +34610,70 @@ vrnd64xq_f64 (float64x2_t __a)
>>   
>>   #include "arm_bf16.h"
>>   
>> +#pragma GCC push_options
>> +#pragma GCC target ("arch=armv8.2-a+bf16")
>> +#ifdef __ARM_FEATURE_BF16_VECTOR_ARITHMETIC
>> +
>> +__extension__ extern __inline float32x4_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vbfmmlaq_f32 \
>> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
>> +
>> +{
> 
> Formatting nits: should be:
> 
> vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
> {
> 
> which no backslash, line break or blank line.
> 
>> +  return __builtin_aarch64_bfmmlaqv4sf (__r, __a, __b);
>> +}
>> +
>> +__extension__ extern __inline float32x4_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vbfmlalbq_f32 \
>> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
>> +{
>> +  return __builtin_aarch64_bfmlalbv4sf (__r, __a, __b);
>> +}
>> +
>> +__extension__ extern __inline float32x4_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vbfmlaltq_f32 \
>> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
>> +{
>> +  return __builtin_aarch64_bfmlaltv4sf (__r, __a, __b);
>> +}
> 
> Same for these.
> 
>> +
>> +__extension__ extern __inline float32x4_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vbfmlalbq_lane_f32 \
>> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b, const int __index)
>> +{
> 
> Here it's probably better to format as:
> 
> vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
> 		    const int __index)
> {
> 
> Same for the rest of the file.
> 
>> diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
>> index df39522f2ad63a52c910b1a6bcc7aa13aaf5d021..2f5ada97991abc88cc74f4768eb395b2b757ee26 100644
>> --- a/gcc/config/arm/types.md
>> +++ b/gcc/config/arm/types.md
>> @@ -550,6 +550,10 @@
>>   ; The classification below is for TME instructions
>>   ;
>>   ; tme
>> +;
>> +; The classification below is for BFloat16 widening multiply-add
>> +;
>> +; bf_mla
> 
> This doesn't seem to be used by the new define_insns.
> 
>>   
>>   (define_attr "type"
>>    "adc_imm,\
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..11558be667c65228529ead90628604cba0bbd044
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>> @@ -0,0 +1,73 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>> +/* { dg-additional-options "-save-temps" } */
>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>> +
>> +#include <arm_neon.h>
>> +
>> +/*
>> +**test_bfmlalb:
>> +**      ...
>> +**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalb (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  return vbfmlalbq_f32 (r, a, b);
>> +}
>> +
>> +/*
>> +**test_bfmlalt:
>> +**      ...
>> +**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalt (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  return vbfmlaltq_f32 (r, a, b);
>> +}
>> +
>> +/*
>> +**test_bfmlalb_lane:
>> +**      ...
>> +**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[0\]
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalb_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
>> +{
>> +  return vbfmlalbq_lane_f32 (r, a, b, 0);
>> +}
>> +
>> +/*
>> +**test_bfmlalt_lane:
>> +**      ...
>> +**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[2\]
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalt_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
>> +{
>> +  return vbfmlaltq_lane_f32 (r, a, b, 2);
>> +}
>> +
>> +/*
>> +**test_bfmlalb_laneq:
>> +**      ...
>> +**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[4\]
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalb_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  return vbfmlalbq_laneq_f32 (r, a, b, 4);
>> +}
>> +
>> +/*
>> +**test_bfmlalt_laneq:
>> +**      ...
>> +**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[7\]
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalt_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  return vbfmlaltq_laneq_f32 (r, a, b, 7);
>> +}
> 
> It might be better to compile these at -O and test for the exact
> input and output registers.  E.g.:
> 
> **test_bfmlalt_laneq:
> **      bfmlalt	v0\.4s, v1\.8h, v2\.h\[7\]
> **      ret
> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..b12cf47d67a33f13967738b48a4984765c0ff2df
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>> +/* { dg-additional-options "-save-temps" } */
>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>> +
>> +#include <arm_neon.h>
>> +
>> +
>> +/*
>> +**test_bfmmla:
>> +**	...
>> +**	bfmmla	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
>> +**	...
>> +*/
>> +float32x4_t test_bfmmla (float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
>> +{
>> +  return vbfmmlaq_f32 (r, x, y);
>> +}
> 
> Same here.
> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..4a8a9b64c04b39f3cd95101326022f67326921f5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
>> @@ -0,0 +1,46 @@
>> +/* { dg-do compile { target { aarch64*-*-* } } } */
>> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>> +
>> +#include <arm_neon.h>
>> +
>> +void
>> +f_vbfmlaltq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
>> +{
>> +  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 34655 } */
>> +  vbfmlaltq_lane_f32 (r, a, b, -1);
>> +  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 34655 } */
>> +  vbfmlaltq_lane_f32 (r, a, b, 4);
>> +  return;
>> +}
>> +
>> +void
>> +f_vbfmlaltq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 34671 } */
>> +  vbfmlaltq_laneq_f32 (r, a, b, -1);
>> +  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 34671 } */
>> +  vbfmlaltq_laneq_f32 (r, a, b, 8);
>> +  return;
>> +}
>> +
>> +void
>> +f_vbfmlalbq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
>> +{
>> +  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 34647 } */
>> +  vbfmlalbq_lane_f32 (r, a, b, -1);
>> +  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 34647 } */
>> +  vbfmlalbq_lane_f32 (r, a, b, 4);
>> +  return;
>> +}
>> +
>> +void
>> +f_vbfmlalbq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 34663 } */
>> +  vbfmlalbq_laneq_f32 (r, a, b, -1);
>> +  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 34663 } */
>> +  vbfmlalbq_laneq_f32 (r, a, b, 8);
>> +  return;
>> +}
> 
> It'd be better not to hard-code the arm_neon.h line numbers here.
> The other tests use "0" -- does that work here too?
> 
> It'd also be good to have a test that checks for an appropriate error if
> these intrinsics are used when bf16 is disabled.  We don't need that
> for all intrinsics, just one would be enough.  (Sorry if you have that
> in another patch, this was the first one I got to.)
> 
> Thanks,
> Richard
> 
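
For readers unfamiliar with the convention being suggested: passing 0 as the line-number argument of a dg-error directive tells DejaGnu to match the diagnostic wherever it is reported, including inside an included header such as arm_neon.h, so the test does not break whenever the header's line numbers shift. A minimal sketch of that form (the function name here is illustrative, not from the patch):

```c
/* { dg-do compile { target { aarch64*-*-* } } } */
/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
/* { dg-add-options arm_v8_2a_bf16_neon } */

#include <arm_neon.h>

void
f (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
{
  /* "0" instead of a hard-coded arm_neon.h line number: the error may
     be reported at any line, typically inside the header itself.  */
  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */
  vbfmlalbq_lane_f32 (r, a, b, 4);
}
```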

[-- Attachment #2: rb12110.patch --]
[-- Type: text/x-patch; name="rb12110.patch", Size: 11426 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index a118f4f121de067c0a80f691b852247b0ab27f7a..02b2154cf64dad02cf57b110af51b19dd7f91c51 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -692,3 +692,14 @@
   VAR2 (TERNOP, bfdot, 0, v2sf, v4sf)
   VAR2 (QUADOP_LANE_PAIR, bfdot_lane, 0, v2sf, v4sf)
   VAR2 (QUADOP_LANE_PAIR, bfdot_laneq, 0, v2sf, v4sf)
+
+  /* Implemented by aarch64_bfmmlaqv4sf  */
+  VAR1 (TERNOP, bfmmlaq, 0, v4sf)
+
+  /* Implemented by aarch64_bfmlal<bt>{_lane{q}}v4sf  */
+  VAR1 (TERNOP, bfmlalb, 0, v4sf)
+  VAR1 (TERNOP, bfmlalt, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalb_lane, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalt_lane, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, v4sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 97f46f96968a6bc2f93bbc812931537b819b3b19..6ba72d7dc82ed02b5b5001a13ca896ab245a9d41 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7091,3 +7091,42 @@
 }
   [(set_attr "type" "neon_dot<VDQSF:q>")]
 )
+
+;; bfmmla
+(define_insn "aarch64_bfmmlaqv4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:V8BF 3 "register_operand" "w")]
+                    UNSPEC_BFMMLA)))]
+  "TARGET_BF16_SIMD"
+  "bfmmla\\t%0.4s, %2.8h, %3.8h"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+;; bfmlal<bt>
+(define_insn "aarch64_bfmlal<bt>v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                  (match_operand:V8BF 3 "register_operand" "w")]
+                     BF_MLA)))]
+  "TARGET_BF16_SIMD"
+  "bfmlal<bt>\\t%0.4s, %2.8h, %3.8h"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+(define_insn "aarch64_bfmlal<bt>_lane<q>v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                  (match_operand:VBF 3 "register_operand" "w")
+                                  (match_operand:SI 4 "const_int_operand" "n")]
+                     BF_MLA)))]
+  "TARGET_BF16_SIMD"
+{
+  operands[4] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[4]));
+  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
+}
+  [(set_attr "type" "neon_fp_mla_s_scalar_q")]
+)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 7f05c3f9eca844b0e7b824a191223a4906c825b1..db845a3d2d204d28f0e62fa61927e01dcb15f4a4 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34660,6 +34660,60 @@ vbfdotq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfdot_laneqv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+
+{
+  return __builtin_aarch64_bfmmlaqv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfmlalbv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfmlaltv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		    const int __index)
+{
+  return __builtin_aarch64_bfmlalb_lanev4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		    const int __index)
+{
+  return __builtin_aarch64_bfmlalt_lanev4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		     const int __index)
+{
+  return __builtin_aarch64_bfmlalb_lane_qv4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		     const int __index)
+{
+  return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
+}
+
 #pragma GCC pop_options
 
 /* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index fc973086cb91ae0dc54eeeb0b832d522539d7982..a32b21c639c2fe7ce6e432901fb293f196cbfff0 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -808,6 +808,9 @@
     UNSPEC_USDOT	; Used in aarch64-simd.md.
     UNSPEC_SUDOT	; Used in aarch64-simd.md.
     UNSPEC_BFDOT	; Used in aarch64-simd.md.
+    UNSPEC_BFMMLA      ; Used in aarch64-simd.md.
+    UNSPEC_BFMLALB     ; Used in aarch64-simd.md.
+    UNSPEC_BFMLALT     ; Used in aarch64-simd.md.
 ])
 
 ;; ------------------------------------------------------------------
@@ -2553,6 +2556,9 @@
 
 (define_int_iterator SVE_PITER [UNSPEC_PFIRST UNSPEC_PNEXT])
 
+(define_int_iterator BF_MLA [UNSPEC_BFMLALB
+                            UNSPEC_BFMLALT])
+
 ;; Iterators for atomic operations.
 
 (define_int_iterator ATOMIC_LDOP
@@ -2793,6 +2799,8 @@
 (define_int_attr ab [(UNSPEC_CLASTA "a") (UNSPEC_CLASTB "b")
 		     (UNSPEC_LASTA "a") (UNSPEC_LASTB "b")])
 
+(define_int_attr bt [(UNSPEC_BFMLALB "b") (UNSPEC_BFMLALT "t")])
+
 (define_int_attr addsub [(UNSPEC_SHADD "add")
 			 (UNSPEC_UHADD "add")
 			 (UNSPEC_SRHADD "add")
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
new file mode 100644
index 0000000000000000000000000000000000000000..9feb7ee7905cb14037427a36797fc67a6fa3fbc8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
@@ -0,0 +1,67 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include <arm_neon.h>
+
+/*
+**test_bfmlalb:
+**      bfmlalb\tv0.4s, v1.8h, v2.8h
+**      ret
+*/
+float32x4_t test_bfmlalb (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_f32 (r, a, b);
+}
+
+/*
+**test_bfmlalt:
+**      bfmlalt\tv0.4s, v1.8h, v2.8h
+**      ret
+*/
+float32x4_t test_bfmlalt (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_f32 (r, a, b);
+}
+
+/*
+**test_bfmlalb_lane:
+**      bfmlalb\tv0.4s, v1.8h, v2.h[0]
+**      ret
+*/
+float32x4_t test_bfmlalb_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlalbq_lane_f32 (r, a, b, 0);
+}
+
+/*
+**test_bfmlalt_lane:
+**      bfmlalt\tv0.4s, v1.8h, v2.h[2]
+**      ret
+*/
+float32x4_t test_bfmlalt_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlaltq_lane_f32 (r, a, b, 2);
+}
+
+/*
+**test_bfmlalb_laneq:
+**      bfmlalb\tv0.4s, v1.8h, v2.h[4]
+**      ret
+*/
+float32x4_t test_bfmlalb_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_laneq_f32 (r, a, b, 4);
+}
+
+/*
+**test_bfmlalt_laneq:
+**      bfmlalt\tv0.4s, v1.8h, v2.h[7]
+**      ret
+*/
+float32x4_t test_bfmlalt_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_laneq_f32 (r, a, b, 7);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
new file mode 100644
index 0000000000000000000000000000000000000000..b0a856676e377ac182fafb2b39189451e460789e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include <arm_neon.h>
+
+
+/*
+**test_bfmmla:
+**     bfmmla\tv0.4s, v1.8h, v2.8h
+**     ret
+*/
+float32x4_t test_bfmmla (float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
+{
+  return vbfmmlaq_f32 (r, x, y);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..4d50ba3a3814cb6fe8a768bdf6e13a4207cf585a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
@@ -0,0 +1,46 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+
+#include <arm_neon.h>
+
+void
+f_vbfmlaltq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlaltq_lane_f32 (r, a, b, -1);
+  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlaltq_lane_f32 (r, a, b, 4);
+  return;
+}
+
+void
+f_vbfmlaltq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlaltq_laneq_f32 (r, a, b, -1);
+  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlaltq_laneq_f32 (r, a, b, 8);
+  return;
+}
+
+void
+f_vbfmlalbq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane -2 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlalbq_lane_f32 (r, a, b, -2);
+  /* { dg-error "lane 5 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlalbq_lane_f32 (r, a, b, 5);
+  return;
+}
+
+void
+f_vbfmlalbq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  /* { dg-error "lane -2 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlalbq_laneq_f32 (r, a, b, -2);
+  /* { dg-error "lane 9 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlalbq_laneq_f32 (r, a, b, 9);
+  return;
+}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal<b/t> for AArch64 AdvSIMD
  2020-01-31 14:51   ` Delia Burduv
@ 2020-01-31 16:23     ` Richard Sandiford
  2020-01-31 17:00       ` Delia Burduv
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2020-01-31 16:23 UTC (permalink / raw)
  To: Delia Burduv
  Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Delia Burduv <Delia.Burduv@arm.com> writes:
> Thank you, Richard!
>
> Here is the updated patch. The test that checks for errors when bf16 is 
> disabled is in the bfcvt patch.

Looks good.  Just a couple of very minor things...

>
> Cheers,
> Delia
>
> gcc/ChangeLog:
>
> 2019-11-06  Delia Burduv  <delia.burduv@arm.com>
>
>          * config/aarch64/aarch64-simd-builtins.def
>          (bfcvtn): New built-in function.
>          (bfcvtn_q): New built-in function.
>          (bfcvtn2): New built-in function.
>          (bfcvt): New built-in function.
>          * config/aarch64/aarch64-simd.md
>          (aarch64_bfcvtn<q><mode>): New pattern.
>          (aarch64_bfcvtn2v8bf): New pattern.
>          (aarch64_bfcvtbf): New pattern.
>          * config/aarch64/arm_bf16.h (float32_t): New typedef.
>          (vcvth_bf16_f32): New intrinsic.
>          * config/aarch64/arm_bf16.h (vcvt_bf16_f32): New intrinsic.
>          (vcvtq_low_bf16_f32): New intrinsic.
>          (vcvtq_high_bf16_f32): New intrinsic.
>          * config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
>          (UNSPEC_BFCVTN): New UNSPEC.
>          (UNSPEC_BFCVTN2): New UNSPEC.
>          (UNSPEC_BFCVT): New UNSPEC.
>          * config/arm/types.md (bf_cvt): New type.

The patch no longer changes types.md. :-)

> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..9feb7ee7905cb14037427a36797fc67a6fa3fbc8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
> @@ -0,0 +1,67 @@
> +/* { dg-do assemble { target { aarch64*-*-* } } } */
> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
> +/* { dg-add-options arm_v8_2a_bf16_neon } */
> +/* { dg-additional-options "-save-temps" } */
> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
> +
> +#include <arm_neon.h>
> +
> +/*
> +**test_bfmlalb:
> +**      bfmlalb\tv0.4s, v1.8h, v2.8h

This version uses \t while the previous one used literal tabs.
TBH I think the literal tab is nicer (and what we use for SVE FWIW).

OK with those changes, thanks.  Seems silly to ask when the changes
are so trivial, but: please could you post an updated patch so that
I can apply verbatim?

Richard


* Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal<b/t> for AArch64 AdvSIMD
  2020-01-31 16:23     ` Richard Sandiford
@ 2020-01-31 17:00       ` Delia Burduv
  2020-02-06 16:42         ` Richard Sandiford
  0 siblings, 1 reply; 6+ messages in thread
From: Delia Burduv @ 2020-01-31 17:00 UTC (permalink / raw)
  To: gcc-patches, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov,
	Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 2671 bytes --]

Sure, here it is. I'll do that for the other patch too.

Thanks,
Delia

On 1/31/20 3:37 PM, Richard Sandiford wrote:
> Delia Burduv <Delia.Burduv@arm.com> writes:
>> Thank you, Richard!
>>
>> Here is the updated patch. The test that checks for errors when bf16 is
>> disabled is in the bfcvt patch.
> 
> Looks good.  Just a couple of very minor things...
> 
>>
>> Cheers,
>> Delia
>>
>> gcc/ChangeLog:
>>
>> 2019-11-06  Delia Burduv  <delia.burduv@arm.com>
>>
>>           * config/aarch64/aarch64-simd-builtins.def
>>           (bfcvtn): New built-in function.
>>           (bfcvtn_q): New built-in function.
>>           (bfcvtn2): New built-in function.
>>           (bfcvt): New built-in function.
>>           * config/aarch64/aarch64-simd.md
>>           (aarch64_bfcvtn<q><mode>): New pattern.
>>           (aarch64_bfcvtn2v8bf): New pattern.
>>           (aarch64_bfcvtbf): New pattern.
>>           * config/aarch64/arm_bf16.h (float32_t): New typedef.
>>           (vcvth_bf16_f32): New intrinsic.
>>           * config/aarch64/arm_bf16.h (vcvt_bf16_f32): New intrinsic.
>>           (vcvtq_low_bf16_f32): New intrinsic.
>>           (vcvtq_high_bf16_f32): New intrinsic.
>>           * config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
>>           (UNSPEC_BFCVTN): New UNSPEC.
>>           (UNSPEC_BFCVTN2): New UNSPEC.
>>           (UNSPEC_BFCVT): New UNSPEC.
>>           * config/arm/types.md (bf_cvt): New type.
> 
> The patch no longer changes types.md. :-)
> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..9feb7ee7905cb14037427a36797fc67a6fa3fbc8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>> @@ -0,0 +1,67 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>> +/* { dg-additional-options "-save-temps" } */
>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>> +
>> +#include <arm_neon.h>
>> +
>> +/*
>> +**test_bfmlalb:
>> +**      bfmlalb\tv0.4s, v1.8h, v2.8h
> 
> This version uses \t while the previous one used literal tabs.
> TBH I think the literal tab is nicer (and what we use for SVE FWIW).
> 
> OK with those changes, thanks.  Seems silly to ask when the changes
> are so trivial, but: please could you post an updated patch so that
> I can apply verbatim?
> 
> Richard
> 

[-- Attachment #2: rb12110(1).patch --]
[-- Type: text/x-patch; name="rb12110(1).patch", Size: 11419 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index a118f4f121de067c0a80f691b852247b0ab27f7a..02b2154cf64dad02cf57b110af51b19dd7f91c51 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -692,3 +692,14 @@
   VAR2 (TERNOP, bfdot, 0, v2sf, v4sf)
   VAR2 (QUADOP_LANE_PAIR, bfdot_lane, 0, v2sf, v4sf)
   VAR2 (QUADOP_LANE_PAIR, bfdot_laneq, 0, v2sf, v4sf)
+
+  /* Implemented by aarch64_bfmmlaqv4sf  */
+  VAR1 (TERNOP, bfmmlaq, 0, v4sf)
+
+  /* Implemented by aarch64_bfmlal<bt>{_lane{q}}v4sf  */
+  VAR1 (TERNOP, bfmlalb, 0, v4sf)
+  VAR1 (TERNOP, bfmlalt, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalb_lane, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalt_lane, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, v4sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 97f46f96968a6bc2f93bbc812931537b819b3b19..6ba72d7dc82ed02b5b5001a13ca896ab245a9d41 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7091,3 +7091,42 @@
 }
   [(set_attr "type" "neon_dot<VDQSF:q>")]
 )
+
+;; bfmmla
+(define_insn "aarch64_bfmmlaqv4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:V8BF 3 "register_operand" "w")]
+                    UNSPEC_BFMMLA)))]
+  "TARGET_BF16_SIMD"
+  "bfmmla\\t%0.4s, %2.8h, %3.8h"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+;; bfmlal<bt>
+(define_insn "aarch64_bfmlal<bt>v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                  (match_operand:V8BF 3 "register_operand" "w")]
+                     BF_MLA)))]
+  "TARGET_BF16_SIMD"
+  "bfmlal<bt>\\t%0.4s, %2.8h, %3.8h"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+(define_insn "aarch64_bfmlal<bt>_lane<q>v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
+                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                  (match_operand:VBF 3 "register_operand" "w")
+                                  (match_operand:SI 4 "const_int_operand" "n")]
+                     BF_MLA)))]
+  "TARGET_BF16_SIMD"
+{
+  operands[4] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[4]));
+  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
+}
+  [(set_attr "type" "neon_fp_mla_s_scalar_q")]
+)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 7f05c3f9eca844b0e7b824a191223a4906c825b1..db845a3d2d204d28f0e62fa61927e01dcb15f4a4 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34660,6 +34660,60 @@ vbfdotq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfdot_laneqv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+
+{
+  return __builtin_aarch64_bfmmlaqv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfmlalbv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfmlaltv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		    const int __index)
+{
+  return __builtin_aarch64_bfmlalb_lanev4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		    const int __index)
+{
+  return __builtin_aarch64_bfmlalt_lanev4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		     const int __index)
+{
+  return __builtin_aarch64_bfmlalb_lane_qv4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		     const int __index)
+{
+  return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
+}
+
 #pragma GCC pop_options
 
 /* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index fc973086cb91ae0dc54eeeb0b832d522539d7982..a32b21c639c2fe7ce6e432901fb293f196cbfff0 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -808,6 +808,9 @@
     UNSPEC_USDOT	; Used in aarch64-simd.md.
     UNSPEC_SUDOT	; Used in aarch64-simd.md.
     UNSPEC_BFDOT	; Used in aarch64-simd.md.
+    UNSPEC_BFMMLA      ; Used in aarch64-simd.md.
+    UNSPEC_BFMLALB     ; Used in aarch64-simd.md.
+    UNSPEC_BFMLALT     ; Used in aarch64-simd.md.
 ])
 
 ;; ------------------------------------------------------------------
@@ -2553,6 +2556,9 @@
 
 (define_int_iterator SVE_PITER [UNSPEC_PFIRST UNSPEC_PNEXT])
 
+(define_int_iterator BF_MLA [UNSPEC_BFMLALB
+                            UNSPEC_BFMLALT])
+
 ;; Iterators for atomic operations.
 
 (define_int_iterator ATOMIC_LDOP
@@ -2793,6 +2799,8 @@
 (define_int_attr ab [(UNSPEC_CLASTA "a") (UNSPEC_CLASTB "b")
 		     (UNSPEC_LASTA "a") (UNSPEC_LASTB "b")])
 
+(define_int_attr bt [(UNSPEC_BFMLALB "b") (UNSPEC_BFMLALT "t")])
+
 (define_int_attr addsub [(UNSPEC_SHADD "add")
 			 (UNSPEC_UHADD "add")
 			 (UNSPEC_SRHADD "add")
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
new file mode 100644
index 0000000000000000000000000000000000000000..9810e4ba37444fe08425c1cceae086860d962453
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
@@ -0,0 +1,67 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include <arm_neon.h>
+
+/*
+**test_bfmlalb:
+**      bfmlalb	v0.4s, v1.8h, v2.8h
+**      ret
+*/
+float32x4_t test_bfmlalb (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_f32 (r, a, b);
+}
+
+/*
+**test_bfmlalt:
+**      bfmlalt	v0.4s, v1.8h, v2.8h
+**      ret
+*/
+float32x4_t test_bfmlalt (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_f32 (r, a, b);
+}
+
+/*
+**test_bfmlalb_lane:
+**      bfmlalb	v0.4s, v1.8h, v2.h[0]
+**      ret
+*/
+float32x4_t test_bfmlalb_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlalbq_lane_f32 (r, a, b, 0);
+}
+
+/*
+**test_bfmlalt_lane:
+**      bfmlalt	v0.4s, v1.8h, v2.h[2]
+**      ret
+*/
+float32x4_t test_bfmlalt_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlaltq_lane_f32 (r, a, b, 2);
+}
+
+/*
+**test_bfmlalb_laneq:
+**      bfmlalb	v0.4s, v1.8h, v2.h[4]
+**      ret
+*/
+float32x4_t test_bfmlalb_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_laneq_f32 (r, a, b, 4);
+}
+
+/*
+**test_bfmlalt_laneq:
+**      bfmlalt	v0.4s, v1.8h, v2.h[7]
+**      ret
+*/
+float32x4_t test_bfmlalt_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_laneq_f32 (r, a, b, 7);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
new file mode 100644
index 0000000000000000000000000000000000000000..0aaa69f0037fb5ed5c085e76ee0c7eb61e5e8090
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include <arm_neon.h>
+
+
+/*
+**test_bfmmla:
+**     bfmmla	v0.4s, v1.8h, v2.8h
+**     ret
+*/
+float32x4_t test_bfmmla (float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
+{
+  return vbfmmlaq_f32 (r, x, y);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..4d50ba3a3814cb6fe8a768bdf6e13a4207cf585a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
@@ -0,0 +1,46 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+
+#include <arm_neon.h>
+
+void
+f_vbfmlaltq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlaltq_lane_f32 (r, a, b, -1);
+  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlaltq_lane_f32 (r, a, b, 4);
+  return;
+}
+
+void
+f_vbfmlaltq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlaltq_laneq_f32 (r, a, b, -1);
+  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlaltq_laneq_f32 (r, a, b, 8);
+  return;
+}
+
+void
+f_vbfmlalbq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane -2 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlalbq_lane_f32 (r, a, b, -2);
+  /* { dg-error "lane 5 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlalbq_lane_f32 (r, a, b, 5);
+  return;
+}
+
+void
+f_vbfmlalbq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  /* { dg-error "lane -2 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlalbq_laneq_f32 (r, a, b, -2);
+  /* { dg-error "lane 9 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlalbq_laneq_f32 (r, a, b, 9);
+  return;
+}


* Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal<b/t> for AArch64 AdvSIMD
  2020-01-31 17:00       ` Delia Burduv
@ 2020-02-06 16:42         ` Richard Sandiford
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Sandiford @ 2020-02-06 16:42 UTC (permalink / raw)
  To: Delia Burduv
  Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Delia Burduv <Delia.Burduv@arm.com> writes:
> Sure, here it is. I'll do that for the other patch too.

Thanks, belatedly pushed as f78335df69993a900512f92324cab6a20b1bde0c.
Sorry for the delay.

Richard

>
> Thanks,
> Delia
>
> On 1/31/20 3:37 PM, Richard Sandiford wrote:
>> Delia Burduv <Delia.Burduv@arm.com> writes:
>>> Thank you, Richard!
>>>
>>> Here is the updated patch. The test that checks for errors when bf16 is
>>> disabled is in the bfcvt patch.
>> 
>> Looks good.  Just a couple of very minor things...
>> 
>>>
>>> Cheers,
>>> Delia
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-06  Delia Burduv  <delia.burduv@arm.com>
>>>
>>>           * config/aarch64/aarch64-simd-builtins.def
>>>           (bfcvtn): New built-in function.
>>>           (bfcvtn_q): New built-in function.
>>>           (bfcvtn2): New built-in function.
>>>           (bfcvt): New built-in function.
>>>           * config/aarch64/aarch64-simd.md
>>>           (aarch64_bfcvtn<q><mode>): New pattern.
>>>           (aarch64_bfcvtn2v8bf): New pattern.
>>>           (aarch64_bfcvtbf): New pattern.
>>>           * config/aarch64/arm_bf16.h (float32_t): New typedef.
>>>           (vcvth_bf16_f32): New intrinsic.
>>>           * config/aarch64/arm_bf16.h (vcvt_bf16_f32): New intrinsic.
>>>           (vcvtq_low_bf16_f32): New intrinsic.
>>>           (vcvtq_high_bf16_f32): New intrinsic.
>>>           * config/aarch64/iterators.md (V4SF_TO_BF): New mode iterator.
>>>           (UNSPEC_BFCVTN): New UNSPEC.
>>>           (UNSPEC_BFCVTN2): New UNSPEC.
>>>           (UNSPEC_BFCVT): New UNSPEC.
>>>           * config/arm/types.md (bf_cvt): New type.
>> 
>> The patch no longer changes types.md. :-)
>> 
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..9feb7ee7905cb14037427a36797fc67a6fa3fbc8
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>>> @@ -0,0 +1,67 @@
>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>>> +/* { dg-additional-options "-save-temps" } */
>>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>>> +
>>> +#include <arm_neon.h>
>>> +
>>> +/*
>>> +**test_bfmlalb:
>>> +**      bfmlalb\tv0.4s, v1.8h, v2.8h
>> 
>> This version uses \t while the previous one used literal tabs.
>> TBH I think the literal tab is nicer (and what we use for SVE FWIW).
>> 
>> OK with those changes, thanks.  Seems silly to ask when the changes
>> are so trivial, but: please could you post an updated patch so that
>> I can apply it verbatim?
>> 
>> Richard
>> 

* [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal<b/t> for AArch64 AdvSIMD
@ 2019-12-20 18:42 Delia Burduv
  0 siblings, 0 replies; 6+ messages in thread
From: Delia Burduv @ 2019-12-20 18:42 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Earnshaw, Richard Sandiford, Marcus Shawcroft, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 2299 bytes --]

This patch adds the ARMv8.6 ACLE intrinsics for bfmmla, bfmlalb and 
bfmlalt as part of the BFloat16 extension.
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
The intrinsics are declared in arm_neon.h and the RTL patterns are 
defined in aarch64-simd.md.
Two new tests are added to check assembler output.
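
For context, the new intrinsics follow the usual AdvSIMD multiply-accumulate
shape; a minimal usage sketch (the function name `dot_bf16` and the lane
index used are illustrative only, not part of the patch, and building it
requires an AArch64 compiler with BFloat16 support, e.g.
-march=armv8.2-a+bf16):

```c
#include <arm_neon.h>

/* Illustrative sketch only: accumulate BFloat16 products into a
   float32x4_t accumulator using the intrinsics added by this patch.  */
float32x4_t
dot_bf16 (float32x4_t acc, bfloat16x8_t a, bfloat16x8_t b)
{
  /* 2x4 * 4x2 BFloat16 matrix multiply-accumulate into float32.  */
  acc = vbfmmlaq_f32 (acc, a, b);
  /* Widening multiply-add of the even (bottom) and odd (top)
     BFloat16 lanes.  */
  acc = vbfmlalbq_f32 (acc, a, b);
  acc = vbfmlaltq_f32 (acc, a, b);
  /* The lane forms broadcast a single element of the second source;
     the index 3 here is arbitrary.  */
  acc = vbfmlalbq_laneq_f32 (acc, a, b, 3);
  return acc;
}
```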

This patch depends on two AArch64 back-end patches
(https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01323.html and
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01324.html).

Tested for regression on aarch64-none-elf and aarch64_be-none-elf.  I
don't have commit rights, so if this is OK could someone please commit
it for me?

gcc/ChangeLog:

2019-10-29  Delia Burduv  <delia.burduv@arm.com>

         * config/aarch64/aarch64-simd-builtins.def
           (bfmmla): New built-in function.
           (bfmlalb): New built-in function.
           (bfmlalt): New built-in function.
           (bfmlalb_lane): New built-in function.
           (bfmlalt_lane): New built-in function.
           (bfmlalb_laneq): New built-in function.
           (bfmlalt_laneq): New built-in function.
         * config/aarch64/aarch64-simd.md (bfmmla): New pattern.
           (bfmlal<bt>): New patterns.
         * config/aarch64/arm_neon.h (vbfmmlaq_f32): New intrinsic.
           (vbfmlalbq_f32): New intrinsic.
           (vbfmlaltq_f32): New intrinsic.
           (vbfmlalbq_lane_f32): New intrinsic.
           (vbfmlaltq_lane_f32): New intrinsic.
           (vbfmlalbq_laneq_f32): New intrinsic.
           (vbfmlaltq_laneq_f32): New intrinsic.
         * config/aarch64/iterators.md (UNSPEC_BFMMLA): New UNSPEC.
           (UNSPEC_BFMLALB): New UNSPEC.
           (UNSPEC_BFMLALT): New UNSPEC.
           (BF_MLA): New int iterator.
           (bt): Added UNSPEC_BFMLALB, UNSPEC_BFMLALT.
         * config/arm/types.md (bf_mmla): New type.
           (bf_mla): New type.

gcc/testsuite/ChangeLog:

2019-10-29  Delia Burduv  <delia.burduv@arm.com>

         * gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c:
           New test.
         * gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c:
           New test.
         * gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c:
           New test.

[-- Attachment #2: rb12110.patch --]
[-- Type: text/x-patch; name="rb12110.patch", Size: 12808 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index f4ca35a59704c761fe2ac2b6d401fff7c8aba80d..5e9f50f090870d0c63916540a48f5ac132d2630d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -682,3 +682,14 @@
   BUILTIN_VSFDF (UNOP, frint32x, 0)
   BUILTIN_VSFDF (UNOP, frint64z, 0)
   BUILTIN_VSFDF (UNOP, frint64x, 0)
+
+  /* Implemented by aarch64_bfmmlaqv4sf.  */
+  VAR1 (TERNOP, bfmmlaq, 0, v4sf)
+
+  /* Implemented by aarch64_bfmlal<bt>{_lane{q}}v4sf.  */
+  VAR1 (TERNOP, bfmlalb, 0, v4sf)
+  VAR1 (TERNOP, bfmlalt, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalb_lane, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalt_lane, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalb_laneq, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalt_laneq, 0, v4sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 55660ae248f4fa75d35ba2949cd4b9d5139ff5f5..66a6c4116a1fdd26dd4eec8b0609e28eb2c38fa1 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7027,3 +7027,57 @@
   "xtn\t%0.<Vntype>, %1.<Vtype>"
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
+
+;; bfmmla
+(define_insn "aarch64_bfmmlaqv4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:V8BF 3 "register_operand" "w")]
+                    UNSPEC_BFMMLA)))]
+  "TARGET_BF16_SIMD"
+  "bfmmla\\t%0.4s, %2.8h, %3.8h"
+  [(set_attr "type" "neon_mla_s_q")]
+)
+
+;; bfmlal<bt>
+(define_insn "aarch64_bfmlal<bt>v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:V8BF 3 "register_operand" "w")]
+                    BF_MLA)))]
+  "TARGET_BF16_SIMD"
+  "bfmlal<bt>\\t%0.4s, %2.8h, %3.8h"
+  [(set_attr "type" "neon_fp_mla_s")]
+)
+
+(define_insn "aarch64_bfmlal<bt>_lanev4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:V4BF 3 "register_operand" "w")
+                                 (match_operand:SI 4 "const_int_operand" "n")]
+                    BF_MLA)))]
+  "TARGET_BF16_SIMD"
+{
+  operands[4] = aarch64_endian_lane_rtx (V4BFmode, INTVAL (operands[4]));
+  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
+}
+  [(set_attr "type" "neon_fp_mla_s")]
+)
+
+(define_insn "aarch64_bfmlal<bt>_laneqv4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:V8BF 3 "register_operand" "w")
+                                 (match_operand:SI 4 "const_int_operand" "n")]
+                    BF_MLA)))]
+  "TARGET_BF16_SIMD"
+{
+  operands[4] = aarch64_endian_lane_rtx (V8BFmode, INTVAL (operands[4]));
+  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
+}
+  [(set_attr "type" "neon_fp_mla_s")]
+)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 6cdbf381f0156ed993f03b847228b36ebbdd14f8..9001c63b0d44e7ad699ace097b9259681b691033 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34610,6 +34610,66 @@ vrnd64xq_f64 (float64x2_t __a)
 
 #include "arm_bf16.h"
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+#ifdef __ARM_FEATURE_BF16_VECTOR_ARITHMETIC
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfmmlaqv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfmlalbv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfmlaltv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		    const int __index)
+{
+  return __builtin_aarch64_bfmlalb_lanev4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		    const int __index)
+{
+  return __builtin_aarch64_bfmlalt_lanev4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		     const int __index)
+{
+  return __builtin_aarch64_bfmlalb_laneqv4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		     const int __index)
+{
+  return __builtin_aarch64_bfmlalt_laneqv4sf (__r, __a, __b, __index);
+}
+
+#endif
+#pragma GCC pop_options
+
 #pragma GCC pop_options
 
 #undef __aarch64_vget_lane_any
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 931166da5e47302afe810498eea9c8c2ab89b9de..e7ca2faba0e1ef5d59ac658eaa4d788b00b8b587 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -673,6 +673,9 @@
     UNSPEC_UMULHS	; Used in aarch64-sve2.md.
     UNSPEC_UMULHRS	; Used in aarch64-sve2.md.
     UNSPEC_ASRD		; Used in aarch64-sve.md.
+    UNSPEC_BFMMLA	; Used in aarch64-simd.md.
+    UNSPEC_BFMLALB	; Used in aarch64-simd.md.
+    UNSPEC_BFMLALT	; Used in aarch64-simd.md.
 ])
 
 ;; ------------------------------------------------------------------
@@ -2127,6 +2130,9 @@
 
 (define_int_iterator SVE_PITER [UNSPEC_PFIRST UNSPEC_PNEXT])
 
+(define_int_iterator BF_MLA [UNSPEC_BFMLALB
+			     UNSPEC_BFMLALT])
+
 ;; Iterators for atomic operations.
 
 (define_int_iterator ATOMIC_LDOP
@@ -2342,7 +2348,8 @@
 		    (UNSPEC_SRHADD "") (UNSPEC_URHADD "u")])
 
 (define_int_attr bt [(UNSPEC_SMULLB "b") (UNSPEC_UMULLB "b")
-		     (UNSPEC_SMULLT "t") (UNSPEC_UMULLT "t")])
+		     (UNSPEC_SMULLT "t") (UNSPEC_UMULLT "t")
+		     (UNSPEC_BFMLALB "b") (UNSPEC_BFMLALT "t")])
 
 (define_int_attr fn [(UNSPEC_LDFF1 "f") (UNSPEC_LDNF1 "n")])
 
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index df39522f2ad63a52c910b1a6bcc7aa13aaf5d021..2f5ada97991abc88cc74f4768eb395b2b757ee26 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -550,6 +550,10 @@
 ; The classification below is for TME instructions
 ;
 ; tme
+;
+; The classification below is for BFloat16 widening multiply-add
+;
+; bf_mla
 
 (define_attr "type"
  "adc_imm,\
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
new file mode 100644
index 0000000000000000000000000000000000000000..11558be667c65228529ead90628604cba0bbd044
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
@@ -0,0 +1,73 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include <arm_neon.h>
+
+/*
+**test_bfmlalb:
+**      ...
+**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
+**      ...
+*/
+float32x4_t test_bfmlalb (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_f32 (r, a, b);
+}
+
+/*
+**test_bfmlalt:
+**      ...
+**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
+**      ...
+*/
+float32x4_t test_bfmlalt (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_f32 (r, a, b);
+}
+
+/*
+**test_bfmlalb_lane:
+**      ...
+**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[0\]
+**      ...
+*/
+float32x4_t test_bfmlalb_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlalbq_lane_f32 (r, a, b, 0);
+}
+
+/*
+**test_bfmlalt_lane:
+**      ...
+**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[2\]
+**      ...
+*/
+float32x4_t test_bfmlalt_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlaltq_lane_f32 (r, a, b, 2);
+}
+
+/*
+**test_bfmlalb_laneq:
+**      ...
+**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[4\]
+**      ...
+*/
+float32x4_t test_bfmlalb_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_laneq_f32 (r, a, b, 4);
+}
+
+/*
+**test_bfmlalt_laneq:
+**      ...
+**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[7\]
+**      ...
+*/
+float32x4_t test_bfmlalt_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_laneq_f32 (r, a, b, 7);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
new file mode 100644
index 0000000000000000000000000000000000000000..b12cf47d67a33f13967738b48a4984765c0ff2df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include <arm_neon.h>
+
+/*
+**test_bfmmla:
+**	...
+**	bfmmla	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
+**	...
+*/
+float32x4_t test_bfmmla (float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
+{
+  return vbfmmlaq_f32 (r, x, y);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..4a8a9b64c04b39f3cd95101326022f67326921f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
@@ -0,0 +1,46 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+
+#include <arm_neon.h>
+
+void
+f_vbfmlaltq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 34655 } */
+  vbfmlaltq_lane_f32 (r, a, b, -1);
+  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 34655 } */
+  vbfmlaltq_lane_f32 (r, a, b, 4);
+  return;
+}
+
+void
+f_vbfmlaltq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 34671 } */
+  vbfmlaltq_laneq_f32 (r, a, b, -1);
+  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 34671 } */
+  vbfmlaltq_laneq_f32 (r, a, b, 8);
+  return;
+}
+
+void
+f_vbfmlalbq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 34647 } */
+  vbfmlalbq_lane_f32 (r, a, b, -1);
+  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 34647 } */
+  vbfmlalbq_lane_f32 (r, a, b, 4);
+  return;
+}
+
+void
+f_vbfmlalbq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 34663 } */
+  vbfmlalbq_laneq_f32 (r, a, b, -1);
+  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 34663 } */
+  vbfmlalbq_laneq_f32 (r, a, b, 8);
+  return;
+}

end of thread, other threads:[~2020-02-06 16:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1f271712-6c61-1be2-68bc-d61b51790dd3@arm.com>
2019-12-23 18:30 ` [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal<b/t> for AArch64 AdvSIMD Richard Sandiford
2020-01-31 14:51   ` Delia Burduv
2020-01-31 16:23     ` Richard Sandiford
2020-01-31 17:00       ` Delia Burduv
2020-02-06 16:42         ` Richard Sandiford
2019-12-20 18:42 Delia Burduv
