From: Delia Burduv <Delia.Burduv@arm.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	Richard Earnshaw	<Richard.Earnshaw@arm.com>,
	Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
	Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>,
	Richard Sandiford	<Richard.Sandiford@arm.com>
Subject: Re: [GCC][PATCH][AArch64] ACLE intrinsics bfmmla and bfmlal<b/t> for AArch64 AdvSIMD
Date: Fri, 31 Jan 2020 14:51:00 -0000
Message-ID: <4805ecd2-4e55-1fbf-2705-8c915cf5b404@arm.com>
In-Reply-To: <mpttv5qkj3g.fsf@arm.com>



Thank you, Richard!

Here is the updated patch. The test that checks for errors when bf16 is 
disabled is in the bfcvt patch.

Cheers,
Delia

gcc/ChangeLog:

2019-11-06  Delia Burduv  <delia.burduv@arm.com>

	* config/aarch64/aarch64-simd-builtins.def (bfmmlaq): New built-in
	function.
	(bfmlalb): New built-in function.
	(bfmlalt): New built-in function.
	(bfmlalb_lane): New built-in function.
	(bfmlalt_lane): New built-in function.
	(bfmlalb_lane_q): New built-in function.
	(bfmlalt_lane_q): New built-in function.
	* config/aarch64/aarch64-simd.md (aarch64_bfmmlaqv4sf): New pattern.
	(aarch64_bfmlal<bt>v4sf): New pattern.
	(aarch64_bfmlal<bt>_lane<q>v4sf): New pattern.
	* config/aarch64/arm_neon.h (vbfmmlaq_f32): New intrinsic.
	(vbfmlalbq_f32): New intrinsic.
	(vbfmlaltq_f32): New intrinsic.
	(vbfmlalbq_lane_f32): New intrinsic.
	(vbfmlaltq_lane_f32): New intrinsic.
	(vbfmlalbq_laneq_f32): New intrinsic.
	(vbfmlaltq_laneq_f32): New intrinsic.
	* config/aarch64/iterators.md (UNSPEC_BFMMLA): New unspec.
	(UNSPEC_BFMLALB): New unspec.
	(UNSPEC_BFMLALT): New unspec.
	(BF_MLA): New int iterator.
	(bt): New int attribute.


gcc/testsuite/ChangeLog:

2019-11-06  Delia Burduv  <delia.burduv@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c: New test.
	* gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c: New test.
	* gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c:
	New test.
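
For reference, here is roughly what the new intrinsics compute. This is
a minimal scalar sketch of the ACLE semantics, not part of the patch: it
ignores FP details such as rounding and NaN handling, and the helper and
function names are made up for illustration.

#include <stdint.h>

/* Illustrative only: a bf16 value is the upper half of an IEEE
   binary32 value.  */
static inline float
bf16_to_f32 (uint16_t h)
{
  union { uint32_t u; float f; } v = { (uint32_t) h << 16 };
  return v.f;
}

/* Scalar model of vbfmlalbq_f32: multiply the even-indexed ("bottom")
   bf16 elements, widened to f32, and accumulate.  vbfmlaltq_f32 is the
   same with the odd-indexed ("top") elements, i.e. lanes 2*i + 1.  */
void
bfmlalb_ref (float r[4], const uint16_t a[8], const uint16_t b[8])
{
  for (int i = 0; i < 4; i++)
    r[i] += bf16_to_f32 (a[2*i]) * bf16_to_f32 (b[2*i]);
}

/* Scalar model of vbfmmlaq_f32: treat a and b as row-major 2x4 bf16
   matrices and r as a row-major 2x2 f32 matrix; r += a * transpose (b).  */
void
bfmmla_ref (float r[4], const uint16_t a[8], const uint16_t b[8])
{
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      for (int k = 0; k < 4; k++)
	r[2*i + j] += bf16_to_f32 (a[4*i + k]) * bf16_to_f32 (b[4*j + k]);
}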


On 12/23/19 6:11 PM, Richard Sandiford wrote:
> Thanks for the patch, looks good.
> 
> Delia Burduv <Delia.Burduv@arm.com> writes:
>> This patch adds the ARMv8.6 ACLE intrinsics for bfmmla, bfmlalb and bfmlalt as part of the BFloat16 extension.
>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>> The intrinsics are declared in arm_neon.h and the RTL patterns are defined in aarch64-simd.md.
>> Two new tests are added to check assembler output.
>>
>> This patch depends on the two AArch64 back-end patches. (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01323.html and https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01324.html)
>>
>> Tested for regression on aarch64-none-elf and aarch64_be-none-elf. I don't have commit rights, so if this is ok can someone please commit it for me?
>>
>> gcc/ChangeLog:
>>
>> 2019-10-29  Delia Burduv  <delia.burduv@arm.com>
>>
>>          * config/aarch64/aarch64-simd-builtins.def
>>            (bfmmla): New built-in function.
>>            (bfmlalb): New built-in function.
>>            (bfmlalt): New built-in function.
>>            (bfmlalb_lane): New built-in function.
>>            (bfmlalt_lane): New built-in function.
>>            (bfmlalb_laneq): New built-in function.
>>            (bfmlalt_laneq): New built-in function
>>          * config/aarch64/aarch64-simd.md (bfmmla): New pattern.
>>            (bfmlal<bt>): New patterns.
>>          * config/aarch64/arm_neon.h (vbfmmlaq_f32): New intrinsic.
>>            (vbfmlalbq_f32): New intrinsic.
>>            (vbfmlaltq_f32): New intrinsic.
>>            (vbfmlalbq_lane_f32): New intrinsic.
>>            (vbfmlaltq_lane_f32): New intrinsic.
>>            (vbfmlalbq_laneq_f32): New intrinsic.
>>            (vbfmlaltq_laneq_f32): New intrinsic.
>>          * config/aarch64/iterators.md (UNSPEC_BFMMLA): New UNSPEC.
>>            (UNSPEC_BFMLALB): New UNSPEC.
>>            (UNSPEC_BFMLALT): New UNSPEC.
>>            (BF_MLA): New int iterator.
>>            (bt): Added UNSPEC_BFMLALB, UNSPEC_BFMLALT.
>>          * config/arm/types.md (bf_mmla): New type.
>>            (bf_mla): New type.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-29  Delia Burduv  <delia.burduv@arm.com>
>>
>>          * gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c: New test.
>>          * gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c: New test.
>>          * gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c:
>>            New test.
> 
> Formatting nit: continuation lines should only be indented by a tab,
> rather than a tab and two spaces.  (I agree the above looks nicer,
> but the policy is not to be flexible over this kind of thing...)
> 
>> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
>> index f4ca35a59704c761fe2ac2b6d401fff7c8aba80d..5e9f50f090870d0c63916540a48f5ac132d2630d 100644
>> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
>> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
>> @@ -682,3 +682,14 @@
>>     BUILTIN_VSFDF (UNOP, frint32x, 0)
>>     BUILTIN_VSFDF (UNOP, frint64z, 0)
>>     BUILTIN_VSFDF (UNOP, frint64x, 0)
>> +
>> +  /* Implemented by aarch64_bfmmlaqv4sf  */
>> +  VAR1 (TERNOP, bfmmlaq, 0, v4sf)
>> +
>> +  /* Implemented by aarch64_bfmlal<bt>{_lane{q}}v4sf  */
>> +  VAR1 (TERNOP, bfmlalb, 0, v4sf)
>> +  VAR1 (TERNOP, bfmlalt, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalb_lane, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalt_lane, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalb_laneq, 0, v4sf)
>> +  VAR1 (QUADOP_LANE, bfmlalt_laneq, 0, v4sf)
>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>> index 55660ae248f4fa75d35ba2949cd4b9d5139ff5f5..66a6c4116a1fdd26dd4eec8b0609e28eb2c38fa1 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -7027,3 +7027,57 @@
>>     "xtn\t%0.<Vntype>, %1.<Vtype>"
>>     [(set_attr "type" "neon_shift_imm_narrow_q")]
>>   )
>> +
>> +;; bfmmla
>> +(define_insn "aarch64_bfmmlaqv4sf"
>> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
>> +        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
>> +                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
>> +                                 (match_operand:V8BF 3 "register_operand" "w")]
>> +                    UNSPEC_BFMMLA)))]
>> +  "TARGET_BF16_SIMD"
>> +  "bfmmla\\t%0.4s, %2.8h, %3.8h"
>> +  [(set_attr "type" "neon_mla_s_q")]
> 
> Looks like this should be neon_fp_mla_s_q instead.
> 
>> +)
>> +
>> +;; bfmlal<bt>
>> +(define_insn "aarch64_bfmlal<bt>v4sf"
>> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
>> +        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
>> +                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
>> +                                  (match_operand:V8BF 3 "register_operand" "w")]
>> +                     BF_MLA)))]
>> +  "TARGET_BF16_SIMD"
>> +  "bfmlal<bt>\\t%0.4s, %2.8h, %3.8h"
>> +  [(set_attr "type" "neon_fp_mla_s")]
>> +)
> 
> Maybe we should have _q here too.  All the vectors are 128-bit vectors,
> we just happen to ignore even or odd elements of the BF inputs.
> 
>> +(define_insn "aarch64_bfmlal<bt>_lanev4sf"
>> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
>> +        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
>> +                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
>> +                                  (match_operand:V4BF 3 "register_operand" "w")
>> +                                  (match_operand:SI 4 "const_int_operand" "n")]
>> +                     BF_MLA)))]
>> +  "TARGET_BF16_SIMD"
>> +{
>> +  operands[4] = aarch64_endian_lane_rtx (V4BFmode, INTVAL (operands[4]));
>> +  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
>> +}
>> +  [(set_attr "type" "neon_fp_mla_s")]
>> +)
> 
> IIUC, these should be neon_fp_mla_s_scalar_q, but I might have misunderstood
> the naming scheme.
> 
>> +(define_insn "aarch64_bfmlal<bt>_laneqv4sf"
>> +  [(set (match_operand:V4SF 0 "register_operand" "=w")
>> +        (plus: V4SF (match_operand:V4SF 1 "register_operand" "0")
>> +                    (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
>> +                                  (match_operand:V8BF 3 "register_operand" "w")
>> +                                  (match_operand:SI 4 "const_int_operand" "n")]
>> +                     BF_MLA)))]
>> +  "TARGET_BF16_SIMD"
>> +{
>> +  operands[4] = aarch64_endian_lane_rtx (V8BFmode, INTVAL (operands[4]));
>> +  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
>> +}
>> +  [(set_attr "type" "neon_fp_mla_s")]
>> +)
> 
> These can be combined into a single pattern by using a mode iterator for
> V4BF/V8BF.
> 
>> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
>> index 6cdbf381f0156ed993f03b847228b36ebbdd14f8..9001c63b0d44e7ad699ace097b9259681b691033 100644
>> --- a/gcc/config/aarch64/arm_neon.h
>> +++ b/gcc/config/aarch64/arm_neon.h
>> @@ -34610,6 +34610,70 @@ vrnd64xq_f64 (float64x2_t __a)
>>   
>>   #include "arm_bf16.h"
>>   
>> +#pragma GCC push_options
>> +#pragma GCC target ("arch=armv8.2-a+bf16")
>> +#ifdef __ARM_FEATURE_BF16_VECTOR_ARITHMETIC
>> +
>> +__extension__ extern __inline float32x4_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vbfmmlaq_f32 \
>> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
>> +
>> +{
> 
> Formatting nits: should be:
> 
> vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
> {
> 
> with no backslash, line break or blank line.
> 
>> +  return __builtin_aarch64_bfmmlaqv4sf (__r, __a, __b);
>> +}
>> +
>> +__extension__ extern __inline float32x4_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vbfmlalbq_f32 \
>> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
>> +{
>> +  return __builtin_aarch64_bfmlalbv4sf (__r, __a, __b);
>> +}
>> +
>> +__extension__ extern __inline float32x4_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vbfmlaltq_f32 \
>> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
>> +{
>> +  return __builtin_aarch64_bfmlaltv4sf (__r, __a, __b);
>> +}
> 
> Same for these.
> 
>> +
>> +__extension__ extern __inline float32x4_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vbfmlalbq_lane_f32 \
>> +      (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b, const int __index)
>> +{
> 
> Here it's probably better to format as:
> 
> vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
> 		    const int __index)
> {
> 
> Same for the rest of the file.
> 
>> diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
>> index df39522f2ad63a52c910b1a6bcc7aa13aaf5d021..2f5ada97991abc88cc74f4768eb395b2b757ee26 100644
>> --- a/gcc/config/arm/types.md
>> +++ b/gcc/config/arm/types.md
>> @@ -550,6 +550,10 @@
>>   ; The classification below is for TME instructions
>>   ;
>>   ; tme
>> +;
>> +; The classification below is for BFloat16 widening multiply-add
>> +;
>> +; bf_mla
> 
> This doesn't seem to be used by the new define_insns.
> 
>>   
>>   (define_attr "type"
>>    "adc_imm,\
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..11558be667c65228529ead90628604cba0bbd044
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
>> @@ -0,0 +1,73 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>> +/* { dg-additional-options "-save-temps" } */
>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>> +
>> +#include <arm_neon.h>
>> +
>> +/*
>> +**test_bfmlalb:
>> +**      ...
>> +**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalb (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  return vbfmlalbq_f32 (r, a, b);
>> +}
>> +
>> +/*
>> +**test_bfmlalt:
>> +**      ...
>> +**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalt (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  return vbfmlaltq_f32 (r, a, b);
>> +}
>> +
>> +/*
>> +**test_bfmlalb_lane:
>> +**      ...
>> +**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[0\]
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalb_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
>> +{
>> +  return vbfmlalbq_lane_f32 (r, a, b, 0);
>> +}
>> +
>> +/*
>> +**test_bfmlalt_lane:
>> +**      ...
>> +**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[2\]
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalt_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
>> +{
>> +  return vbfmlaltq_lane_f32 (r, a, b, 2);
>> +}
>> +
>> +/*
>> +**test_bfmlalb_laneq:
>> +**      ...
>> +**      bfmlalb	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[4\]
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalb_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  return vbfmlalbq_laneq_f32 (r, a, b, 4);
>> +}
>> +
>> +/*
>> +**test_bfmlalt_laneq:
>> +**      ...
>> +**      bfmlalt	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.h\[7\]
>> +**      ...
>> +*/
>> +float32x4_t test_bfmlalt_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  return vbfmlaltq_laneq_f32 (r, a, b, 7);
>> +}
> 
> It might be better to compile these at -O and test for the exact
> input and output registers.  E.g.:
> 
> **test_bfmlalt_laneq:
> **      bfmlalt	v0\.4s, v1\.8h, v2\.h\[7\]
> **      ret
> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..b12cf47d67a33f13967738b48a4984765c0ff2df
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>> +/* { dg-additional-options "-save-temps" } */
>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>> +
>> +#include <arm_neon.h>
>> +
>> +
>> +/*
>> +**test_bfmmla:
>> +**	...
>> +**	bfmmla	v[0-9]+.4s, v[0-9]+.8h, v[0-9]+.8h
>> +**	...
>> +*/
>> +float32x4_t test_bfmmla (float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
>> +{
>> +  return vbfmmlaq_f32 (r, x, y);
>> +}
> 
> Same here.
> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..4a8a9b64c04b39f3cd95101326022f67326921f5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
>> @@ -0,0 +1,46 @@
>> +/* { dg-do compile { target { aarch64*-*-* } } } */
>> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon } */
>> +
>> +#include <arm_neon.h>
>> +
>> +void
>> +f_vbfmlaltq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
>> +{
>> +  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 34655 } */
>> +  vbfmlaltq_lane_f32 (r, a, b, -1);
>> +  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 34655 } */
>> +  vbfmlaltq_lane_f32 (r, a, b, 4);
>> +  return;
>> +}
>> +
>> +void
>> +f_vbfmlaltq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 34671 } */
>> +  vbfmlaltq_laneq_f32 (r, a, b, -1);
>> +  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 34671 } */
>> +  vbfmlaltq_laneq_f32 (r, a, b, 8);
>> +  return;
>> +}
>> +
>> +void
>> +f_vbfmlalbq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
>> +{
>> +  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 34647 } */
>> +  vbfmlalbq_lane_f32 (r, a, b, -1);
>> +  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 34647 } */
>> +  vbfmlalbq_lane_f32 (r, a, b, 4);
>> +  return;
>> +}
>> +
>> +void
>> +f_vbfmlalbq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
>> +{
>> +  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 34663 } */
>> +  vbfmlalbq_laneq_f32 (r, a, b, -1);
>> +  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 34663 } */
>> +  vbfmlalbq_laneq_f32 (r, a, b, 8);
>> +  return;
>> +}
> 
> It'd be better not to hard-code the arm_neon.h line numbers here.
> The other tests use "0" -- does that work here too?
> 
> It'd also be good to have a test that checks for an appropriate error if
> these intrinsics are used when bf16 is disabled.  We don't need that
> for all intrinsics, just one would be enough.  (Sorry if you have that
> in another patch, this was the first one I got to.)
> 
> Thanks,
> Richard
> 

[-- Attachment #2: rb12110.patch --]
[-- Type: text/x-patch; name="rb12110.patch", Size: 11426 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index a118f4f121de067c0a80f691b852247b0ab27f7a..02b2154cf64dad02cf57b110af51b19dd7f91c51 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -692,3 +692,14 @@
   VAR2 (TERNOP, bfdot, 0, v2sf, v4sf)
   VAR2 (QUADOP_LANE_PAIR, bfdot_lane, 0, v2sf, v4sf)
   VAR2 (QUADOP_LANE_PAIR, bfdot_laneq, 0, v2sf, v4sf)
+
+  /* Implemented by aarch64_bfmmlaqv4sf.  */
+  VAR1 (TERNOP, bfmmlaq, 0, v4sf)
+
+  /* Implemented by aarch64_bfmlal<bt>{_lane<q>}v4sf.  */
+  VAR1 (TERNOP, bfmlalb, 0, v4sf)
+  VAR1 (TERNOP, bfmlalt, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalb_lane, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalt_lane, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalb_lane_q, 0, v4sf)
+  VAR1 (QUADOP_LANE, bfmlalt_lane_q, 0, v4sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 97f46f96968a6bc2f93bbc812931537b819b3b19..6ba72d7dc82ed02b5b5001a13ca896ab245a9d41 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7091,3 +7091,42 @@
 }
   [(set_attr "type" "neon_dot<VDQSF:q>")]
 )
+
+;; bfmmla
+(define_insn "aarch64_bfmmlaqv4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:V8BF 3 "register_operand" "w")]
+                    UNSPEC_BFMMLA)))]
+  "TARGET_BF16_SIMD"
+  "bfmmla\\t%0.4s, %2.8h, %3.8h"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+;; bfmlal<bt>
+(define_insn "aarch64_bfmlal<bt>v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:V8BF 3 "register_operand" "w")]
+                    BF_MLA)))]
+  "TARGET_BF16_SIMD"
+  "bfmlal<bt>\\t%0.4s, %2.8h, %3.8h"
+  [(set_attr "type" "neon_fp_mla_s_q")]
+)
+
+(define_insn "aarch64_bfmlal<bt>_lane<q>v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+        (plus:V4SF (match_operand:V4SF 1 "register_operand" "0")
+                   (unspec:V4SF [(match_operand:V8BF 2 "register_operand" "w")
+                                 (match_operand:VBF 3 "register_operand" "w")
+                                 (match_operand:SI 4 "const_int_operand" "n")]
+                    BF_MLA)))]
+  "TARGET_BF16_SIMD"
+{
+  operands[4] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[4]));
+  return "bfmlal<bt>\\t%0.4s, %2.8h, %3.h[%4]";
+}
+  [(set_attr "type" "neon_fp_mla_s_scalar_q")]
+)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 7f05c3f9eca844b0e7b824a191223a4906c825b1..db845a3d2d204d28f0e62fa61927e01dcb15f4a4 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34660,6 +34660,59 @@ vbfdotq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
   return __builtin_aarch64_bfdot_laneqv4sf (__r, __a, __b, __index);
 }
 
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmmlaq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfmmlaqv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfmlalbv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfmlaltv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		    const int __index)
+{
+  return __builtin_aarch64_bfmlalb_lanev4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_lane_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x4_t __b,
+		    const int __index)
+{
+  return __builtin_aarch64_bfmlalt_lanev4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlalbq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		     const int __index)
+{
+  return __builtin_aarch64_bfmlalb_lane_qv4sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfmlaltq_laneq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b,
+		     const int __index)
+{
+  return __builtin_aarch64_bfmlalt_lane_qv4sf (__r, __a, __b, __index);
+}
+
 #pragma GCC pop_options
 
 /* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index fc973086cb91ae0dc54eeeb0b832d522539d7982..a32b21c639c2fe7ce6e432901fb293f196cbfff0 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -808,6 +808,9 @@
     UNSPEC_USDOT	; Used in aarch64-simd.md.
     UNSPEC_SUDOT	; Used in aarch64-simd.md.
     UNSPEC_BFDOT	; Used in aarch64-simd.md.
+    UNSPEC_BFMMLA	; Used in aarch64-simd.md.
+    UNSPEC_BFMLALB	; Used in aarch64-simd.md.
+    UNSPEC_BFMLALT	; Used in aarch64-simd.md.
 ])
 
 ;; ------------------------------------------------------------------
@@ -2553,6 +2556,9 @@
 
 (define_int_iterator SVE_PITER [UNSPEC_PFIRST UNSPEC_PNEXT])
 
+(define_int_iterator BF_MLA [UNSPEC_BFMLALB
+                             UNSPEC_BFMLALT])
+
 ;; Iterators for atomic operations.
 
 (define_int_iterator ATOMIC_LDOP
@@ -2793,6 +2799,8 @@
 (define_int_attr ab [(UNSPEC_CLASTA "a") (UNSPEC_CLASTB "b")
 		     (UNSPEC_LASTA "a") (UNSPEC_LASTB "b")])
 
+(define_int_attr bt [(UNSPEC_BFMLALB "b") (UNSPEC_BFMLALT "t")])
+
 (define_int_attr addsub [(UNSPEC_SHADD "add")
 			 (UNSPEC_UHADD "add")
 			 (UNSPEC_SRHADD "add")
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
new file mode 100644
index 0000000000000000000000000000000000000000..9feb7ee7905cb14037427a36797fc67a6fa3fbc8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmlalbt-compile.c
@@ -0,0 +1,67 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include <arm_neon.h>
+
+/*
+**test_bfmlalb:
+**      bfmlalb\tv0.4s, v1.8h, v2.8h
+**      ret
+*/
+float32x4_t test_bfmlalb (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_f32 (r, a, b);
+}
+
+/*
+**test_bfmlalt:
+**      bfmlalt\tv0.4s, v1.8h, v2.8h
+**      ret
+*/
+float32x4_t test_bfmlalt (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_f32 (r, a, b);
+}
+
+/*
+**test_bfmlalb_lane:
+**      bfmlalb\tv0.4s, v1.8h, v2.h[0]
+**      ret
+*/
+float32x4_t test_bfmlalb_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlalbq_lane_f32 (r, a, b, 0);
+}
+
+/*
+**test_bfmlalt_lane:
+**      bfmlalt\tv0.4s, v1.8h, v2.h[2]
+**      ret
+*/
+float32x4_t test_bfmlalt_lane (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  return vbfmlaltq_lane_f32 (r, a, b, 2);
+}
+
+/*
+**test_bfmlalb_laneq:
+**      bfmlalb\tv0.4s, v1.8h, v2.h[4]
+**      ret
+*/
+float32x4_t test_bfmlalb_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlalbq_laneq_f32 (r, a, b, 4);
+}
+
+/*
+**test_bfmlalt_laneq:
+**      bfmlalt\tv0.4s, v1.8h, v2.h[7]
+**      ret
+*/
+float32x4_t test_bfmlalt_laneq (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  return vbfmlaltq_laneq_f32 (r, a, b, 7);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
new file mode 100644
index 0000000000000000000000000000000000000000..b0a856676e377ac182fafb2b39189451e460789e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfmmla-compile.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include <arm_neon.h>
+
+
+/*
+**test_bfmmla:
+**     bfmmla\tv0.4s, v1.8h, v2.8h
+**     ret
+*/
+float32x4_t test_bfmmla (float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
+{
+  return vbfmmlaq_f32 (r, x, y);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..4d50ba3a3814cb6fe8a768bdf6e13a4207cf585a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vbfmlalbt_lane_f32_indices_1.c
@@ -0,0 +1,46 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+
+#include <arm_neon.h>
+
+void
+f_vbfmlaltq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlaltq_lane_f32 (r, a, b, -1);
+  /* { dg-error "lane 4 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlaltq_lane_f32 (r, a, b, 4);
+  return;
+}
+
+void
+f_vbfmlaltq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  /* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlaltq_laneq_f32 (r, a, b, -1);
+  /* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlaltq_laneq_f32 (r, a, b, 8);
+  return;
+}
+
+void
+f_vbfmlalbq_lane_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
+{
+  /* { dg-error "lane -2 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlalbq_lane_f32 (r, a, b, -2);
+  /* { dg-error "lane 5 out of range 0 - 3" "" { target *-*-* } 0 } */
+  vbfmlalbq_lane_f32 (r, a, b, 5);
+  return;
+}
+
+void
+f_vbfmlalbq_laneq_f32 (float32x4_t r, bfloat16x8_t a, bfloat16x8_t b)
+{
+  /* { dg-error "lane -2 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlalbq_laneq_f32 (r, a, b, -2);
+  /* { dg-error "lane 9 out of range 0 - 7" "" { target *-*-* } 0 } */
+  vbfmlalbq_laneq_f32 (r, a, b, 9);
+  return;
+}
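
For anyone who wants to try the patch locally, a smoke test along these
lines should work (illustrative only: the file name and the -march
string are just one way to enable the extension, and the expected
instruction is taken from the bfmlalbt-compile.c test above):

/* smoke.c: compile with
   aarch64-none-elf-gcc -march=armv8.2-a+bf16 -O -S smoke.c  */
#include <arm_neon.h>

float32x4_t
f (float32x4_t r, bfloat16x8_t a, bfloat16x4_t b)
{
  /* Expected assembly: bfmlalb v0.4s, v1.8h, v2.h[0]  */
  return vbfmlalbq_lane_f32 (r, a, b, 0);
}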
