* [GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension @ 2019-12-13 10:23 Stam Markianos-Wright 2019-12-18 13:26 ` [Ping][GCC][PATCH][ARM]Add " Stam Markianos-Wright 0 siblings, 1 reply; 9+ messages in thread From: Stam Markianos-Wright @ 2019-12-13 10:23 UTC (permalink / raw) To: gcc-patches Cc: Richard Earnshaw, Kyrylo Tkachov, nickc, Ramana Radhakrishnan, Richard Sandiford [-- Attachment #1: Type: text/plain, Size: 2211 bytes --] Hi all, This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product operations (vector/by element) to the ARM back-end. These are: usdot (vector), <us/su>dot (by element). The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and for ARM they remain optional as of ARMv8.6-a. The functions are declared in arm_neon.h, RTL patterns are defined to generate assembler and tests are added to verify and perform adequate checks. Regression testing on arm-none-eabi passed successfully. This patch depends on: https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html for ARM CLI updates, and on: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html for testsuite effective_target update. Ok for trunk? Cheers, Stam ACLE documents are at https://developer.arm.com/docs/101028/latest ISA documents are at https://developer.arm.com/docs/ddi0596/latest PS. I don't have commit rights, so if someone could commit on my behalf, that would be great :) gcc/ChangeLog: 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com> * config/arm/arm-builtins.c (enum arm_type_qualifiers): (USTERNOP_QUALIFIERS): New define. (USMAC_LANE_QUADTUP_QUALIFIERS): New define. (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. (arm_expand_builtin_args): Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. * config/arm/arm_neon.h (vusdot_s32): New. (vusdot_lane_s32): New. (vusdotq_lane_s32): New. (vsudot_lane_s32): New. (vsudotq_lane_s32): New. * config/arm/arm_neon_builtins.def (usdot,usdot_lane,sudot_lane): New. * config/arm/iterators.md (DOTPROD_I8MM): New. (sup, opsuffix): Add <us/su>. * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New. * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. gcc/testsuite/ChangeLog: 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com> * gcc.target/arm/simd/vdot-compile-2-1.c: New test. * gcc.target/arm/simd/vdot-compile-2-2.c: New test. * gcc.target/arm/simd/vdot-compile-2-3.c: New test. * gcc.target/arm/simd/vdot-compile-2-4.c: New test. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: i8mm-us-su-dot-arm.patch --] [-- Type: text/x-patch; name="i8mm-us-su-dot-arm.patch", Size: 15615 bytes --] diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index 2d902d0b325bc1fe5e22831ef8a59a2bb37c1225..a63c1a978fb1d436065ce9f5f082249c4ebf5ade 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -86,7 +86,10 @@ enum arm_type_qualifiers qualifier_const_void_pointer = 0x802, /* Lane indices selected in pairs - must be within range of previous argument = a vector. */ - qualifier_lane_pair_index = 0x1000 + qualifier_lane_pair_index = 0x1000, + /* Lane indices selected in quadtuplets - must be within range of previous + argument = a vector. */ + qualifier_lane_quadtup_index = 0x2000 }; /* The qualifier_internal allows generation of a unary builtin from @@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned }; #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers) +/* T (T, unsigned T, T). */ +static enum arm_type_qualifiers +arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none }; +#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers) + /* T (T, immediate). */ static enum arm_type_qualifiers arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned, qualifier_lane_index }; #define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers) +/* T (T, unsigned T, T, lane index). */ +static enum arm_type_qualifiers +arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none, qualifier_lane_quadtup_index }; +#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers) + +/* T (T, T, unsigend T, lane index). */ +static enum arm_type_qualifiers +arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, + qualifier_unsigned, qualifier_lane_quadtup_index }; +#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers) + /* T (T, T, immediate). */ static enum arm_type_qualifiers arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -2148,6 +2172,7 @@ typedef enum { ARG_BUILTIN_LANE_INDEX, ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX, ARG_BUILTIN_LANE_PAIR_INDEX, + ARG_BUILTIN_LANE_QUADTUP_INDEX, ARG_BUILTIN_NEON_MEMORY, ARG_BUILTIN_MEMORY, ARG_BUILTIN_STOP @@ -2296,11 +2321,24 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode, if (CONST_INT_P (op[argc])) { machine_mode vmode = mode[argc - 1]; - neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp); + neon_lane_bounds (op[argc], 0, + GET_MODE_NUNITS (vmode) / 2, exp); + } + /* If the lane index isn't a constant then error out. */ + goto constant_arg; + + case ARG_BUILTIN_LANE_QUADTUP_INDEX: + /* Previous argument must be a vector, which this indexes. */ + gcc_assert (argc > 0); + if (CONST_INT_P (op[argc])) + { + machine_mode vmode = mode[argc - 1]; + neon_lane_bounds (op[argc], 0, + GET_MODE_NUNITS (vmode) / 4, exp); } - /* If the lane index isn't a constant then the next - case will error. */ - /* Fall through. */ + /* If the lane index isn't a constant then error out. */ + goto constant_arg; + case ARG_BUILTIN_CONSTANT: constant_arg: if (!(*insn_data[icode].operand[opno].predicate) @@ -2464,6 +2502,8 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target, args[k] = ARG_BUILTIN_LANE_INDEX; else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index) args[k] = ARG_BUILTIN_LANE_PAIR_INDEX; + else if (d->qualifiers[qualifiers_k] & qualifier_lane_quadtup_index) + args[k] = ARG_BUILTIN_LANE_QUADTUP_INDEX; else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index) args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX; else if (d->qualifiers[qualifiers_k] & qualifier_immediate) diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 1f200d491d1de3993bc3a682d586da137958ff6b..53602773a341535bfc9ff16dc4ac8f2b999df2ad 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -18738,6 +18738,52 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b, return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index); } + +/* AdvSIMD Matrix Multiply-Accumulate and Dot Product intrinsics. */ +#pragma GCC push_options +#pragma GCC target ("arch=armv8.2-a+i8mm") + +__extension__ extern __inline int32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b) +{ + return __builtin_neon_usdotv8qi_ssus (__r, __a, __b); +} + +__extension__ extern __inline int32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a, + int8x8_t __b, const int __index) +{ + return __builtin_neon_usdot_lanev8qi_ssuss (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vusdotq_lane_s32 (int32x4_t __r, uint8x16_t __a, + int8x8_t __b, const int __index) +{ + return __builtin_neon_usdot_lanev16qi_ssuss (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vsudot_lane_s32 (int32x2_t __r, int8x8_t __a, + uint8x8_t __b, const int __index) +{ + return __builtin_neon_sudot_lanev8qi_sssus (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vsudotq_lane_s32 (int32x4_t __r, int8x16_t __a, + uint8x8_t __b, const int __index) +{ + return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b, __index); +} + +#pragma GCC pop_options + #pragma GCC pop_options #endif diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index bcccf93f7fa2750e9006e5856efecbec0fb331b9..7af85ee27bc84b4229633a0337b550e4ef2ec14b 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -352,6 +352,10 @@ VAR2 (UTERNOP, udot, v8qi, v16qi) VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi) VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi) +VAR1 (USTERNOP, usdot, v8qi) +VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi) +VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi) + VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf) VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf) VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf) diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index c412851843f4468c2c18bce264288705e076ac50..e58c706f9fb63271d1aadb1498c0b32674838f46 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -466,6 +466,8 @@ (define_int_iterator DOTPROD [UNSPEC_DOT_S UNSPEC_DOT_U]) +(define_int_iterator DOTPROD_I8MM [UNSPEC_DOT_US UNSPEC_DOT_SU]) + (define_int_iterator VFMLHALVES [UNSPEC_VFML_LO UNSPEC_VFML_HI]) (define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270]) @@ -920,6 +922,7 @@ (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u") (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u") (UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u") + (UNSPEC_DOT_US "us") (UNSPEC_DOT_SU "su") (UNSPEC_SSAT16 "s") (UNSPEC_USAT16 "u") ]) @@ -1151,6 +1154,9 @@ (define_int_attr MRRC [(VUNSPEC_MRRC "MRRC") (VUNSPEC_MRRC2 "MRRC2")]) (define_int_attr opsuffix [(UNSPEC_DOT_S "s8") - (UNSPEC_DOT_U "u8")]) + (UNSPEC_DOT_U "u8") + (UNSPEC_DOT_US "s8") + (UNSPEC_DOT_SU "u8") + ]) (define_int_attr smlaw_op [(UNSPEC_SMLAWB "smlawb") (UNSPEC_SMLAWT "smlawt")]) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 6a0ee28efc9aa9f1fba7b5ae031564f40aa095fe..7de31220d5d0712269137bc2c64d90ec9bfdcb2c 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -3279,6 +3279,20 @@ [(set_attr "type" "neon_dot<q>")] ) +;; These instructions map to the __builtins for the Dot Product operations. +(define_insn "neon_usdot<vsi2qi>" + [(set (match_operand:VCVTI 0 "register_operand" "=w") + (plus:VCVTI (match_operand:VCVTI 1 "register_operand" "0") + (unspec:VCVTI [(match_operand:<VSI2QI> 2 + "register_operand" "w") + (match_operand:<VSI2QI> 3 + "register_operand" "w")] + UNSPEC_DOT_US)))] + "TARGET_I8MM" + "vusdot.s8\\t%<V_reg>0, %<V_reg>2, %<V_reg>3" + [(set_attr "type" "neon_dot<q>")] +) + ;; These instructions map to the __builtins for the Dot Product ;; indexed operations. (define_insn "neon_<sup>dot_lane<vsi2qi>" @@ -3298,6 +3312,24 @@ [(set_attr "type" "neon_dot<q>")] ) +;; These instructions map to the __builtins for the Dot Product +;; indexed operations in the v8.6 I8MM extension. +(define_insn "neon_<sup>dot_lane<vsi2qi>" + [(set (match_operand:VCVTI 0 "register_operand" "=w") + (plus:VCVTI (match_operand:VCVTI 1 "register_operand" "0") + (unspec:VCVTI [(match_operand:<VSI2QI> 2 + "register_operand" "w") + (match_operand:V8QI 3 "register_operand" "t") + (match_operand:SI 4 "immediate_operand" "i")] + DOTPROD_I8MM)))] + "TARGET_I8MM" + { + operands[4] = GEN_INT (INTVAL (operands[4])); + return "v<sup>dot.<opsuffix>\\t%<V_reg>0, %<V_reg>2, %P3[%c4]"; + } + [(set_attr "type" "neon_dot<q>")] +) + ;; These expands map to the Dot Product optab the vectorizer checks for. ;; The auto-vectorizer expects a dot product builtin that also does an ;; accumulation into the provided register. diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md index b4196b0e5cd939c3ee5e3f9bd19622fcc963adae..837471f49543d4faa8614bd54f2db8d37991c443 100644 --- a/gcc/config/arm/unspecs.md +++ b/gcc/config/arm/unspecs.md @@ -485,6 +485,8 @@ UNSPEC_VRNDX UNSPEC_DOT_S UNSPEC_DOT_U + UNSPEC_DOT_US + UNSPEC_DOT_SU UNSPEC_VFML_LO UNSPEC_VFML_HI UNSPEC_VCADD90 diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-1.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-1.c new file mode 100644 index 0000000000000000000000000000000000000000..862cf3211e71cbf8127f2b0f141c206676bf9bdb --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-1.c @@ -0,0 +1,42 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "--save-temps" } */ + +#include <arm_neon.h> + +/* Unsigned-Signed Dot Product instructions. */ + +int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_s32 (r, x, y); +} + +int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_lane_s32 (r, x, y, 0); +} + +int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y) +{ + return vusdotq_lane_s32 (r, x, y, 1); +} + +/* Signed-Unsigned Dot Product instructions. */ + + +int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y) +{ + return vsudot_lane_s32 (r, x, y, 0); +} + +int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y) +{ + return vsudotq_lane_s32 (r, x, y, 1); +} + +/* { dg-final { scan-assembler {vusdot\.s8\td[0-9]+, d[0-9]+, d[0-9]+} } } */ +/* { dg-final { scan-assembler {vusdot\.s8\td[0-9]+, d[0-9]+, d[0-9]+\[#?0\]} } } */ +/* { dg-final { scan-assembler {vusdot\.s8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?1\]} } } */ +/* { dg-final { scan-assembler {vsudot\.u8\td[0-9]+, d[0-9]+, d[0-9]+\[#?0\]} } } */ +/* { dg-final { scan-assembler {vsudot\.u8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?1\]} } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-2.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-2.c new file mode 100644 index 0000000000000000000000000000000000000000..91ecb073fdb5bd4523c9b1e62aed03de5adb820d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-2.c @@ -0,0 +1,42 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "-mbig-endian --save-temps" } */ + +#include <arm_neon.h> + +/* Unsigned-Signed Dot Product instructions. */ + +int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_s32 (r, x, y); +} + +int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_lane_s32 (r, x, y, 0); +} + +int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y) +{ + return vusdotq_lane_s32 (r, x, y, 1); +} + +/* Signed-Unsigned Dot Product instructions. */ + + +int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y) +{ + return vsudot_lane_s32 (r, x, y, 0); +} + +int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y) +{ + return vsudotq_lane_s32 (r, x, y, 1); +} + +/* { dg-final { scan-assembler {vusdot\.s8\td[0-9]+, d[0-9]+, d[0-9]+} } } */ +/* { dg-final { scan-assembler {vusdot\.s8\td[0-9]+, d[0-9]+, d[0-9]+\[#?0\]} } } */ +/* { dg-final { scan-assembler {vusdot\.s8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?1\]} } } */ +/* { dg-final { scan-assembler {vsudot\.u8\td[0-9]+, d[0-9]+, d[0-9]+\[#?0\]} } } */ +/* { dg-final { scan-assembler {vsudot\.u8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?1\]} } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-3.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-3.c new file mode 100644 index 0000000000000000000000000000000000000000..e14fe8f4433c9bf4c3347ebf728157bdb54861b2 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-3.c @@ -0,0 +1,21 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "--save-temps" } */ + +#include <arm_neon.h> + +/* Unsigned-Signed Dot Product instructions. */ + +int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vusdot_lane_s32 (r, x, y, -1); +} + + +int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y) +{ + /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vusdotq_lane_s32 (r, x, y, 2); +} diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-4.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-4.c new file mode 100644 index 0000000000000000000000000000000000000000..fb7ebb484e1778a1d06611f8c8a639d4c0dcb9a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-4.c @@ -0,0 +1,20 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "--save-temps" } */ + +#include <arm_neon.h> + +/* Signed-Unsigned Dot Product instructions. */ + +int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y) +{ + /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vsudot_lane_s32 (r, x, y, -1); +} + +int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y) +{ + /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vsudotq_lane_s32 (r, x, y, 2); +} ^ permalink raw reply [flat|nested] 9+ messages in thread
* [Ping][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension 2019-12-13 10:23 [GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension Stam Markianos-Wright @ 2019-12-18 13:26 ` Stam Markianos-Wright 2020-01-10 19:25 ` Stam Markianos-Wright 0 siblings, 1 reply; 9+ messages in thread From: Stam Markianos-Wright @ 2019-12-18 13:26 UTC (permalink / raw) To: gcc-patches Cc: Richard Earnshaw, Kyrylo Tkachov, nickc, Ramana Radhakrishnan, Richard Sandiford On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: > Hi all, > > This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product > operations (vector/by element) to the ARM back-end. > > These are: > usdot (vector), <us/su>dot (by element). > > The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and > for ARM they remain optional as of ARMv8.6-a. > > The functions are declared in arm_neon.h, RTL patterns are defined to > generate assembler and tests are added to verify and perform adequate > checks. > > Regression testing on arm-none-eabi passed successfully. > > This patch depends on: > > https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html > > for ARM CLI updates, and on: > > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html > > for testsuite effective_target update. > > Ok for trunk? .Ping :) > > Cheers, > Stam > > > ACLE documents are at https://developer.arm.com/docs/101028/latest > ISA documents are at https://developer.arm.com/docs/ddi0596/latest > > PS. I don't have commit rights, so if someone could commit on my behalf, > that would be great :) > > > gcc/ChangeLog: > > 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com> > > * config/arm/arm-builtins.c (enum arm_type_qualifiers): > (USTERNOP_QUALIFIERS): New define. > (USMAC_LANE_QUADTUP_QUALIFIERS): New define. > (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. > (arm_expand_builtin_args): > Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. > (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. > * config/arm/arm_neon.h (vusdot_s32): New. > (vusdot_lane_s32): New. > (vusdotq_lane_s32): New. > (vsudot_lane_s32): New. > (vsudotq_lane_s32): New. > * config/arm/arm_neon_builtins.def > (usdot,usdot_lane,sudot_lane): New. > * config/arm/iterators.md (DOTPROD_I8MM): New. > (sup, opsuffix): Add <us/su>. > * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New. > * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. > > > gcc/testsuite/ChangeLog: > > 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com> > > * gcc.target/arm/simd/vdot-compile-2-1.c: New test. > * gcc.target/arm/simd/vdot-compile-2-2.c: New test. > * gcc.target/arm/simd/vdot-compile-2-3.c: New test. > * gcc.target/arm/simd/vdot-compile-2-4.c: New test. > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Ping][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension 2019-12-18 13:26 ` [Ping][GCC][PATCH][ARM]Add " Stam Markianos-Wright @ 2020-01-10 19:25 ` Stam Markianos-Wright 2020-01-16 16:17 ` [Pingx2][GCC][PATCH][ARM]Add " Stam Markianos-Wright 0 siblings, 1 reply; 9+ messages in thread From: Stam Markianos-Wright @ 2020-01-10 19:25 UTC (permalink / raw) To: gcc-patches Cc: Richard Earnshaw, Kyrylo Tkachov, nickc, Ramana Radhakrishnan, Richard Sandiford [-- Attachment #1: Type: text/plain, Size: 3030 bytes --] On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: > > > On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >> Hi all, >> >> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >> operations (vector/by element) to the ARM back-end. >> >> These are: >> usdot (vector), <us/su>dot (by element). >> >> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >> for ARM they remain optional as of ARMv8.6-a. >> >> The functions are declared in arm_neon.h, RTL patterns are defined to >> generate assembler and tests are added to verify and perform adequate checks. >> >> Regression testing on arm-none-eabi passed successfully. >> >> This patch depends on: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >> >> for ARM CLI updates, and on: >> >> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >> >> for testsuite effective_target update. >> >> Ok for trunk? > > .Ping :) > Ping :) New diff addressing review comments from Aarch64 version of the patch. _Change of order of operands in RTL patterns. _Change tests to use check-function-bodies, compile with optimisation and check for exact registers. _Rename tests to remove "-compile-" in filename. >> >> Cheers, >> Stam >> >> >> ACLE documents are at https://developer.arm.com/docs/101028/latest >> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >> >> PS. I don't have commit rights, so if someone could commit on my behalf, >> that would be great :) >> >> >> gcc/ChangeLog: >> >> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com> >> >> * config/arm/arm-builtins.c (enum arm_type_qualifiers): >> (USTERNOP_QUALIFIERS): New define. >> (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >> (arm_expand_builtin_args): >> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >> * config/arm/arm_neon.h (vusdot_s32): New. >> (vusdot_lane_s32): New. >> (vusdotq_lane_s32): New. >> (vsudot_lane_s32): New. >> (vsudotq_lane_s32): New. >> * config/arm/arm_neon_builtins.def >> (usdot,usdot_lane,sudot_lane): New. >> * config/arm/iterators.md (DOTPROD_I8MM): New. >> (sup, opsuffix): Add <us/su>. >> * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New. >> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >> >> >> gcc/testsuite/ChangeLog: >> >> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com> >> >> * gcc.target/arm/simd/vdot-compile-2-1.c: New test. >> * gcc.target/arm/simd/vdot-compile-2-2.c: New test. >> * gcc.target/arm/simd/vdot-compile-2-3.c: New test. >> * gcc.target/arm/simd/vdot-compile-2-4.c: New test. >> >> [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: I8MM-32-final.patch --] [-- Type: text/x-patch; name="I8MM-32-final.patch", Size: 15884 bytes --] diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index df84560588a..1b4316d0e93 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -86,7 +86,10 @@ enum arm_type_qualifiers qualifier_const_void_pointer = 0x802, /* Lane indices selected in pairs - must be within range of previous argument = a vector. */ - qualifier_lane_pair_index = 0x1000 + qualifier_lane_pair_index = 0x1000, + /* Lane indices selected in quadtuplets - must be within range of previous + argument = a vector. */ + qualifier_lane_quadtup_index = 0x2000 }; /* The qualifier_internal allows generation of a unary builtin from @@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned }; #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers) +/* T (T, unsigned T, T). */ +static enum arm_type_qualifiers +arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none }; +#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers) + /* T (T, immediate). */ static enum arm_type_qualifiers arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned, qualifier_lane_index }; #define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers) +/* T (T, unsigned T, T, lane index). */ +static enum arm_type_qualifiers +arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none, qualifier_lane_quadtup_index }; +#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers) + +/* T (T, T, unsigend T, lane index). */ +static enum arm_type_qualifiers +arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, + qualifier_unsigned, qualifier_lane_quadtup_index }; +#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers) + /* T (T, T, immediate). */ static enum arm_type_qualifiers arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -2148,6 +2172,7 @@ typedef enum { ARG_BUILTIN_LANE_INDEX, ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX, ARG_BUILTIN_LANE_PAIR_INDEX, + ARG_BUILTIN_LANE_QUADTUP_INDEX, ARG_BUILTIN_NEON_MEMORY, ARG_BUILTIN_MEMORY, ARG_BUILTIN_STOP @@ -2296,11 +2321,24 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode, if (CONST_INT_P (op[argc])) { machine_mode vmode = mode[argc - 1]; - neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp); + neon_lane_bounds (op[argc], 0, + GET_MODE_NUNITS (vmode) / 2, exp); + } + /* If the lane index isn't a constant then error out. */ + goto constant_arg; + + case ARG_BUILTIN_LANE_QUADTUP_INDEX: + /* Previous argument must be a vector, which this indexes. */ + gcc_assert (argc > 0); + if (CONST_INT_P (op[argc])) + { + machine_mode vmode = mode[argc - 1]; + neon_lane_bounds (op[argc], 0, + GET_MODE_NUNITS (vmode) / 4, exp); } - /* If the lane index isn't a constant then the next - case will error. */ - /* Fall through. */ + /* If the lane index isn't a constant then error out. */ + goto constant_arg; + case ARG_BUILTIN_CONSTANT: constant_arg: if (!(*insn_data[icode].operand[opno].predicate) @@ -2464,6 +2502,8 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target, args[k] = ARG_BUILTIN_LANE_INDEX; else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index) args[k] = ARG_BUILTIN_LANE_PAIR_INDEX; + else if (d->qualifiers[qualifiers_k] & qualifier_lane_quadtup_index) + args[k] = ARG_BUILTIN_LANE_QUADTUP_INDEX; else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index) args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX; else if (d->qualifiers[qualifiers_k] & qualifier_immediate) diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index db8db53614a..ede89ec2c64 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -18738,6 +18738,52 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b, return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index); } + +/* AdvSIMD Matrix Multiply-Accumulate and Dot Product intrinsics. */ +#pragma GCC push_options +#pragma GCC target ("arch=armv8.2-a+i8mm") + +__extension__ extern __inline int32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b) +{ + return __builtin_neon_usdotv8qi_ssus (__r, __a, __b); +} + +__extension__ extern __inline int32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a, + int8x8_t __b, const int __index) +{ + return __builtin_neon_usdot_lanev8qi_ssuss (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vusdotq_lane_s32 (int32x4_t __r, uint8x16_t __a, + int8x8_t __b, const int __index) +{ + return __builtin_neon_usdot_lanev16qi_ssuss (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vsudot_lane_s32 (int32x2_t __r, int8x8_t __a, + uint8x8_t __b, const int __index) +{ + return __builtin_neon_sudot_lanev8qi_sssus (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vsudotq_lane_s32 (int32x4_t __r, int8x16_t __a, + uint8x8_t __b, const int __index) +{ + return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b, __index); +} + +#pragma GCC pop_options + #pragma GCC pop_options #endif diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index e9ff4e501cb..b4537ff5de9 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -352,6 +352,10 @@ VAR2 (UTERNOP, udot, v8qi, v16qi) VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi) VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi) +VAR1 (USTERNOP, usdot, v8qi) +VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi) +VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi) + VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf) VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf) VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf) diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 7da8b74abc0..afea7f823e0 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -466,6 +466,8 @@ (define_int_iterator DOTPROD [UNSPEC_DOT_S UNSPEC_DOT_U]) +(define_int_iterator DOTPROD_I8MM [UNSPEC_DOT_US UNSPEC_DOT_SU]) + (define_int_iterator VFMLHALVES [UNSPEC_VFML_LO UNSPEC_VFML_HI]) (define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270]) @@ -920,6 +922,7 @@ (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u") (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u") (UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u") + (UNSPEC_DOT_US "us") (UNSPEC_DOT_SU "su") (UNSPEC_SSAT16 "s") (UNSPEC_USAT16 "u") ]) @@ -1151,6 +1154,9 @@ (define_int_attr MRRC [(VUNSPEC_MRRC "MRRC") (VUNSPEC_MRRC2 "MRRC2")]) (define_int_attr opsuffix [(UNSPEC_DOT_S "s8") - (UNSPEC_DOT_U "u8")]) + (UNSPEC_DOT_U "u8") + (UNSPEC_DOT_US "s8") + (UNSPEC_DOT_SU "u8") + ]) (define_int_attr smlaw_op [(UNSPEC_SMLAWB "smlawb") (UNSPEC_SMLAWT "smlawt")]) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index dace9470c41..8b83cba8fb7 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -3279,6 +3279,20 @@ [(set_attr "type" "neon_dot<q>")] ) +;; These instructions map to the __builtins for the Dot Product operations. +(define_insn "neon_usdot<vsi2qi>" + [(set (match_operand:VCVTI 0 "register_operand" "=w") + (plus:VCVTI + (unspec:VCVTI + [(match_operand:<VSI2QI> 2 "register_operand" "w") + (match_operand:<VSI2QI> 3 "register_operand" "w")] + UNSPEC_DOT_US) + (match_operand:VCVTI 1 "register_operand" "0")))] + "TARGET_I8MM" + "vusdot.s8\\t%<V_reg>0, %<V_reg>2, %<V_reg>3" + [(set_attr "type" "neon_dot<q>")] +) + ;; These instructions map to the __builtins for the Dot Product ;; indexed operations. (define_insn "neon_<sup>dot_lane<vsi2qi>" @@ -3298,6 +3312,25 @@ [(set_attr "type" "neon_dot<q>")] ) +;; These instructions map to the __builtins for the Dot Product +;; indexed operations in the v8.6 I8MM extension. +(define_insn "neon_<sup>dot_lane<vsi2qi>" + [(set (match_operand:VCVTI 0 "register_operand" "=w") + (plus:VCVTI + (unspec:VCVTI + [(match_operand:<VSI2QI> 2 "register_operand" "w") + (match_operand:V8QI 3 "register_operand" "t") + (match_operand:SI 4 "immediate_operand" "i")] + DOTPROD_I8MM) + (match_operand:VCVTI 1 "register_operand" "0")))] + "TARGET_I8MM" + { + operands[4] = GEN_INT (INTVAL (operands[4])); + return "v<sup>dot.<opsuffix>\\t%<V_reg>0, %<V_reg>2, %P3[%c4]"; + } + [(set_attr "type" "neon_dot<q>")] +) + ;; These expands map to the Dot Product optab the vectorizer checks for. ;; The auto-vectorizer expects a dot product builtin that also does an ;; accumulation into the provided register. diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md index ade6b1af994..0aaff3b4bfc 100644 --- a/gcc/config/arm/unspecs.md +++ b/gcc/config/arm/unspecs.md @@ -485,6 +485,8 @@ UNSPEC_VRNDX UNSPEC_DOT_S UNSPEC_DOT_U + UNSPEC_DOT_US + UNSPEC_DOT_SU UNSPEC_VFML_LO UNSPEC_VFML_HI UNSPEC_VCADD90 diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c new file mode 100644 index 00000000000..4d5f07b771b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c @@ -0,0 +1,91 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "-O -save-temps" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include <arm_neon.h> + +/* Unsigned-Signed Dot Product instructions. */ + +/* +**usfoo: +** ... +** vusdot\.s8 d0, d1, d2 +** bx lr +*/ +int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_s32 (r, x, y); +} + +/* +**usfoo_lane: +** ... +** vusdot\.s8 d0, d1, d2\[0\] +** bx lr +*/ +int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_lane_s32 (r, x, y, 0); +} + +/* +**usfooq_lane: +** ... +** vusdot\.s8 q0, q1, d4\[1\] +** bx lr +*/ +int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y) +{ + return vusdotq_lane_s32 (r, x, y, 1); +} + +/* Signed-Unsigned Dot Product instructions. */ + +/* +**sfoo_lane: +** ... +** vsudot\.u8 d0, d1, d2\[0\] +** bx lr +*/ +int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y) +{ + return vsudot_lane_s32 (r, x, y, 0); +} + +/* +**sfooq_lane: +** ... +** vsudot\.u8 q0, q1, d4\[1\] +** bx lr +*/ +int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y) +{ + return vsudotq_lane_s32 (r, x, y, 1); +} + +/* +**usfoo_untied: +** ... +** vusdot\.s8 d1, d2, d3 +** vmov d0, d1 @ v2si +** bx lr +*/ +int32x2_t usfoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_s32 (r, x, y); +} + +/* +**usfoo_lane_untied: +** ... +** vusdot.s8 d1, d2, d3\[0\] +** vmov d0, d1 @ v2si +** bx lr +*/ +int32x2_t usfoo_lane_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_lane_s32 (r, x, y, 0); +} + diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c new file mode 100644 index 00000000000..b7b76e27486 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c @@ -0,0 +1,90 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "-O -save-temps -mbig-endian" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include <arm_neon.h> + +/* Unsigned-Signed Dot Product instructions. */ + +/* +**usfoo: +** ... +** vusdot\.s8 d0, d1, d2 +** bx lr +*/ +int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_s32 (r, x, y); +} + +/* +**usfoo_lane: +** ... +** vusdot\.s8 d0, d1, d2\[0\] +** bx lr +*/ +int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_lane_s32 (r, x, y, 0); +} + +/* +**usfooq_lane: +** ... +** vusdot\.s8 q0, q1, d4\[1\] +** bx lr +*/ +int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y) +{ + return vusdotq_lane_s32 (r, x, y, 1); +} + +/* Signed-Unsigned Dot Product instructions. */ + +/* +**sfoo_lane: +** ... +** vsudot\.u8 d0, d1, d2\[0\] +** bx lr +*/ +int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y) +{ + return vsudot_lane_s32 (r, x, y, 0); +} + +/* +**sfooq_lane: +** ... +** vsudot\.u8 q0, q1, d4\[1\] +** bx lr +*/ +int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y) +{ + return vsudotq_lane_s32 (r, x, y, 1); +} + +/* +**usfoo_untied: +** ... +** vusdot\.s8 d1, d2, d3 +** vmov d0, d1 @ v2si +** bx lr +*/ +int32x2_t usfoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_s32 (r, x, y); +} + +/* +**usfoo_lane_untied: +** ... +** vusdot.s8 d1, d2, d3\[0\] +** vmov d0, d1 @ v2si +** bx lr +*/ +int32x2_t usfoo_lane_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_lane_s32 (r, x, y, 0); +} diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c new file mode 100644 index 00000000000..e14fe8f4433 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c @@ -0,0 +1,21 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "--save-temps" } */ + +#include <arm_neon.h> + +/* Unsigned-Signed Dot Product instructions. */ + +int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vusdot_lane_s32 (r, x, y, -1); +} + + +int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y) +{ + /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vusdotq_lane_s32 (r, x, y, 2); +} diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c new file mode 100644 index 00000000000..fb7ebb484e1 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c @@ -0,0 +1,20 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "--save-temps" } */ + +#include <arm_neon.h> + +/* Signed-Unsigned Dot Product instructions. */ + +int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y) +{ + /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vsudot_lane_s32 (r, x, y, -1); +} + +int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y) +{ + /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vsudotq_lane_s32 (r, x, y, 2); +} ^ permalink raw reply [flat|nested] 9+ messages in thread
* [Pingx2][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension 2020-01-10 19:25 ` Stam Markianos-Wright @ 2020-01-16 16:17 ` Stam Markianos-Wright 2020-01-27 16:08 ` [Pingx3][GCC][PATCH][ARM]Add " Stam Markianos-Wright 0 siblings, 1 reply; 9+ messages in thread From: Stam Markianos-Wright @ 2020-01-16 16:17 UTC (permalink / raw) To: gcc-patches Cc: Richard Earnshaw, Kyrylo Tkachov, nickc, Ramana Radhakrishnan, Richard Sandiford On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: > > > On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: >> >> >> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >>> Hi all, >>> >>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >>> operations (vector/by element) to the ARM back-end. >>> >>> These are: >>> usdot (vector), <us/su>dot (by element). >>> >>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >>> for ARM they remain optional as of ARMv8.6-a. >>> >>> The functions are declared in arm_neon.h, RTL patterns are defined to >>> generate assembler and tests are added to verify and perform adequate checks. >>> >>> Regression testing on arm-none-eabi passed successfully. >>> >>> This patch depends on: >>> >>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >>> >>> for ARM CLI updates, and on: >>> >>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >>> >>> for testsuite effective_target update. >>> >>> Ok for trunk? >> >> .Ping :) >> > Ping :) > > New diff addressing review comments from Aarch64 version of the patch. > > _Change of order of operands in RTL patterns. > _Change tests to use check-function-bodies, compile with optimisation and check > for exact registers. > _Rename tests to remove "-compile-" in filename. > Ping! Cheers, Stam >>> >>> Cheers, >>> Stam >>> >>> >>> ACLE documents are at https://developer.arm.com/docs/101028/latest >>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >>> >>> PS. I don't have commit rights, so if someone could commit on my behalf, >>> that would be great :) >>> >>> >>> gcc/ChangeLog: >>> >>> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com> >>> >>> * config/arm/arm-builtins.c (enum arm_type_qualifiers): >>> (USTERNOP_QUALIFIERS): New define. >>> (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >>> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >>> (arm_expand_builtin_args): >>> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >>> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >>> * config/arm/arm_neon.h (vusdot_s32): New. >>> (vusdot_lane_s32): New. >>> (vusdotq_lane_s32): New. >>> (vsudot_lane_s32): New. >>> (vsudotq_lane_s32): New. >>> * config/arm/arm_neon_builtins.def >>> (usdot,usdot_lane,sudot_lane): New. >>> * config/arm/iterators.md (DOTPROD_I8MM): New. >>> (sup, opsuffix): Add <us/su>. >>> * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New. >>> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >>> >>> >>> gcc/testsuite/ChangeLog: >>> >>> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com> >>> >>> * gcc.target/arm/simd/vdot-compile-2-1.c: New test. >>> * gcc.target/arm/simd/vdot-compile-2-2.c: New test. >>> * gcc.target/arm/simd/vdot-compile-2-3.c: New test. >>> * gcc.target/arm/simd/vdot-compile-2-4.c: New test. >>> >>> > ^ permalink raw reply [flat|nested] 9+ messages in thread
* [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension 2020-01-16 16:17 ` [Pingx2][GCC][PATCH][ARM]Add " Stam Markianos-Wright @ 2020-01-27 16:08 ` Stam Markianos-Wright 2020-02-03 11:20 ` Stam Markianos-Wright 0 siblings, 1 reply; 9+ messages in thread From: Stam Markianos-Wright @ 2020-01-27 16:08 UTC (permalink / raw) To: gcc-patches Cc: Richard Earnshaw, Kyrylo Tkachov, nickc, Ramana Radhakrishnan, Richard Sandiford On 1/16/20 4:05 PM, Stam Markianos-Wright wrote: > > > On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: >> >> >> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: >>> >>> >>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >>>> Hi all, >>>> >>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >>>> operations (vector/by element) to the ARM back-end. >>>> >>>> These are: >>>> usdot (vector), <us/su>dot (by element). >>>> >>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >>>> for ARM they remain optional as of ARMv8.6-a. >>>> >>>> The functions are declared in arm_neon.h, RTL patterns are defined to >>>> generate assembler and tests are added to verify and perform adequate checks. >>>> >>>> Regression testing on arm-none-eabi passed successfully. >>>> >>>> This patch depends on: >>>> >>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >>>> >>>> for ARM CLI updates, and on: >>>> >>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >>>> >>>> for testsuite effective_target update. >>>> >>>> Ok for trunk? >>> >>> .Ping :) >>> >> Ping :) >> >> New diff addressing review comments from Aarch64 version of the patch. >> >> _Change of order of operands in RTL patterns. >> _Change tests to use check-function-bodies, compile with optimisation and >> check for exact registers. >> _Rename tests to remove "-compile-" in filename. >> > > Ping! > > Cheers, > Stam > >>>> >>>> Cheers, >>>> Stam >>>> >>>> >>>> ACLE documents are at https://developer.arm.com/docs/101028/latest >>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >>>> >>>> PS. I don't have commit rights, so if someone could commit on my behalf, >>>> that would be great :) >>>> >>>> >>>> gcc/ChangeLog: >>>> >>>> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com> >>>> >>>> * config/arm/arm-builtins.c (enum arm_type_qualifiers): >>>> (USTERNOP_QUALIFIERS): New define. >>>> (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >>>> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >>>> (arm_expand_builtin_args): >>>> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >>>> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >>>> * config/arm/arm_neon.h (vusdot_s32): New. >>>> (vusdot_lane_s32): New. >>>> (vusdotq_lane_s32): New. >>>> (vsudot_lane_s32): New. >>>> (vsudotq_lane_s32): New. >>>> * config/arm/arm_neon_builtins.def >>>> (usdot,usdot_lane,sudot_lane): New. >>>> * config/arm/iterators.md (DOTPROD_I8MM): New. >>>> (sup, opsuffix): Add <us/su>. >>>> * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New. >>>> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >>>> >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com> >>>> >>>> * gcc.target/arm/simd/vdot-compile-2-1.c: New test. >>>> * gcc.target/arm/simd/vdot-compile-2-2.c: New test. >>>> * gcc.target/arm/simd/vdot-compile-2-3.c: New test. >>>> * gcc.target/arm/simd/vdot-compile-2-4.c: New test. >>>> >>>> >> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension 2020-01-27 16:08 ` [Pingx3][GCC][PATCH][ARM]Add " Stam Markianos-Wright @ 2020-02-03 11:20 ` Stam Markianos-Wright 2020-02-10 13:36 ` Stam Markianos-Wright 0 siblings, 1 reply; 9+ messages in thread From: Stam Markianos-Wright @ 2020-02-03 11:20 UTC (permalink / raw) To: gcc-patches; +Cc: Richard Earnshaw, kyrylo.tkachov, nickc, ramana.radhakrishnan On 1/27/20 3:54 PM, Stam Markianos-Wright wrote: > > On 1/16/20 4:05 PM, Stam Markianos-Wright wrote: >> >> >> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: >>> >>> >>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: >>>> >>>> >>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >>>>> Hi all, >>>>> >>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >>>>> operations (vector/by element) to the ARM back-end. >>>>> >>>>> These are: >>>>> usdot (vector), <us/su>dot (by element). >>>>> >>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >>>>> for ARM they remain optional as of ARMv8.6-a. >>>>> >>>>> The functions are declared in arm_neon.h, RTL patterns are defined to >>>>> generate assembler and tests are added to verify and perform adequate checks. >>>>> >>>>> Regression testing on arm-none-eabi passed successfully. >>>>> >>>>> This patch depends on: >>>>> >>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >>>>> >>>>> for ARM CLI updates, and on: >>>>> >>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >>>>> >>>>> for testsuite effective_target update. >>>>> >>>>> Ok for trunk? >>>> >>> >>> New diff addressing review comments from Aarch64 version of the patch. >>> >>> _Change of order of operands in RTL patterns. >>> _Change tests to use check-function-bodies, compile with optimisation and >>> check for exact registers. >>> _Rename tests to remove "-compile-" in filename. >>> >> .Ping! . >> >> Cheers, >> Stam >> >>>>> >>>>> >>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest >>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >>>>> >>>>> PS. I don't have commit rights, so if someone could commit on my behalf, >>>>> that would be great :) >>>>> >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com> >>>>> >>>>>      * config/arm/arm-builtins.c (enum arm_type_qualifiers): >>>>>      (USTERNOP_QUALIFIERS): New define. >>>>>      (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >>>>>      (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >>>>>      (arm_expand_builtin_args): >>>>>         Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >>>>>      (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >>>>>      * config/arm/arm_neon.h (vusdot_s32): New. >>>>>      (vusdot_lane_s32): New. >>>>>      (vusdotq_lane_s32): New. >>>>>      (vsudot_lane_s32): New. >>>>>      (vsudotq_lane_s32): New. >>>>>      * config/arm/arm_neon_builtins.def >>>>>         (usdot,usdot_lane,sudot_lane): New. >>>>>      * config/arm/iterators.md (DOTPROD_I8MM): New. >>>>>         (sup, opsuffix): Add <us/su>. >>>>>        * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New. >>>>>      * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >>>>> >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com> >>>>> >>>>>      * gcc.target/arm/simd/vdot-compile-2-1.c: New test. >>>>>      * gcc.target/arm/simd/vdot-compile-2-2.c: New test. >>>>>      * gcc.target/arm/simd/vdot-compile-2-3.c: New test. >>>>>      * gcc.target/arm/simd/vdot-compile-2-4.c: New test. >>>>> >>>>> >>> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension 2020-02-03 11:20 ` Stam Markianos-Wright @ 2020-02-10 13:36 ` Stam Markianos-Wright 2020-02-11 10:26 ` Kyrill Tkachov 0 siblings, 1 reply; 9+ messages in thread From: Stam Markianos-Wright @ 2020-02-10 13:36 UTC (permalink / raw) To: gcc-patches; +Cc: Richard Earnshaw, kyrylo.tkachov, nickc, ramana.radhakrishnan [-- Attachment #1: Type: text/plain, Size: 3822 bytes --] On 2/3/20 11:20 AM, Stam Markianos-Wright wrote: > > > On 1/27/20 3:54 PM, Stam Markianos-Wright wrote: >> >> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote: >>> >>> >>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: >>>> >>>> >>>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: >>>>> >>>>> >>>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >>>>>> Hi all, >>>>>> >>>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >>>>>> operations (vector/by element) to the ARM back-end. >>>>>> >>>>>> These are: >>>>>> usdot (vector), <us/su>dot (by element). >>>>>> >>>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >>>>>> for ARM they remain optional as of ARMv8.6-a. >>>>>> >>>>>> The functions are declared in arm_neon.h, RTL patterns are defined to >>>>>> generate assembler and tests are added to verify and perform adequate checks. >>>>>> >>>>>> Regression testing on arm-none-eabi passed successfully. >>>>>> >>>>>> This patch depends on: >>>>>> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >>>>>> >>>>>> for ARM CLI updates, and on: >>>>>> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >>>>>> >>>>>> for testsuite effective_target update. >>>>>> >>>>>> Ok for trunk? >>>>> >>>> >>>> New diff addressing review comments from Aarch64 version of the patch. >>>> >>>> _Change of order of operands in RTL patterns. >>>> _Change tests to use check-function-bodies, compile with optimisation and >>>> check for exact registers. >>>> _Rename tests to remove "-compile-" in filename. >>>> >>> > .Ping! Ping :) Diff re-attached in this ping email is same as the one posted on 10/01 Thank you! > . >>> >>> Cheers, >>> Stam >>> >>>>>> >>>>>> >>>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest >>>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >>>>>> >>>>>> PS. I don't have commit rights, so if someone could commit on my behalf, >>>>>> that would be great :) >>>>>> >>>>>> >>>>>> gcc/ChangeLog: >>>>>> >>>>>> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com> >>>>>> >>>>>>      * config/arm/arm-builtins.c (enum arm_type_qualifiers): >>>>>>      (USTERNOP_QUALIFIERS): New define. >>>>>>      (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >>>>>>      (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >>>>>>      (arm_expand_builtin_args): >>>>>>         Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >>>>>>      (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >>>>>>      * config/arm/arm_neon.h (vusdot_s32): New. >>>>>>      (vusdot_lane_s32): New. >>>>>>      (vusdotq_lane_s32): New. >>>>>>      (vsudot_lane_s32): New. >>>>>>      (vsudotq_lane_s32): New. >>>>>>      * config/arm/arm_neon_builtins.def >>>>>>         (usdot,usdot_lane,sudot_lane): New. >>>>>>      * config/arm/iterators.md (DOTPROD_I8MM): New. >>>>>>         (sup, opsuffix): Add <us/su>. >>>>>>        * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New. >>>>>>      * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >>>>>> >>>>>> >>>>>> gcc/testsuite/ChangeLog: >>>>>> >>>>>> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com> >>>>>> >>>>>>      * gcc.target/arm/simd/vdot-2-1.c: New test. >>>>>>      * gcc.target/arm/simd/vdot-2-2.c: New test. >>>>>>      * gcc.target/arm/simd/vdot-2-3.c: New test. >>>>>>      * gcc.target/arm/simd/vdot-2-4.c: New test. >>>>>> >>>>>> >>>> [-- Attachment #2: I8MM-32-final.patch --] [-- Type: text/x-patch, Size: 15372 bytes --] diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index df84560588a..1b4316d0e93 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -86,7 +86,10 @@ enum arm_type_qualifiers qualifier_const_void_pointer = 0x802, /* Lane indices selected in pairs - must be within range of previous argument = a vector. */ - qualifier_lane_pair_index = 0x1000 + qualifier_lane_pair_index = 0x1000, + /* Lane indices selected in quadtuplets - must be within range of previous + argument = a vector. */ + qualifier_lane_quadtup_index = 0x2000 }; /* The qualifier_internal allows generation of a unary builtin from @@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned }; #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers) +/* T (T, unsigned T, T). */ +static enum arm_type_qualifiers +arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none }; +#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers) + /* T (T, immediate). */ static enum arm_type_qualifiers arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] qualifier_unsigned, qualifier_lane_index }; #define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers) +/* T (T, unsigned T, T, lane index). */ +static enum arm_type_qualifiers +arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned, + qualifier_none, qualifier_lane_quadtup_index }; +#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers) + +/* T (T, T, unsigend T, lane index). */ +static enum arm_type_qualifiers +arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_none, + qualifier_unsigned, qualifier_lane_quadtup_index }; +#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers) + /* T (T, T, immediate). */ static enum arm_type_qualifiers arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS] @@ -2148,6 +2172,7 @@ typedef enum { ARG_BUILTIN_LANE_INDEX, ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX, ARG_BUILTIN_LANE_PAIR_INDEX, + ARG_BUILTIN_LANE_QUADTUP_INDEX, ARG_BUILTIN_NEON_MEMORY, ARG_BUILTIN_MEMORY, ARG_BUILTIN_STOP @@ -2296,11 +2321,24 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode, if (CONST_INT_P (op[argc])) { machine_mode vmode = mode[argc - 1]; - neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp); + neon_lane_bounds (op[argc], 0, + GET_MODE_NUNITS (vmode) / 2, exp); + } + /* If the lane index isn't a constant then error out. */ + goto constant_arg; + + case ARG_BUILTIN_LANE_QUADTUP_INDEX: + /* Previous argument must be a vector, which this indexes. */ + gcc_assert (argc > 0); + if (CONST_INT_P (op[argc])) + { + machine_mode vmode = mode[argc - 1]; + neon_lane_bounds (op[argc], 0, + GET_MODE_NUNITS (vmode) / 4, exp); } - /* If the lane index isn't a constant then the next - case will error. */ - /* Fall through. */ + /* If the lane index isn't a constant then error out. */ + goto constant_arg; + case ARG_BUILTIN_CONSTANT: constant_arg: if (!(*insn_data[icode].operand[opno].predicate) @@ -2464,6 +2502,8 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target, args[k] = ARG_BUILTIN_LANE_INDEX; else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index) args[k] = ARG_BUILTIN_LANE_PAIR_INDEX; + else if (d->qualifiers[qualifiers_k] & qualifier_lane_quadtup_index) + args[k] = ARG_BUILTIN_LANE_QUADTUP_INDEX; else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index) args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX; else if (d->qualifiers[qualifiers_k] & qualifier_immediate) diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index db8db53614a..ede89ec2c64 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -18738,6 +18738,52 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b, return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index); } + +/* AdvSIMD Matrix Multiply-Accumulate and Dot Product intrinsics. */ +#pragma GCC push_options +#pragma GCC target ("arch=armv8.2-a+i8mm") + +__extension__ extern __inline int32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b) +{ + return __builtin_neon_usdotv8qi_ssus (__r, __a, __b); +} + +__extension__ extern __inline int32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a, + int8x8_t __b, const int __index) +{ + return __builtin_neon_usdot_lanev8qi_ssuss (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vusdotq_lane_s32 (int32x4_t __r, uint8x16_t __a, + int8x8_t __b, const int __index) +{ + return __builtin_neon_usdot_lanev16qi_ssuss (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x2_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vsudot_lane_s32 (int32x2_t __r, int8x8_t __a, + uint8x8_t __b, const int __index) +{ + return __builtin_neon_sudot_lanev8qi_sssus (__r, __a, __b, __index); +} + +__extension__ extern __inline int32x4_t +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) +vsudotq_lane_s32 (int32x4_t __r, int8x16_t __a, + uint8x8_t __b, const int __index) +{ + return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b, __index); +} + +#pragma GCC pop_options + #pragma GCC pop_options #endif diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index e9ff4e501cb..b4537ff5de9 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -352,6 +352,10 @@ VAR2 (UTERNOP, udot, v8qi, v16qi) VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi) VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi) +VAR1 (USTERNOP, usdot, v8qi) +VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi) +VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi) + VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf) VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf) VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf) diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 7da8b74abc0..afea7f823e0 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -466,6 +466,8 @@ (define_int_iterator DOTPROD [UNSPEC_DOT_S UNSPEC_DOT_U]) +(define_int_iterator DOTPROD_I8MM [UNSPEC_DOT_US UNSPEC_DOT_SU]) + (define_int_iterator VFMLHALVES [UNSPEC_VFML_LO UNSPEC_VFML_HI]) (define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270]) @@ -920,6 +922,7 @@ (UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u") (UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u") (UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u") + (UNSPEC_DOT_US "us") (UNSPEC_DOT_SU "su") (UNSPEC_SSAT16 "s") (UNSPEC_USAT16 "u") ]) @@ -1151,6 +1154,9 @@ (define_int_attr MRRC [(VUNSPEC_MRRC "MRRC") (VUNSPEC_MRRC2 "MRRC2")]) (define_int_attr opsuffix [(UNSPEC_DOT_S "s8") - (UNSPEC_DOT_U "u8")]) + (UNSPEC_DOT_U "u8") + (UNSPEC_DOT_US "s8") + (UNSPEC_DOT_SU "u8") + ]) (define_int_attr smlaw_op [(UNSPEC_SMLAWB "smlawb") (UNSPEC_SMLAWT "smlawt")]) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index dace9470c41..8b83cba8fb7 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -3279,6 +3279,20 @@ [(set_attr "type" "neon_dot<q>")] ) +;; These instructions map to the __builtins for the Dot Product operations. +(define_insn "neon_usdot<vsi2qi>" + [(set (match_operand:VCVTI 0 "register_operand" "=w") + (plus:VCVTI + (unspec:VCVTI + [(match_operand:<VSI2QI> 2 "register_operand" "w") + (match_operand:<VSI2QI> 3 "register_operand" "w")] + UNSPEC_DOT_US) + (match_operand:VCVTI 1 "register_operand" "0")))] + "TARGET_I8MM" + "vusdot.s8\\t%<V_reg>0, %<V_reg>2, %<V_reg>3" + [(set_attr "type" "neon_dot<q>")] +) + ;; These instructions map to the __builtins for the Dot Product ;; indexed operations. (define_insn "neon_<sup>dot_lane<vsi2qi>" @@ -3298,6 +3312,25 @@ [(set_attr "type" "neon_dot<q>")] ) +;; These instructions map to the __builtins for the Dot Product +;; indexed operations in the v8.6 I8MM extension. +(define_insn "neon_<sup>dot_lane<vsi2qi>" + [(set (match_operand:VCVTI 0 "register_operand" "=w") + (plus:VCVTI + (unspec:VCVTI + [(match_operand:<VSI2QI> 2 "register_operand" "w") + (match_operand:V8QI 3 "register_operand" "t") + (match_operand:SI 4 "immediate_operand" "i")] + DOTPROD_I8MM) + (match_operand:VCVTI 1 "register_operand" "0")))] + "TARGET_I8MM" + { + operands[4] = GEN_INT (INTVAL (operands[4])); + return "v<sup>dot.<opsuffix>\\t%<V_reg>0, %<V_reg>2, %P3[%c4]"; + } + [(set_attr "type" "neon_dot<q>")] +) + ;; These expands map to the Dot Product optab the vectorizer checks for. ;; The auto-vectorizer expects a dot product builtin that also does an ;; accumulation into the provided register. diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md index ade6b1af994..0aaff3b4bfc 100644 --- a/gcc/config/arm/unspecs.md +++ b/gcc/config/arm/unspecs.md @@ -485,6 +485,8 @@ UNSPEC_VRNDX UNSPEC_DOT_S UNSPEC_DOT_U + UNSPEC_DOT_US + UNSPEC_DOT_SU UNSPEC_VFML_LO UNSPEC_VFML_HI UNSPEC_VCADD90 diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c new file mode 100644 index 00000000000..4d5f07b771b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c @@ -0,0 +1,91 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "-O -save-temps" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include <arm_neon.h> + +/* Unsigned-Signed Dot Product instructions. */ + +/* +**usfoo: +** ... +** vusdot\.s8 d0, d1, d2 +** bx lr +*/ +int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_s32 (r, x, y); +} + +/* +**usfoo_lane: +** ... +** vusdot\.s8 d0, d1, d2\[0\] +** bx lr +*/ +int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_lane_s32 (r, x, y, 0); +} + +/* +**usfooq_lane: +** ... +** vusdot\.s8 q0, q1, d4\[1\] +** bx lr +*/ +int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y) +{ + return vusdotq_lane_s32 (r, x, y, 1); +} + +/* Signed-Unsigned Dot Product instructions. */ + +/* +**sfoo_lane: +** ... +** vsudot\.u8 d0, d1, d2\[0\] +** bx lr +*/ +int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y) +{ + return vsudot_lane_s32 (r, x, y, 0); +} + +/* +**sfooq_lane: +** ... +** vsudot\.u8 q0, q1, d4\[1\] +** bx lr +*/ +int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y) +{ + return vsudotq_lane_s32 (r, x, y, 1); +} + +/* +**usfoo_untied: +** ... +** vusdot\.s8 d1, d2, d3 +** vmov d0, d1 @ v2si +** bx lr +*/ +int32x2_t usfoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_s32 (r, x, y); +} + +/* +**usfoo_lane_untied: +** ... +** vusdot.s8 d1, d2, d3\[0\] +** vmov d0, d1 @ v2si +** bx lr +*/ +int32x2_t usfoo_lane_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_lane_s32 (r, x, y, 0); +} + diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c new file mode 100644 index 00000000000..b7b76e27486 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c @@ -0,0 +1,90 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "-O -save-temps -mbig-endian" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include <arm_neon.h> + +/* Unsigned-Signed Dot Product instructions. */ + +/* +**usfoo: +** ... +** vusdot\.s8 d0, d1, d2 +** bx lr +*/ +int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_s32 (r, x, y); +} + +/* +**usfoo_lane: +** ... +** vusdot\.s8 d0, d1, d2\[0\] +** bx lr +*/ +int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_lane_s32 (r, x, y, 0); +} + +/* +**usfooq_lane: +** ... +** vusdot\.s8 q0, q1, d4\[1\] +** bx lr +*/ +int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y) +{ + return vusdotq_lane_s32 (r, x, y, 1); +} + +/* Signed-Unsigned Dot Product instructions. */ + +/* +**sfoo_lane: +** ... +** vsudot\.u8 d0, d1, d2\[0\] +** bx lr +*/ +int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y) +{ + return vsudot_lane_s32 (r, x, y, 0); +} + +/* +**sfooq_lane: +** ... +** vsudot\.u8 q0, q1, d4\[1\] +** bx lr +*/ +int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y) +{ + return vsudotq_lane_s32 (r, x, y, 1); +} + +/* +**usfoo_untied: +** ... +** vusdot\.s8 d1, d2, d3 +** vmov d0, d1 @ v2si +** bx lr +*/ +int32x2_t usfoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_s32 (r, x, y); +} + +/* +**usfoo_lane_untied: +** ... +** vusdot.s8 d1, d2, d3\[0\] +** vmov d0, d1 @ v2si +** bx lr +*/ +int32x2_t usfoo_lane_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y) +{ + return vusdot_lane_s32 (r, x, y, 0); +} diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c new file mode 100644 index 00000000000..e14fe8f4433 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c @@ -0,0 +1,21 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "--save-temps" } */ + +#include <arm_neon.h> + +/* Unsigned-Signed Dot Product instructions. */ + +int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y) +{ + /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vusdot_lane_s32 (r, x, y, -1); +} + + +int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y) +{ + /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vusdotq_lane_s32 (r, x, y, 2); +} diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c new file mode 100644 index 00000000000..fb7ebb484e1 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c @@ -0,0 +1,20 @@ +/* { dg-do assemble { target { arm*-*-* } } } */ +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */ +/* { dg-add-options arm_v8_2a_i8mm } */ +/* { dg-additional-options "--save-temps" } */ + +#include <arm_neon.h> + +/* Signed-Unsigned Dot Product instructions. */ + +int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y) +{ + /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vsudot_lane_s32 (r, x, y, -1); +} + +int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y) +{ + /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */ + return vsudotq_lane_s32 (r, x, y, 2); +} ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension 2020-02-10 13:36 ` Stam Markianos-Wright @ 2020-02-11 10:26 ` Kyrill Tkachov 2020-02-11 11:19 ` Stam Markianos-Wright 0 siblings, 1 reply; 9+ messages in thread From: Kyrill Tkachov @ 2020-02-11 10:26 UTC (permalink / raw) To: Stam Markianos-Wright, gcc-patches Cc: Richard Earnshaw, nickc, Ramana Radhakrishnan Hi Stam, On 2/10/20 1:35 PM, Stam Markianos-Wright wrote: > > > On 2/3/20 11:20 AM, Stam Markianos-Wright wrote: > > > > > > On 1/27/20 3:54 PM, Stam Markianos-Wright wrote: > >> > >> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote: > >>> > >>> > >>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: > >>>> > >>>> > >>>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: > >>>>> > >>>>> > >>>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: > >>>>>> Hi all, > >>>>>> > >>>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot > product > >>>>>> operations (vector/by element) to the ARM back-end. > >>>>>> > >>>>>> These are: > >>>>>> usdot (vector), <us/su>dot (by element). > >>>>>> > >>>>>> The functions are optional from ARMv8.2-a as > -march=armv8.2-a+i8mm and > >>>>>> for ARM they remain optional as of ARMv8.6-a. > >>>>>> > >>>>>> The functions are declared in arm_neon.h, RTL patterns are > defined to > >>>>>> generate assembler and tests are added to verify and perform > adequate checks. > >>>>>> > >>>>>> Regression testing on arm-none-eabi passed successfully. > >>>>>> > >>>>>> This patch depends on: > >>>>>> > >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html > >>>>>> > >>>>>> for ARM CLI updates, and on: > >>>>>> > >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html > >>>>>> > >>>>>> for testsuite effective_target update. > >>>>>> > >>>>>> Ok for trunk? > >>>>> > >>>> > >>>> New diff addressing review comments from Aarch64 version of the > patch. > >>>> > >>>> _Change of order of operands in RTL patterns. > >>>> _Change tests to use check-function-bodies, compile with > optimisation and > >>>> check for exact registers. > >>>> _Rename tests to remove "-compile-" in filename. > >>>> > >>> > > .Ping! > > Ping :) > > Diff re-attached in this ping email is same as the one posted on 10/01 > > Thank you! Sorry for the delay. This is ok. Thanks, Kyrill > > . > >>> > >>> Cheers, > >>> Stam > >>> > >>>>>> > >>>>>> > >>>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest > >>>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest > >>>>>> > >>>>>> PS. I don't have commit rights, so if someone could commit on > my behalf, > >>>>>> that would be great :) > >>>>>> > >>>>>> > >>>>>> gcc/ChangeLog: > >>>>>> > >>>>>> 2019-11-28Â Stam Markianos-Wright <stam.markianos-wright@arm.com> > >>>>>> > >>>>>> Â Â Â Â Â * config/arm/arm-builtins.c (enum arm_type_qualifiers): > >>>>>> Â Â Â Â Â (USTERNOP_QUALIFIERS): New define. > >>>>>> Â Â Â Â Â (USMAC_LANE_QUADTUP_QUALIFIERS): New define. > >>>>>> Â Â Â Â Â (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. > >>>>>> Â Â Â Â Â (arm_expand_builtin_args): > >>>>>> Â Â Â Â Â Â Â Â Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. > >>>>>> Â Â Â Â Â (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. > >>>>>> Â Â Â Â Â * config/arm/arm_neon.h (vusdot_s32): New. > >>>>>> Â Â Â Â Â (vusdot_lane_s32): New. > >>>>>> Â Â Â Â Â (vusdotq_lane_s32): New. > >>>>>> Â Â Â Â Â (vsudot_lane_s32): New. > >>>>>> Â Â Â Â Â (vsudotq_lane_s32): New. > >>>>>> Â Â Â Â Â * config/arm/arm_neon_builtins.def > >>>>>> (usdot,usdot_lane,sudot_lane): New. > >>>>>> Â Â Â Â Â * config/arm/iterators.md (DOTPROD_I8MM): New. > >>>>>> Â Â Â Â Â Â Â Â (sup, opsuffix): Add <us/su>. > >>>>>> Â Â Â Â Â Â Â * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New. > >>>>>> Â Â Â Â Â * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. > >>>>>> > >>>>>> > >>>>>> gcc/testsuite/ChangeLog: > >>>>>> > >>>>>> 2019-12-12Â Stam Markianos-Wright <stam.markianos-wright@arm.com> > >>>>>> > >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-1.c: New test. > >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-2.c: New test. > >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-3.c: New test. > >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-4.c: New test. > >>>>>> > >>>>>> > >>>> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension 2020-02-11 10:26 ` Kyrill Tkachov @ 2020-02-11 11:19 ` Stam Markianos-Wright 0 siblings, 0 replies; 9+ messages in thread From: Stam Markianos-Wright @ 2020-02-11 11:19 UTC (permalink / raw) To: Kyrill Tkachov, gcc-patches; +Cc: Richard Earnshaw, nickc, Ramana Radhakrishnan On 2/11/20 10:25 AM, Kyrill Tkachov wrote: > Hi Stam, > > On 2/10/20 1:35 PM, Stam Markianos-Wright wrote: >> >> >> On 2/3/20 11:20 AM, Stam Markianos-Wright wrote: >> > >> > >> > On 1/27/20 3:54 PM, Stam Markianos-Wright wrote: >> >> >> >> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote: >> >>> >> >>> >> >>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote: >> >>>> >> >>>> >> >>>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote: >> >>>>> >> >>>>> >> >>>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote: >> >>>>>> Hi all, >> >>>>>> >> >>>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product >> >>>>>> operations (vector/by element) to the ARM back-end. >> >>>>>> >> >>>>>> These are: >> >>>>>> usdot (vector), <us/su>dot (by element). >> >>>>>> >> >>>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and >> >>>>>> for ARM they remain optional as of ARMv8.6-a. >> >>>>>> >> >>>>>> The functions are declared in arm_neon.h, RTL patterns are defined to >> >>>>>> generate assembler and tests are added to verify and perform adequate >> checks. >> >>>>>> >> >>>>>> Regression testing on arm-none-eabi passed successfully. >> >>>>>> >> >>>>>> This patch depends on: >> >>>>>> >> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html >> >>>>>> >> >>>>>> for ARM CLI updates, and on: >> >>>>>> >> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html >> >>>>>> >> >>>>>> for testsuite effective_target update. >> >>>>>> >> >>>>>> Ok for trunk? >> >>>>> >> >>>> >> >>>> New diff addressing review comments from Aarch64 version of the patch. >> >>>> >> >>>> _Change of order of operands in RTL patterns. >> >>>> _Change tests to use check-function-bodies, compile with optimisation and >> >>>> check for exact registers. >> >>>> _Rename tests to remove "-compile-" in filename. >> >>>> >> >>> >> > .Ping! >> >> Ping :) >> >> Diff re-attached in this ping email is same as the one posted on 10/01 >> >> Thank you! > > > Sorry for the delay. > > This is ok. No worries, thank you! Committed as r10-6575. Cheers, Stam > > Thanks, > > Kyrill > > >> > . >> >>> >> >>> Cheers, >> >>> Stam >> >>> >> >>>>>> >> >>>>>> >> >>>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest >> >>>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest >> >>>>>> >> >>>>>> PS. I don't have commit rights, so if someone could commit on my behalf, >> >>>>>> that would be great :) >> >>>>>> >> >>>>>> >> >>>>>> gcc/ChangeLog: >> >>>>>> >> >>>>>> 2019-11-28Â Stam Markianos-Wright <stam.markianos-wright@arm.com> >> >>>>>> >> >>>>>> Â Â Â Â Â * config/arm/arm-builtins.c (enum arm_type_qualifiers): >> >>>>>> Â Â Â Â Â (USTERNOP_QUALIFIERS): New define. >> >>>>>> Â Â Â Â Â (USMAC_LANE_QUADTUP_QUALIFIERS): New define. >> >>>>>> Â Â Â Â Â (SUMAC_LANE_QUADTUP_QUALIFIERS): New define. >> >>>>>> Â Â Â Â Â (arm_expand_builtin_args): >> >>>>>> Â Â Â Â Â Â Â Â Add case ARG_BUILTIN_LANE_QUADTUP_INDEX. >> >>>>>> Â Â Â Â Â (arm_expand_builtin_1): Add qualifier_lane_quadtup_index. >> >>>>>> Â Â Â Â Â * config/arm/arm_neon.h (vusdot_s32): New. >> >>>>>> Â Â Â Â Â (vusdot_lane_s32): New. >> >>>>>> Â Â Â Â Â (vusdotq_lane_s32): New. >> >>>>>> Â Â Â Â Â (vsudot_lane_s32): New. >> >>>>>> Â Â Â Â Â (vsudotq_lane_s32): New. >> >>>>>> Â Â Â Â Â * config/arm/arm_neon_builtins.def >> >>>>>> (usdot,usdot_lane,sudot_lane): New. >> >>>>>> Â Â Â Â Â * config/arm/iterators.md (DOTPROD_I8MM): New. >> >>>>>> Â Â Â Â Â Â Â Â (sup, opsuffix): Add <us/su>. >> >>>>>> Â Â Â Â Â Â Â * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New. >> >>>>>> Â Â Â Â Â * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New. >> >>>>>> >> >>>>>> >> >>>>>> gcc/testsuite/ChangeLog: >> >>>>>> >> >>>>>> 2019-12-12Â Stam Markianos-Wright <stam.markianos-wright@arm.com> >> >>>>>> >> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-1.c: New test. >> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-2.c: New test. >> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-3.c: New test. >> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-4.c: New test. >> >>>>>> >> >>>>>> >> >>>> ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-02-11 11:19 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-13 10:23 [GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension Stam Markianos-Wright 2019-12-18 13:26 ` [Ping][GCC][PATCH][ARM]Add " Stam Markianos-Wright 2020-01-10 19:25 ` Stam Markianos-Wright 2020-01-16 16:17 ` [Pingx2][GCC][PATCH][ARM]Add " Stam Markianos-Wright 2020-01-27 16:08 ` [Pingx3][GCC][PATCH][ARM]Add " Stam Markianos-Wright 2020-02-03 11:20 ` Stam Markianos-Wright 2020-02-10 13:36 ` Stam Markianos-Wright 2020-02-11 10:26 ` Kyrill Tkachov 2020-02-11 11:19 ` Stam Markianos-Wright
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).