* [GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
@ 2019-12-13 10:23 Stam Markianos-Wright
2019-12-18 13:26 ` [Ping][GCC][PATCH][ARM]Add " Stam Markianos-Wright
0 siblings, 1 reply; 9+ messages in thread
From: Stam Markianos-Wright @ 2019-12-13 10:23 UTC (permalink / raw)
To: gcc-patches
Cc: Richard Earnshaw, Kyrylo Tkachov, nickc, Ramana Radhakrishnan,
Richard Sandiford
[-- Attachment #1: Type: text/plain, Size: 2211 bytes --]
Hi all,
This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
operations (vector/by element) to the ARM back-end.
These are:
usdot (vector), <us/su>dot (by element).
The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
for ARM they remain optional as of ARMv8.6-a.
The functions are declared in arm_neon.h, RTL patterns are defined to
generate assembler and tests are added to verify and perform adequate
checks.
Regression testing on arm-none-eabi passed successfully.
This patch depends on:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
for ARM CLI updates, and on:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
for testsuite effective_target update.
Ok for trunk?
Cheers,
Stam
ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest
PS. I don't have commit rights, so if someone could commit on my behalf,
that would be great :)
gcc/ChangeLog:
2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com>
* config/arm/arm-builtins.c (enum arm_type_qualifiers):
(USTERNOP_QUALIFIERS): New define.
(USMAC_LANE_QUADTUP_QUALIFIERS): New define.
(SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
(arm_expand_builtin_args):
Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
(arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
* config/arm/arm_neon.h (vusdot_s32): New.
(vusdot_lane_s32): New.
(vusdotq_lane_s32): New.
(vsudot_lane_s32): New.
(vsudotq_lane_s32): New.
* config/arm/arm_neon_builtins.def
(usdot,usdot_lane,sudot_lane): New.
* config/arm/iterators.md (DOTPROD_I8MM): New.
(sup, opsuffix): Add <us/su>.
* config/arm/neon.md (neon_usdot, <us/su>dot_lane: New.
* config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
gcc/testsuite/ChangeLog:
2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com>
* gcc.target/arm/simd/vdot-compile-2-1.c: New test.
* gcc.target/arm/simd/vdot-compile-2-2.c: New test.
* gcc.target/arm/simd/vdot-compile-2-3.c: New test.
* gcc.target/arm/simd/vdot-compile-2-4.c: New test.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: i8mm-us-su-dot-arm.patch --]
[-- Type: text/x-patch; name="i8mm-us-su-dot-arm.patch", Size: 15615 bytes --]
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 2d902d0b325bc1fe5e22831ef8a59a2bb37c1225..a63c1a978fb1d436065ce9f5f082249c4ebf5ade 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -86,7 +86,10 @@ enum arm_type_qualifiers
qualifier_const_void_pointer = 0x802,
/* Lane indices selected in pairs - must be within range of previous
argument = a vector. */
- qualifier_lane_pair_index = 0x1000
+ qualifier_lane_pair_index = 0x1000,
+ /* Lane indices selected in quadtuplets - must be within range of previous
+ argument = a vector. */
+ qualifier_lane_quadtup_index = 0x2000
};
/* The qualifier_internal allows generation of a unary builtin from
@@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
qualifier_unsigned };
#define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers)
+/* T (T, unsigned T, T). */
+static enum arm_type_qualifiers
+arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_none, qualifier_none, qualifier_unsigned,
+ qualifier_none };
+#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers)
+
/* T (T, immediate). */
static enum arm_type_qualifiers
arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
qualifier_unsigned, qualifier_lane_index };
#define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers)
+/* T (T, unsigned T, T, lane index). */
+static enum arm_type_qualifiers
+arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_none, qualifier_none, qualifier_unsigned,
+ qualifier_none, qualifier_lane_quadtup_index };
+#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers)
+
+/* T (T, T, unsigend T, lane index). */
+static enum arm_type_qualifiers
+arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_none, qualifier_none, qualifier_none,
+ qualifier_unsigned, qualifier_lane_quadtup_index };
+#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers)
+
/* T (T, T, immediate). */
static enum arm_type_qualifiers
arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -2148,6 +2172,7 @@ typedef enum {
ARG_BUILTIN_LANE_INDEX,
ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX,
ARG_BUILTIN_LANE_PAIR_INDEX,
+ ARG_BUILTIN_LANE_QUADTUP_INDEX,
ARG_BUILTIN_NEON_MEMORY,
ARG_BUILTIN_MEMORY,
ARG_BUILTIN_STOP
@@ -2296,11 +2321,24 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
if (CONST_INT_P (op[argc]))
{
machine_mode vmode = mode[argc - 1];
- neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp);
+ neon_lane_bounds (op[argc], 0,
+ GET_MODE_NUNITS (vmode) / 2, exp);
+ }
+ /* If the lane index isn't a constant then error out. */
+ goto constant_arg;
+
+ case ARG_BUILTIN_LANE_QUADTUP_INDEX:
+ /* Previous argument must be a vector, which this indexes. */
+ gcc_assert (argc > 0);
+ if (CONST_INT_P (op[argc]))
+ {
+ machine_mode vmode = mode[argc - 1];
+ neon_lane_bounds (op[argc], 0,
+ GET_MODE_NUNITS (vmode) / 4, exp);
}
- /* If the lane index isn't a constant then the next
- case will error. */
- /* Fall through. */
+ /* If the lane index isn't a constant then error out. */
+ goto constant_arg;
+
case ARG_BUILTIN_CONSTANT:
constant_arg:
if (!(*insn_data[icode].operand[opno].predicate)
@@ -2464,6 +2502,8 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
args[k] = ARG_BUILTIN_LANE_INDEX;
else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index)
args[k] = ARG_BUILTIN_LANE_PAIR_INDEX;
+ else if (d->qualifiers[qualifiers_k] & qualifier_lane_quadtup_index)
+ args[k] = ARG_BUILTIN_LANE_QUADTUP_INDEX;
else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX;
else if (d->qualifiers[qualifiers_k] & qualifier_immediate)
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 1f200d491d1de3993bc3a682d586da137958ff6b..53602773a341535bfc9ff16dc4ac8f2b999df2ad 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18738,6 +18738,52 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
}
+
+/* AdvSIMD Matrix Multiply-Accumulate and Dot Product intrinsics. */
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+i8mm")
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
+{
+ return __builtin_neon_usdotv8qi_ssus (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a,
+ int8x8_t __b, const int __index)
+{
+ return __builtin_neon_usdot_lanev8qi_ssuss (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdotq_lane_s32 (int32x4_t __r, uint8x16_t __a,
+ int8x8_t __b, const int __index)
+{
+ return __builtin_neon_usdot_lanev16qi_ssuss (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vsudot_lane_s32 (int32x2_t __r, int8x8_t __a,
+ uint8x8_t __b, const int __index)
+{
+ return __builtin_neon_sudot_lanev8qi_sssus (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vsudotq_lane_s32 (int32x4_t __r, int8x16_t __a,
+ uint8x8_t __b, const int __index)
+{
+ return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+
#pragma GCC pop_options
#endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index bcccf93f7fa2750e9006e5856efecbec0fb331b9..7af85ee27bc84b4229633a0337b550e4ef2ec14b 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -352,6 +352,10 @@ VAR2 (UTERNOP, udot, v8qi, v16qi)
VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
+VAR1 (USTERNOP, usdot, v8qi)
+VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi)
+VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi)
+
VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf)
VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index c412851843f4468c2c18bce264288705e076ac50..e58c706f9fb63271d1aadb1498c0b32674838f46 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -466,6 +466,8 @@
(define_int_iterator DOTPROD [UNSPEC_DOT_S UNSPEC_DOT_U])
+(define_int_iterator DOTPROD_I8MM [UNSPEC_DOT_US UNSPEC_DOT_SU])
+
(define_int_iterator VFMLHALVES [UNSPEC_VFML_LO UNSPEC_VFML_HI])
(define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270])
@@ -920,6 +922,7 @@
(UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
(UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
(UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u")
+ (UNSPEC_DOT_US "us") (UNSPEC_DOT_SU "su")
(UNSPEC_SSAT16 "s") (UNSPEC_USAT16 "u")
])
@@ -1151,6 +1154,9 @@
(define_int_attr MRRC [(VUNSPEC_MRRC "MRRC") (VUNSPEC_MRRC2 "MRRC2")])
(define_int_attr opsuffix [(UNSPEC_DOT_S "s8")
- (UNSPEC_DOT_U "u8")])
+ (UNSPEC_DOT_U "u8")
+ (UNSPEC_DOT_US "s8")
+ (UNSPEC_DOT_SU "u8")
+ ])
(define_int_attr smlaw_op [(UNSPEC_SMLAWB "smlawb") (UNSPEC_SMLAWT "smlawt")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a0ee28efc9aa9f1fba7b5ae031564f40aa095fe..7de31220d5d0712269137bc2c64d90ec9bfdcb2c 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3279,6 +3279,20 @@
[(set_attr "type" "neon_dot<q>")]
)
+;; These instructions map to the __builtins for the Dot Product operations.
+(define_insn "neon_usdot<vsi2qi>"
+ [(set (match_operand:VCVTI 0 "register_operand" "=w")
+ (plus:VCVTI (match_operand:VCVTI 1 "register_operand" "0")
+ (unspec:VCVTI [(match_operand:<VSI2QI> 2
+ "register_operand" "w")
+ (match_operand:<VSI2QI> 3
+ "register_operand" "w")]
+ UNSPEC_DOT_US)))]
+ "TARGET_I8MM"
+ "vusdot.s8\\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
+ [(set_attr "type" "neon_dot<q>")]
+)
+
;; These instructions map to the __builtins for the Dot Product
;; indexed operations.
(define_insn "neon_<sup>dot_lane<vsi2qi>"
@@ -3298,6 +3312,24 @@
[(set_attr "type" "neon_dot<q>")]
)
+;; These instructions map to the __builtins for the Dot Product
+;; indexed operations in the v8.6 I8MM extension.
+(define_insn "neon_<sup>dot_lane<vsi2qi>"
+ [(set (match_operand:VCVTI 0 "register_operand" "=w")
+ (plus:VCVTI (match_operand:VCVTI 1 "register_operand" "0")
+ (unspec:VCVTI [(match_operand:<VSI2QI> 2
+ "register_operand" "w")
+ (match_operand:V8QI 3 "register_operand" "t")
+ (match_operand:SI 4 "immediate_operand" "i")]
+ DOTPROD_I8MM)))]
+ "TARGET_I8MM"
+ {
+ operands[4] = GEN_INT (INTVAL (operands[4]));
+ return "v<sup>dot.<opsuffix>\\t%<V_reg>0, %<V_reg>2, %P3[%c4]";
+ }
+ [(set_attr "type" "neon_dot<q>")]
+)
+
;; These expands map to the Dot Product optab the vectorizer checks for.
;; The auto-vectorizer expects a dot product builtin that also does an
;; accumulation into the provided register.
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b4196b0e5cd939c3ee5e3f9bd19622fcc963adae..837471f49543d4faa8614bd54f2db8d37991c443 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -485,6 +485,8 @@
UNSPEC_VRNDX
UNSPEC_DOT_S
UNSPEC_DOT_U
+ UNSPEC_DOT_US
+ UNSPEC_DOT_SU
UNSPEC_VFML_LO
UNSPEC_VFML_HI
UNSPEC_VCADD90
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-1.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..862cf3211e71cbf8127f2b0f141c206676bf9bdb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-1.c
@@ -0,0 +1,42 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "--save-temps" } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions. */
+
+int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_s32 (r, x, y);
+}
+
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_lane_s32 (r, x, y, 0);
+}
+
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+ return vusdotq_lane_s32 (r, x, y, 1);
+}
+
+/* Signed-Unsigned Dot Product instructions. */
+
+
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+ return vsudot_lane_s32 (r, x, y, 0);
+}
+
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+ return vsudotq_lane_s32 (r, x, y, 1);
+}
+
+/* { dg-final { scan-assembler {vusdot\.s8\td[0-9]+, d[0-9]+, d[0-9]+} } } */
+/* { dg-final { scan-assembler {vusdot\.s8\td[0-9]+, d[0-9]+, d[0-9]+\[#?0\]} } } */
+/* { dg-final { scan-assembler {vusdot\.s8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?1\]} } } */
+/* { dg-final { scan-assembler {vsudot\.u8\td[0-9]+, d[0-9]+, d[0-9]+\[#?0\]} } } */
+/* { dg-final { scan-assembler {vsudot\.u8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?1\]} } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-2.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..91ecb073fdb5bd4523c9b1e62aed03de5adb820d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-2.c
@@ -0,0 +1,42 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "-mbig-endian --save-temps" } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions. */
+
+int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_s32 (r, x, y);
+}
+
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_lane_s32 (r, x, y, 0);
+}
+
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+ return vusdotq_lane_s32 (r, x, y, 1);
+}
+
+/* Signed-Unsigned Dot Product instructions. */
+
+
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+ return vsudot_lane_s32 (r, x, y, 0);
+}
+
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+ return vsudotq_lane_s32 (r, x, y, 1);
+}
+
+/* { dg-final { scan-assembler {vusdot\.s8\td[0-9]+, d[0-9]+, d[0-9]+} } } */
+/* { dg-final { scan-assembler {vusdot\.s8\td[0-9]+, d[0-9]+, d[0-9]+\[#?0\]} } } */
+/* { dg-final { scan-assembler {vusdot\.s8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?1\]} } } */
+/* { dg-final { scan-assembler {vsudot\.u8\td[0-9]+, d[0-9]+, d[0-9]+\[#?0\]} } } */
+/* { dg-final { scan-assembler {vsudot\.u8\tq[0-9]+, q[0-9]+, d[0-9]+\[#?1\]} } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-3.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..e14fe8f4433c9bf4c3347ebf728157bdb54861b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-3.c
@@ -0,0 +1,21 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "--save-temps" } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions. */
+
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vusdot_lane_s32 (r, x, y, -1);
+}
+
+
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+ /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vusdotq_lane_s32 (r, x, y, 2);
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-4.c b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..fb7ebb484e1778a1d06611f8c8a639d4c0dcb9a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-compile-2-4.c
@@ -0,0 +1,20 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "--save-temps" } */
+
+#include <arm_neon.h>
+
+/* Signed-Unsigned Dot Product instructions. */
+
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+ /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vsudot_lane_s32 (r, x, y, -1);
+}
+
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+ /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vsudotq_lane_s32 (r, x, y, 2);
+}
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Ping][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
2019-12-13 10:23 [GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension Stam Markianos-Wright
@ 2019-12-18 13:26 ` Stam Markianos-Wright
2020-01-10 19:25 ` Stam Markianos-Wright
0 siblings, 1 reply; 9+ messages in thread
From: Stam Markianos-Wright @ 2019-12-18 13:26 UTC (permalink / raw)
To: gcc-patches
Cc: Richard Earnshaw, Kyrylo Tkachov, nickc, Ramana Radhakrishnan,
Richard Sandiford
On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
> Hi all,
>
> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
> operations (vector/by element) to the ARM back-end.
>
> These are:
> usdot (vector), <us/su>dot (by element).
>
> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
> for ARM they remain optional as of ARMv8.6-a.
>
> The functions are declared in arm_neon.h, RTL patterns are defined to
> generate assembler and tests are added to verify and perform adequate
> checks.
>
> Regression testing on arm-none-eabi passed successfully.
>
> This patch depends on:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>
> for ARM CLI updates, and on:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>
> for testsuite effective_target update.
>
> Ok for trunk?
.Ping :)
>
> Cheers,
> Stam
>
>
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>
> PS. I don't have commit rights, so if someone could commit on my behalf,
> that would be great :)
>
>
> gcc/ChangeLog:
>
> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>
> * config/arm/arm-builtins.c (enum arm_type_qualifiers):
> (USTERNOP_QUALIFIERS): New define.
> (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
> (arm_expand_builtin_args):
> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
> * config/arm/arm_neon.h (vusdot_s32): New.
> (vusdot_lane_s32): New.
> (vusdotq_lane_s32): New.
> (vsudot_lane_s32): New.
> (vsudotq_lane_s32): New.
> * config/arm/arm_neon_builtins.def
> (usdot,usdot_lane,sudot_lane): New.
> * config/arm/iterators.md (DOTPROD_I8MM): New.
> (sup, opsuffix): Add <us/su>.
> * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New.
> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>
>
> gcc/testsuite/ChangeLog:
>
> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>
> * gcc.target/arm/simd/vdot-compile-2-1.c: New test.
> * gcc.target/arm/simd/vdot-compile-2-2.c: New test.
> * gcc.target/arm/simd/vdot-compile-2-3.c: New test.
> * gcc.target/arm/simd/vdot-compile-2-4.c: New test.
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Ping][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
2019-12-18 13:26 ` [Ping][GCC][PATCH][ARM]Add " Stam Markianos-Wright
@ 2020-01-10 19:25 ` Stam Markianos-Wright
2020-01-16 16:17 ` [Pingx2][GCC][PATCH][ARM]Add " Stam Markianos-Wright
0 siblings, 1 reply; 9+ messages in thread
From: Stam Markianos-Wright @ 2020-01-10 19:25 UTC (permalink / raw)
To: gcc-patches
Cc: Richard Earnshaw, Kyrylo Tkachov, nickc, Ramana Radhakrishnan,
Richard Sandiford
[-- Attachment #1: Type: text/plain, Size: 3030 bytes --]
On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>
>
> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>> Hi all,
>>
>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>> operations (vector/by element) to the ARM back-end.
>>
>> These are:
>> usdot (vector), <us/su>dot (by element).
>>
>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>> for ARM they remain optional as of ARMv8.6-a.
>>
>> The functions are declared in arm_neon.h, RTL patterns are defined to
>> generate assembler and tests are added to verify and perform adequate checks.
>>
>> Regression testing on arm-none-eabi passed successfully.
>>
>> This patch depends on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>
>> for ARM CLI updates, and on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>
>> for testsuite effective_target update.
>>
>> Ok for trunk?
>
> .Ping :)
>
Ping :)
New diff addressing review comments from Aarch64 version of the patch.
_Change of order of operands in RTL patterns.
_Change tests to use check-function-bodies, compile with optimisation and check
for exact registers.
_Rename tests to remove "-compile-" in filename.
>>
>> Cheers,
>> Stam
>>
>>
>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>
>> PS. I don't have commit rights, so if someone could commit on my behalf,
>> that would be great :)
>>
>>
>> gcc/ChangeLog:
>>
>> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>
>> * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>> (USTERNOP_QUALIFIERS): New define.
>> (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>> (arm_expand_builtin_args):
>> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>> * config/arm/arm_neon.h (vusdot_s32): New.
>> (vusdot_lane_s32): New.
>> (vusdotq_lane_s32): New.
>> (vsudot_lane_s32): New.
>> (vsudotq_lane_s32): New.
>> * config/arm/arm_neon_builtins.def
>> (usdot,usdot_lane,sudot_lane): New.
>> * config/arm/iterators.md (DOTPROD_I8MM): New.
>> (sup, opsuffix): Add <us/su>.
>> * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New.
>> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>
>> * gcc.target/arm/simd/vdot-compile-2-1.c: New test.
>> * gcc.target/arm/simd/vdot-compile-2-2.c: New test.
>> * gcc.target/arm/simd/vdot-compile-2-3.c: New test.
>> * gcc.target/arm/simd/vdot-compile-2-4.c: New test.
>>
>>
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: I8MM-32-final.patch --]
[-- Type: text/x-patch; name="I8MM-32-final.patch", Size: 15884 bytes --]
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index df84560588a..1b4316d0e93 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -86,7 +86,10 @@ enum arm_type_qualifiers
qualifier_const_void_pointer = 0x802,
/* Lane indices selected in pairs - must be within range of previous
argument = a vector. */
- qualifier_lane_pair_index = 0x1000
+ qualifier_lane_pair_index = 0x1000,
+ /* Lane indices selected in quadtuplets - must be within range of previous
+ argument = a vector. */
+ qualifier_lane_quadtup_index = 0x2000
};
/* The qualifier_internal allows generation of a unary builtin from
@@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
qualifier_unsigned };
#define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers)
+/* T (T, unsigned T, T). */
+static enum arm_type_qualifiers
+arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_none, qualifier_none, qualifier_unsigned,
+ qualifier_none };
+#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers)
+
/* T (T, immediate). */
static enum arm_type_qualifiers
arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
qualifier_unsigned, qualifier_lane_index };
#define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers)
+/* T (T, unsigned T, T, lane index). */
+static enum arm_type_qualifiers
+arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_none, qualifier_none, qualifier_unsigned,
+ qualifier_none, qualifier_lane_quadtup_index };
+#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers)
+
+/* T (T, T, unsigend T, lane index). */
+static enum arm_type_qualifiers
+arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_none, qualifier_none, qualifier_none,
+ qualifier_unsigned, qualifier_lane_quadtup_index };
+#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers)
+
/* T (T, T, immediate). */
static enum arm_type_qualifiers
arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -2148,6 +2172,7 @@ typedef enum {
ARG_BUILTIN_LANE_INDEX,
ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX,
ARG_BUILTIN_LANE_PAIR_INDEX,
+ ARG_BUILTIN_LANE_QUADTUP_INDEX,
ARG_BUILTIN_NEON_MEMORY,
ARG_BUILTIN_MEMORY,
ARG_BUILTIN_STOP
@@ -2296,11 +2321,24 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
if (CONST_INT_P (op[argc]))
{
machine_mode vmode = mode[argc - 1];
- neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp);
+ neon_lane_bounds (op[argc], 0,
+ GET_MODE_NUNITS (vmode) / 2, exp);
+ }
+ /* If the lane index isn't a constant then error out. */
+ goto constant_arg;
+
+ case ARG_BUILTIN_LANE_QUADTUP_INDEX:
+ /* Previous argument must be a vector, which this indexes. */
+ gcc_assert (argc > 0);
+ if (CONST_INT_P (op[argc]))
+ {
+ machine_mode vmode = mode[argc - 1];
+ neon_lane_bounds (op[argc], 0,
+ GET_MODE_NUNITS (vmode) / 4, exp);
}
- /* If the lane index isn't a constant then the next
- case will error. */
- /* Fall through. */
+ /* If the lane index isn't a constant then error out. */
+ goto constant_arg;
+
case ARG_BUILTIN_CONSTANT:
constant_arg:
if (!(*insn_data[icode].operand[opno].predicate)
@@ -2464,6 +2502,8 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
args[k] = ARG_BUILTIN_LANE_INDEX;
else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index)
args[k] = ARG_BUILTIN_LANE_PAIR_INDEX;
+ else if (d->qualifiers[qualifiers_k] & qualifier_lane_quadtup_index)
+ args[k] = ARG_BUILTIN_LANE_QUADTUP_INDEX;
else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX;
else if (d->qualifiers[qualifiers_k] & qualifier_immediate)
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index db8db53614a..ede89ec2c64 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18738,6 +18738,52 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
}
+
+/* AdvSIMD Matrix Multiply-Accumulate and Dot Product intrinsics. */
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+i8mm")
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
+{
+ return __builtin_neon_usdotv8qi_ssus (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a,
+ int8x8_t __b, const int __index)
+{
+ return __builtin_neon_usdot_lanev8qi_ssuss (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdotq_lane_s32 (int32x4_t __r, uint8x16_t __a,
+ int8x8_t __b, const int __index)
+{
+ return __builtin_neon_usdot_lanev16qi_ssuss (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vsudot_lane_s32 (int32x2_t __r, int8x8_t __a,
+ uint8x8_t __b, const int __index)
+{
+ return __builtin_neon_sudot_lanev8qi_sssus (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vsudotq_lane_s32 (int32x4_t __r, int8x16_t __a,
+ uint8x8_t __b, const int __index)
+{
+ return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+
#pragma GCC pop_options
#endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index e9ff4e501cb..b4537ff5de9 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -352,6 +352,10 @@ VAR2 (UTERNOP, udot, v8qi, v16qi)
VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
+VAR1 (USTERNOP, usdot, v8qi)
+VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi)
+VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi)
+
VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf)
VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 7da8b74abc0..afea7f823e0 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -466,6 +466,8 @@
(define_int_iterator DOTPROD [UNSPEC_DOT_S UNSPEC_DOT_U])
+(define_int_iterator DOTPROD_I8MM [UNSPEC_DOT_US UNSPEC_DOT_SU])
+
(define_int_iterator VFMLHALVES [UNSPEC_VFML_LO UNSPEC_VFML_HI])
(define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270])
@@ -920,6 +922,7 @@
(UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
(UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
(UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u")
+ (UNSPEC_DOT_US "us") (UNSPEC_DOT_SU "su")
(UNSPEC_SSAT16 "s") (UNSPEC_USAT16 "u")
])
@@ -1151,6 +1154,9 @@
(define_int_attr MRRC [(VUNSPEC_MRRC "MRRC") (VUNSPEC_MRRC2 "MRRC2")])
(define_int_attr opsuffix [(UNSPEC_DOT_S "s8")
- (UNSPEC_DOT_U "u8")])
+ (UNSPEC_DOT_U "u8")
+ (UNSPEC_DOT_US "s8")
+ (UNSPEC_DOT_SU "u8")
+ ])
(define_int_attr smlaw_op [(UNSPEC_SMLAWB "smlawb") (UNSPEC_SMLAWT "smlawt")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index dace9470c41..8b83cba8fb7 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3279,6 +3279,20 @@
[(set_attr "type" "neon_dot<q>")]
)
+;; These instructions map to the __builtins for the Dot Product operations.
+(define_insn "neon_usdot<vsi2qi>"
+ [(set (match_operand:VCVTI 0 "register_operand" "=w")
+ (plus:VCVTI
+ (unspec:VCVTI
+ [(match_operand:<VSI2QI> 2 "register_operand" "w")
+ (match_operand:<VSI2QI> 3 "register_operand" "w")]
+ UNSPEC_DOT_US)
+ (match_operand:VCVTI 1 "register_operand" "0")))]
+ "TARGET_I8MM"
+ "vusdot.s8\\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
+ [(set_attr "type" "neon_dot<q>")]
+)
+
;; These instructions map to the __builtins for the Dot Product
;; indexed operations.
(define_insn "neon_<sup>dot_lane<vsi2qi>"
@@ -3298,6 +3312,25 @@
[(set_attr "type" "neon_dot<q>")]
)
+;; These instructions map to the __builtins for the Dot Product
+;; indexed operations in the v8.6 I8MM extension.
+(define_insn "neon_<sup>dot_lane<vsi2qi>"
+ [(set (match_operand:VCVTI 0 "register_operand" "=w")
+ (plus:VCVTI
+ (unspec:VCVTI
+ [(match_operand:<VSI2QI> 2 "register_operand" "w")
+ (match_operand:V8QI 3 "register_operand" "t")
+ (match_operand:SI 4 "immediate_operand" "i")]
+ DOTPROD_I8MM)
+ (match_operand:VCVTI 1 "register_operand" "0")))]
+ "TARGET_I8MM"
+ {
+ operands[4] = GEN_INT (INTVAL (operands[4]));
+ return "v<sup>dot.<opsuffix>\\t%<V_reg>0, %<V_reg>2, %P3[%c4]";
+ }
+ [(set_attr "type" "neon_dot<q>")]
+)
+
;; These expands map to the Dot Product optab the vectorizer checks for.
;; The auto-vectorizer expects a dot product builtin that also does an
;; accumulation into the provided register.
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index ade6b1af994..0aaff3b4bfc 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -485,6 +485,8 @@
UNSPEC_VRNDX
UNSPEC_DOT_S
UNSPEC_DOT_U
+ UNSPEC_DOT_US
+ UNSPEC_DOT_SU
UNSPEC_VFML_LO
UNSPEC_VFML_HI
UNSPEC_VCADD90
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c
new file mode 100644
index 00000000000..4d5f07b771b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c
@@ -0,0 +1,91 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "-O -save-temps" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions. */
+
+/*
+**usfoo:
+** ...
+** vusdot\.s8 d0, d1, d2
+** bx lr
+*/
+int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane:
+** ...
+** vusdot\.s8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**usfooq_lane:
+** ...
+** vusdot\.s8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+ return vusdotq_lane_s32 (r, x, y, 1);
+}
+
+/* Signed-Unsigned Dot Product instructions. */
+
+/*
+**sfoo_lane:
+** ...
+** vsudot\.u8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+ return vsudot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**sfooq_lane:
+** ...
+** vsudot\.u8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+ return vsudotq_lane_s32 (r, x, y, 1);
+}
+
+/*
+**usfoo_untied:
+** ...
+** vusdot\.s8 d1, d2, d3
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane_untied:
+** ...
+** vusdot.s8 d1, d2, d3\[0\]
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_lane_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_lane_s32 (r, x, y, 0);
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c
new file mode 100644
index 00000000000..b7b76e27486
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c
@@ -0,0 +1,90 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "-O -save-temps -mbig-endian" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions. */
+
+/*
+**usfoo:
+** ...
+** vusdot\.s8 d0, d1, d2
+** bx lr
+*/
+int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane:
+** ...
+** vusdot\.s8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**usfooq_lane:
+** ...
+** vusdot\.s8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+ return vusdotq_lane_s32 (r, x, y, 1);
+}
+
+/* Signed-Unsigned Dot Product instructions. */
+
+/*
+**sfoo_lane:
+** ...
+** vsudot\.u8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+ return vsudot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**sfooq_lane:
+** ...
+** vsudot\.u8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+ return vsudotq_lane_s32 (r, x, y, 1);
+}
+
+/*
+**usfoo_untied:
+** ...
+** vusdot\.s8 d1, d2, d3
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane_untied:
+** ...
+** vusdot.s8 d1, d2, d3\[0\]
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_lane_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_lane_s32 (r, x, y, 0);
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c
new file mode 100644
index 00000000000..e14fe8f4433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c
@@ -0,0 +1,21 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "--save-temps" } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions. */
+
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vusdot_lane_s32 (r, x, y, -1);
+}
+
+
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+ /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vusdotq_lane_s32 (r, x, y, 2);
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c
new file mode 100644
index 00000000000..fb7ebb484e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c
@@ -0,0 +1,20 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "--save-temps" } */
+
+#include <arm_neon.h>
+
+/* Signed-Unsigned Dot Product instructions. */
+
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+ /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vsudot_lane_s32 (r, x, y, -1);
+}
+
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+ /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vsudotq_lane_s32 (r, x, y, 2);
+}
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Pingx2][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
2020-01-10 19:25 ` Stam Markianos-Wright
@ 2020-01-16 16:17 ` Stam Markianos-Wright
2020-01-27 16:08 ` [Pingx3][GCC][PATCH][ARM]Add " Stam Markianos-Wright
0 siblings, 1 reply; 9+ messages in thread
From: Stam Markianos-Wright @ 2020-01-16 16:17 UTC (permalink / raw)
To: gcc-patches
Cc: Richard Earnshaw, Kyrylo Tkachov, nickc, Ramana Radhakrishnan,
Richard Sandiford
On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:
>
>
> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>>
>>
>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>>> Hi all,
>>>
>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>>> operations (vector/by element) to the ARM back-end.
>>>
>>> These are:
>>> usdot (vector), <us/su>dot (by element).
>>>
>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>>> for ARM they remain optional as of ARMv8.6-a.
>>>
>>> The functions are declared in arm_neon.h, RTL patterns are defined to
>>> generate assembler and tests are added to verify and perform adequate checks.
>>>
>>> Regression testing on arm-none-eabi passed successfully.
>>>
>>> This patch depends on:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>>
>>> for ARM CLI updates, and on:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>>
>>> for testsuite effective_target update.
>>>
>>> Ok for trunk?
>>
>> .Ping :)
>>
> Ping :)
>
> New diff addressing review comments from Aarch64 version of the patch.
>
> _Change of order of operands in RTL patterns.
> _Change tests to use check-function-bodies, compile with optimisation and check
> for exact registers.
> _Rename tests to remove "-compile-" in filename.
>
Ping!
Cheers,
Stam
>>>
>>> Cheers,
>>> Stam
>>>
>>>
>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>
>>> PS. I don't have commit rights, so if someone could commit on my behalf,
>>> that would be great :)
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>
>>> * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>> (USTERNOP_QUALIFIERS): New define.
>>> (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>> (arm_expand_builtin_args):
>>> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>> * config/arm/arm_neon.h (vusdot_s32): New.
>>> (vusdot_lane_s32): New.
>>> (vusdotq_lane_s32): New.
>>> (vsudot_lane_s32): New.
>>> (vsudotq_lane_s32): New.
>>> * config/arm/arm_neon_builtins.def
>>> (usdot,usdot_lane,sudot_lane): New.
>>> * config/arm/iterators.md (DOTPROD_I8MM): New.
>>> (sup, opsuffix): Add <us/su>.
>>> * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New.
>>> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>>
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>
>>> * gcc.target/arm/simd/vdot-compile-2-1.c: New test.
>>> * gcc.target/arm/simd/vdot-compile-2-2.c: New test.
>>> * gcc.target/arm/simd/vdot-compile-2-3.c: New test.
>>> * gcc.target/arm/simd/vdot-compile-2-4.c: New test.
>>>
>>>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
2020-01-16 16:17 ` [Pingx2][GCC][PATCH][ARM]Add " Stam Markianos-Wright
@ 2020-01-27 16:08 ` Stam Markianos-Wright
2020-02-03 11:20 ` Stam Markianos-Wright
0 siblings, 1 reply; 9+ messages in thread
From: Stam Markianos-Wright @ 2020-01-27 16:08 UTC (permalink / raw)
To: gcc-patches
Cc: Richard Earnshaw, Kyrylo Tkachov, nickc, Ramana Radhakrishnan,
Richard Sandiford
On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:
>
>
> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:
>>
>>
>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>>>
>>>
>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>>>> Hi all,
>>>>
>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>>>> operations (vector/by element) to the ARM back-end.
>>>>
>>>> These are:
>>>> usdot (vector), <us/su>dot (by element).
>>>>
>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>>>> for ARM they remain optional as of ARMv8.6-a.
>>>>
>>>> The functions are declared in arm_neon.h, RTL patterns are defined to
>>>> generate assembler and tests are added to verify and perform adequate checks.
>>>>
>>>> Regression testing on arm-none-eabi passed successfully.
>>>>
>>>> This patch depends on:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>>>
>>>> for ARM CLI updates, and on:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>>>
>>>> for testsuite effective_target update.
>>>>
>>>> Ok for trunk?
>>>
>>> .Ping :)
>>>
>> Ping :)
>>
>> New diff addressing review comments from Aarch64 version of the patch.
>>
>> _Change of order of operands in RTL patterns.
>> _Change tests to use check-function-bodies, compile with optimisation and
>> check for exact registers.
>> _Rename tests to remove "-compile-" in filename.
>>
>
> Ping!
>
> Cheers,
> Stam
>
>>>>
>>>> Cheers,
>>>> Stam
>>>>
>>>>
>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>>
>>>> PS. I don't have commit rights, so if someone could commit on my behalf,
>>>> that would be great :)
>>>>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>>
>>>> * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>>> (USTERNOP_QUALIFIERS): New define.
>>>> (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>> (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>> (arm_expand_builtin_args):
>>>> Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>>> (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>>> * config/arm/arm_neon.h (vusdot_s32): New.
>>>> (vusdot_lane_s32): New.
>>>> (vusdotq_lane_s32): New.
>>>> (vsudot_lane_s32): New.
>>>> (vsudotq_lane_s32): New.
>>>> * config/arm/arm_neon_builtins.def
>>>> (usdot,usdot_lane,sudot_lane): New.
>>>> * config/arm/iterators.md (DOTPROD_I8MM): New.
>>>> (sup, opsuffix): Add <us/su>.
>>>> * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New.
>>>> * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>>>
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>>
>>>> * gcc.target/arm/simd/vdot-compile-2-1.c: New test.
>>>> * gcc.target/arm/simd/vdot-compile-2-2.c: New test.
>>>> * gcc.target/arm/simd/vdot-compile-2-3.c: New test.
>>>> * gcc.target/arm/simd/vdot-compile-2-4.c: New test.
>>>>
>>>>
>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
2020-01-27 16:08 ` [Pingx3][GCC][PATCH][ARM]Add " Stam Markianos-Wright
@ 2020-02-03 11:20 ` Stam Markianos-Wright
2020-02-10 13:36 ` Stam Markianos-Wright
0 siblings, 1 reply; 9+ messages in thread
From: Stam Markianos-Wright @ 2020-02-03 11:20 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Earnshaw, kyrylo.tkachov, nickc, ramana.radhakrishnan
On 1/27/20 3:54 PM, Stam Markianos-Wright wrote:
>
> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:
>>
>>
>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:
>>>
>>>
>>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>>>>
>>>>
>>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>>>>> Hi all,
>>>>>
>>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>>>>> operations (vector/by element) to the ARM back-end.
>>>>>
>>>>> These are:
>>>>> usdot (vector), <us/su>dot (by element).
>>>>>
>>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>>>>> for ARM they remain optional as of ARMv8.6-a.
>>>>>
>>>>> The functions are declared in arm_neon.h, RTL patterns are defined to
>>>>> generate assembler and tests are added to verify and perform adequate checks.
>>>>>
>>>>> Regression testing on arm-none-eabi passed successfully.
>>>>>
>>>>> This patch depends on:
>>>>>
>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>>>>
>>>>> for ARM CLI updates, and on:
>>>>>
>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>>>>
>>>>> for testsuite effective_target update.
>>>>>
>>>>> Ok for trunk?
>>>>
>>>
>>> New diff addressing review comments from Aarch64 version of the patch.
>>>
>>> _Change of order of operands in RTL patterns.
>>> _Change tests to use check-function-bodies, compile with optimisation and
>>> check for exact registers.
>>> _Rename tests to remove "-compile-" in filename.
>>>
>>
.Ping!
.
>>
>> Cheers,
>> Stam
>>
>>>>>
>>>>>
>>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>>>
>>>>> PS. I don't have commit rights, so if someone could commit on my behalf,
>>>>> that would be great :)
>>>>>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>>>
>>>>> Â Â Â Â Â * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>>>> Â Â Â Â Â (USTERNOP_QUALIFIERS): New define.
>>>>> Â Â Â Â Â (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>>> Â Â Â Â Â (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>>> Â Â Â Â Â (arm_expand_builtin_args):
>>>>> Â Â Â Â Â Â Â Â Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>>>> Â Â Â Â Â (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>>>> Â Â Â Â Â * config/arm/arm_neon.h (vusdot_s32): New.
>>>>> Â Â Â Â Â (vusdot_lane_s32): New.
>>>>> Â Â Â Â Â (vusdotq_lane_s32): New.
>>>>> Â Â Â Â Â (vsudot_lane_s32): New.
>>>>> Â Â Â Â Â (vsudotq_lane_s32): New.
>>>>> Â Â Â Â Â * config/arm/arm_neon_builtins.def
>>>>> Â Â Â Â Â Â Â Â (usdot,usdot_lane,sudot_lane): New.
>>>>> Â Â Â Â Â * config/arm/iterators.md (DOTPROD_I8MM): New.
>>>>> Â Â Â Â Â Â Â Â (sup, opsuffix): Add <us/su>.
>>>>> Â Â Â Â Â Â Â * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New.
>>>>> Â Â Â Â Â * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>>>>
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>>>
>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-compile-2-1.c: New test.
>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-compile-2-2.c: New test.
>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-compile-2-3.c: New test.
>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-compile-2-4.c: New test.
>>>>>
>>>>>
>>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
2020-02-03 11:20 ` Stam Markianos-Wright
@ 2020-02-10 13:36 ` Stam Markianos-Wright
2020-02-11 10:26 ` Kyrill Tkachov
0 siblings, 1 reply; 9+ messages in thread
From: Stam Markianos-Wright @ 2020-02-10 13:36 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Earnshaw, kyrylo.tkachov, nickc, ramana.radhakrishnan
[-- Attachment #1: Type: text/plain, Size: 3822 bytes --]
On 2/3/20 11:20 AM, Stam Markianos-Wright wrote:
>
>
> On 1/27/20 3:54 PM, Stam Markianos-Wright wrote:
>>
>> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:
>>>
>>>
>>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:
>>>>
>>>>
>>>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>>>>>
>>>>>
>>>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>>>>>> operations (vector/by element) to the ARM back-end.
>>>>>>
>>>>>> These are:
>>>>>> usdot (vector), <us/su>dot (by element).
>>>>>>
>>>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>>>>>> for ARM they remain optional as of ARMv8.6-a.
>>>>>>
>>>>>> The functions are declared in arm_neon.h, RTL patterns are defined to
>>>>>> generate assembler and tests are added to verify and perform adequate checks.
>>>>>>
>>>>>> Regression testing on arm-none-eabi passed successfully.
>>>>>>
>>>>>> This patch depends on:
>>>>>>
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>>>>>
>>>>>> for ARM CLI updates, and on:
>>>>>>
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>>>>>
>>>>>> for testsuite effective_target update.
>>>>>>
>>>>>> Ok for trunk?
>>>>>
>>>>
>>>> New diff addressing review comments from Aarch64 version of the patch.
>>>>
>>>> _Change of order of operands in RTL patterns.
>>>> _Change tests to use check-function-bodies, compile with optimisation and
>>>> check for exact registers.
>>>> _Rename tests to remove "-compile-" in filename.
>>>>
>>>
> .Ping!
Ping :)
Diff re-attached in this ping email is same as the one posted on 10/01
Thank you!
> .
>>>
>>> Cheers,
>>> Stam
>>>
>>>>>>
>>>>>>
>>>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>>>>
>>>>>> PS. I don't have commit rights, so if someone could commit on my behalf,
>>>>>> that would be great :)
>>>>>>
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 2019-11-28 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>>>>
>>>>>> Â Â Â Â Â * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>>>>> Â Â Â Â Â (USTERNOP_QUALIFIERS): New define.
>>>>>> Â Â Â Â Â (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>>>> Â Â Â Â Â (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>>>> Â Â Â Â Â (arm_expand_builtin_args):
>>>>>> Â Â Â Â Â Â Â Â Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>>>>> Â Â Â Â Â (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>>>>> Â Â Â Â Â * config/arm/arm_neon.h (vusdot_s32): New.
>>>>>> Â Â Â Â Â (vusdot_lane_s32): New.
>>>>>> Â Â Â Â Â (vusdotq_lane_s32): New.
>>>>>> Â Â Â Â Â (vsudot_lane_s32): New.
>>>>>> Â Â Â Â Â (vsudotq_lane_s32): New.
>>>>>> Â Â Â Â Â * config/arm/arm_neon_builtins.def
>>>>>> Â Â Â Â Â Â Â Â (usdot,usdot_lane,sudot_lane): New.
>>>>>> Â Â Â Â Â * config/arm/iterators.md (DOTPROD_I8MM): New.
>>>>>> Â Â Â Â Â Â Â Â (sup, opsuffix): Add <us/su>.
>>>>>> Â Â Â Â Â Â Â * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New.
>>>>>> Â Â Â Â Â * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>>>>>
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>> 2019-12-12 Stam Markianos-Wright <stam.markianos-wright@arm.com>
>>>>>>
>>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-1.c: New test.
>>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-2.c: New test.
>>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-3.c: New test.
>>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-4.c: New test.
>>>>>>
>>>>>>
>>>>
[-- Attachment #2: I8MM-32-final.patch --]
[-- Type: text/x-patch, Size: 15372 bytes --]
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index df84560588a..1b4316d0e93 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -86,7 +86,10 @@ enum arm_type_qualifiers
qualifier_const_void_pointer = 0x802,
/* Lane indices selected in pairs - must be within range of previous
argument = a vector. */
- qualifier_lane_pair_index = 0x1000
+ qualifier_lane_pair_index = 0x1000,
+ /* Lane indices selected in quadtuplets - must be within range of previous
+ argument = a vector. */
+ qualifier_lane_quadtup_index = 0x2000
};
/* The qualifier_internal allows generation of a unary builtin from
@@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
qualifier_unsigned };
#define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers)
+/* T (T, unsigned T, T). */
+static enum arm_type_qualifiers
+arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_none, qualifier_none, qualifier_unsigned,
+ qualifier_none };
+#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers)
+
/* T (T, immediate). */
static enum arm_type_qualifiers
arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
qualifier_unsigned, qualifier_lane_index };
#define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers)
+/* T (T, unsigned T, T, lane index). */
+static enum arm_type_qualifiers
+arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_none, qualifier_none, qualifier_unsigned,
+ qualifier_none, qualifier_lane_quadtup_index };
+#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers)
+
+/* T (T, T, unsigend T, lane index). */
+static enum arm_type_qualifiers
+arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+ = { qualifier_none, qualifier_none, qualifier_none,
+ qualifier_unsigned, qualifier_lane_quadtup_index };
+#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers)
+
/* T (T, T, immediate). */
static enum arm_type_qualifiers
arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -2148,6 +2172,7 @@ typedef enum {
ARG_BUILTIN_LANE_INDEX,
ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX,
ARG_BUILTIN_LANE_PAIR_INDEX,
+ ARG_BUILTIN_LANE_QUADTUP_INDEX,
ARG_BUILTIN_NEON_MEMORY,
ARG_BUILTIN_MEMORY,
ARG_BUILTIN_STOP
@@ -2296,11 +2321,24 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
if (CONST_INT_P (op[argc]))
{
machine_mode vmode = mode[argc - 1];
- neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp);
+ neon_lane_bounds (op[argc], 0,
+ GET_MODE_NUNITS (vmode) / 2, exp);
+ }
+ /* If the lane index isn't a constant then error out. */
+ goto constant_arg;
+
+ case ARG_BUILTIN_LANE_QUADTUP_INDEX:
+ /* Previous argument must be a vector, which this indexes. */
+ gcc_assert (argc > 0);
+ if (CONST_INT_P (op[argc]))
+ {
+ machine_mode vmode = mode[argc - 1];
+ neon_lane_bounds (op[argc], 0,
+ GET_MODE_NUNITS (vmode) / 4, exp);
}
- /* If the lane index isn't a constant then the next
- case will error. */
- /* Fall through. */
+ /* If the lane index isn't a constant then error out. */
+ goto constant_arg;
+
case ARG_BUILTIN_CONSTANT:
constant_arg:
if (!(*insn_data[icode].operand[opno].predicate)
@@ -2464,6 +2502,8 @@ arm_expand_builtin_1 (int fcode, tree exp, rtx target,
args[k] = ARG_BUILTIN_LANE_INDEX;
else if (d->qualifiers[qualifiers_k] & qualifier_lane_pair_index)
args[k] = ARG_BUILTIN_LANE_PAIR_INDEX;
+ else if (d->qualifiers[qualifiers_k] & qualifier_lane_quadtup_index)
+ args[k] = ARG_BUILTIN_LANE_QUADTUP_INDEX;
else if (d->qualifiers[qualifiers_k] & qualifier_struct_load_store_lane_index)
args[k] = ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX;
else if (d->qualifiers[qualifiers_k] & qualifier_immediate)
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index db8db53614a..ede89ec2c64 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18738,6 +18738,52 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,
return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
}
+
+/* AdvSIMD Matrix Multiply-Accumulate and Dot Product intrinsics. */
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+i8mm")
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
+{
+ return __builtin_neon_usdotv8qi_ssus (__r, __a, __b);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a,
+ int8x8_t __b, const int __index)
+{
+ return __builtin_neon_usdot_lanev8qi_ssuss (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vusdotq_lane_s32 (int32x4_t __r, uint8x16_t __a,
+ int8x8_t __b, const int __index)
+{
+ return __builtin_neon_usdot_lanev16qi_ssuss (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vsudot_lane_s32 (int32x2_t __r, int8x8_t __a,
+ uint8x8_t __b, const int __index)
+{
+ return __builtin_neon_sudot_lanev8qi_sssus (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline int32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vsudotq_lane_s32 (int32x4_t __r, int8x16_t __a,
+ uint8x8_t __b, const int __index)
+{
+ return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b, __index);
+}
+
+#pragma GCC pop_options
+
#pragma GCC pop_options
#endif
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index e9ff4e501cb..b4537ff5de9 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -352,6 +352,10 @@ VAR2 (UTERNOP, udot, v8qi, v16qi)
VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
+VAR1 (USTERNOP, usdot, v8qi)
+VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi)
+VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi)
+
VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf)
VAR4 (TERNOP, vcmla0, v2sf, v4sf, v4hf, v8hf)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 7da8b74abc0..afea7f823e0 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -466,6 +466,8 @@
(define_int_iterator DOTPROD [UNSPEC_DOT_S UNSPEC_DOT_U])
+(define_int_iterator DOTPROD_I8MM [UNSPEC_DOT_US UNSPEC_DOT_SU])
+
(define_int_iterator VFMLHALVES [UNSPEC_VFML_LO UNSPEC_VFML_HI])
(define_int_iterator VCADD [UNSPEC_VCADD90 UNSPEC_VCADD270])
@@ -920,6 +922,7 @@
(UNSPEC_VRSRA_S_N "s") (UNSPEC_VRSRA_U_N "u")
(UNSPEC_VCVTH_S "s") (UNSPEC_VCVTH_U "u")
(UNSPEC_DOT_S "s") (UNSPEC_DOT_U "u")
+ (UNSPEC_DOT_US "us") (UNSPEC_DOT_SU "su")
(UNSPEC_SSAT16 "s") (UNSPEC_USAT16 "u")
])
@@ -1151,6 +1154,9 @@
(define_int_attr MRRC [(VUNSPEC_MRRC "MRRC") (VUNSPEC_MRRC2 "MRRC2")])
(define_int_attr opsuffix [(UNSPEC_DOT_S "s8")
- (UNSPEC_DOT_U "u8")])
+ (UNSPEC_DOT_U "u8")
+ (UNSPEC_DOT_US "s8")
+ (UNSPEC_DOT_SU "u8")
+ ])
(define_int_attr smlaw_op [(UNSPEC_SMLAWB "smlawb") (UNSPEC_SMLAWT "smlawt")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index dace9470c41..8b83cba8fb7 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3279,6 +3279,20 @@
[(set_attr "type" "neon_dot<q>")]
)
+;; These instructions map to the __builtins for the Dot Product operations.
+(define_insn "neon_usdot<vsi2qi>"
+ [(set (match_operand:VCVTI 0 "register_operand" "=w")
+ (plus:VCVTI
+ (unspec:VCVTI
+ [(match_operand:<VSI2QI> 2 "register_operand" "w")
+ (match_operand:<VSI2QI> 3 "register_operand" "w")]
+ UNSPEC_DOT_US)
+ (match_operand:VCVTI 1 "register_operand" "0")))]
+ "TARGET_I8MM"
+ "vusdot.s8\\t%<V_reg>0, %<V_reg>2, %<V_reg>3"
+ [(set_attr "type" "neon_dot<q>")]
+)
+
;; These instructions map to the __builtins for the Dot Product
;; indexed operations.
(define_insn "neon_<sup>dot_lane<vsi2qi>"
@@ -3298,6 +3312,25 @@
[(set_attr "type" "neon_dot<q>")]
)
+;; These instructions map to the __builtins for the Dot Product
+;; indexed operations in the v8.6 I8MM extension.
+(define_insn "neon_<sup>dot_lane<vsi2qi>"
+ [(set (match_operand:VCVTI 0 "register_operand" "=w")
+ (plus:VCVTI
+ (unspec:VCVTI
+ [(match_operand:<VSI2QI> 2 "register_operand" "w")
+ (match_operand:V8QI 3 "register_operand" "t")
+ (match_operand:SI 4 "immediate_operand" "i")]
+ DOTPROD_I8MM)
+ (match_operand:VCVTI 1 "register_operand" "0")))]
+ "TARGET_I8MM"
+ {
+ operands[4] = GEN_INT (INTVAL (operands[4]));
+ return "v<sup>dot.<opsuffix>\\t%<V_reg>0, %<V_reg>2, %P3[%c4]";
+ }
+ [(set_attr "type" "neon_dot<q>")]
+)
+
;; These expands map to the Dot Product optab the vectorizer checks for.
;; The auto-vectorizer expects a dot product builtin that also does an
;; accumulation into the provided register.
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index ade6b1af994..0aaff3b4bfc 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -485,6 +485,8 @@
UNSPEC_VRNDX
UNSPEC_DOT_S
UNSPEC_DOT_U
+ UNSPEC_DOT_US
+ UNSPEC_DOT_SU
UNSPEC_VFML_LO
UNSPEC_VFML_HI
UNSPEC_VCADD90
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c
new file mode 100644
index 00000000000..4d5f07b771b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-1.c
@@ -0,0 +1,91 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "-O -save-temps" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions. */
+
+/*
+**usfoo:
+** ...
+** vusdot\.s8 d0, d1, d2
+** bx lr
+*/
+int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane:
+** ...
+** vusdot\.s8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**usfooq_lane:
+** ...
+** vusdot\.s8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+ return vusdotq_lane_s32 (r, x, y, 1);
+}
+
+/* Signed-Unsigned Dot Product instructions. */
+
+/*
+**sfoo_lane:
+** ...
+** vsudot\.u8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+ return vsudot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**sfooq_lane:
+** ...
+** vsudot\.u8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+ return vsudotq_lane_s32 (r, x, y, 1);
+}
+
+/*
+**usfoo_untied:
+** ...
+** vusdot\.s8 d1, d2, d3
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane_untied:
+** ...
+** vusdot.s8 d1, d2, d3\[0\]
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_lane_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_lane_s32 (r, x, y, 0);
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c
new file mode 100644
index 00000000000..b7b76e27486
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-2.c
@@ -0,0 +1,90 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "-O -save-temps -mbig-endian" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions. */
+
+/*
+**usfoo:
+** ...
+** vusdot\.s8 d0, d1, d2
+** bx lr
+*/
+int32x2_t usfoo (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane:
+** ...
+** vusdot\.s8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**usfooq_lane:
+** ...
+** vusdot\.s8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+ return vusdotq_lane_s32 (r, x, y, 1);
+}
+
+/* Signed-Unsigned Dot Product instructions. */
+
+/*
+**sfoo_lane:
+** ...
+** vsudot\.u8 d0, d1, d2\[0\]
+** bx lr
+*/
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+ return vsudot_lane_s32 (r, x, y, 0);
+}
+
+/*
+**sfooq_lane:
+** ...
+** vsudot\.u8 q0, q1, d4\[1\]
+** bx lr
+*/
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+ return vsudotq_lane_s32 (r, x, y, 1);
+}
+
+/*
+**usfoo_untied:
+** ...
+** vusdot\.s8 d1, d2, d3
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_s32 (r, x, y);
+}
+
+/*
+**usfoo_lane_untied:
+** ...
+** vusdot.s8 d1, d2, d3\[0\]
+** vmov d0, d1 @ v2si
+** bx lr
+*/
+int32x2_t usfoo_lane_untied (int32x2_t unused, int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ return vusdot_lane_s32 (r, x, y, 0);
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c
new file mode 100644
index 00000000000..e14fe8f4433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-3.c
@@ -0,0 +1,21 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "--save-temps" } */
+
+#include <arm_neon.h>
+
+/* Unsigned-Signed Dot Product instructions. */
+
+int32x2_t usfoo_lane (int32x2_t r, uint8x8_t x, int8x8_t y)
+{
+ /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vusdot_lane_s32 (r, x, y, -1);
+}
+
+
+int32x4_t usfooq_lane (int32x4_t r, uint8x16_t x, int8x8_t y)
+{
+ /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vusdotq_lane_s32 (r, x, y, 2);
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c b/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c
new file mode 100644
index 00000000000..fb7ebb484e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vdot-2-4.c
@@ -0,0 +1,20 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm } */
+/* { dg-additional-options "--save-temps" } */
+
+#include <arm_neon.h>
+
+/* Signed-Unsigned Dot Product instructions. */
+
+int32x2_t sfoo_lane (int32x2_t r, int8x8_t x, uint8x8_t y)
+{
+ /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vsudot_lane_s32 (r, x, y, -1);
+}
+
+int32x4_t sfooq_lane (int32x4_t r, int8x16_t x, uint8x8_t y)
+{
+ /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */
+ return vsudotq_lane_s32 (r, x, y, 2);
+}
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
2020-02-10 13:36 ` Stam Markianos-Wright
@ 2020-02-11 10:26 ` Kyrill Tkachov
2020-02-11 11:19 ` Stam Markianos-Wright
0 siblings, 1 reply; 9+ messages in thread
From: Kyrill Tkachov @ 2020-02-11 10:26 UTC (permalink / raw)
To: Stam Markianos-Wright, gcc-patches
Cc: Richard Earnshaw, nickc, Ramana Radhakrishnan
Hi Stam,
On 2/10/20 1:35 PM, Stam Markianos-Wright wrote:
>
>
> On 2/3/20 11:20 AM, Stam Markianos-Wright wrote:
> >
> >
> > On 1/27/20 3:54 PM, Stam Markianos-Wright wrote:
> >>
> >> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:
> >>>
> >>>
> >>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:
> >>>>
> >>>>
> >>>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
> >>>>>
> >>>>>
> >>>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot
> product
> >>>>>> operations (vector/by element) to the ARM back-end.
> >>>>>>
> >>>>>> These are:
> >>>>>> usdot (vector), <us/su>dot (by element).
> >>>>>>
> >>>>>> The functions are optional from ARMv8.2-a as
> -march=armv8.2-a+i8mm and
> >>>>>> for ARM they remain optional as of ARMv8.6-a.
> >>>>>>
> >>>>>> The functions are declared in arm_neon.h, RTL patterns are
> defined to
> >>>>>> generate assembler and tests are added to verify and perform
> adequate checks.
> >>>>>>
> >>>>>> Regression testing on arm-none-eabi passed successfully.
> >>>>>>
> >>>>>> This patch depends on:
> >>>>>>
> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
> >>>>>>
> >>>>>> for ARM CLI updates, and on:
> >>>>>>
> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
> >>>>>>
> >>>>>> for testsuite effective_target update.
> >>>>>>
> >>>>>> Ok for trunk?
> >>>>>
> >>>>
> >>>> New diff addressing review comments from Aarch64 version of the
> patch.
> >>>>
> >>>> _Change of order of operands in RTL patterns.
> >>>> _Change tests to use check-function-bodies, compile with
> optimisation and
> >>>> check for exact registers.
> >>>> _Rename tests to remove "-compile-" in filename.
> >>>>
> >>>
> > .Ping!
>
> Ping :)
>
> Diff re-attached in this ping email is same as the one posted on 10/01
>
> Thank you!
Sorry for the delay.
This is ok.
Thanks,
Kyrill
> > .
> >>>
> >>> Cheers,
> >>> Stam
> >>>
> >>>>>>
> >>>>>>
> >>>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
> >>>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
> >>>>>>
> >>>>>> PS. I don't have commit rights, so if someone could commit on
> my behalf,
> >>>>>> that would be great :)
> >>>>>>
> >>>>>>
> >>>>>> gcc/ChangeLog:
> >>>>>>
> >>>>>> 2019-11-28Â Stam Markianos-Wright <stam.markianos-wright@arm.com>
> >>>>>>
> >>>>>> Â Â Â Â Â * config/arm/arm-builtins.c (enum arm_type_qualifiers):
> >>>>>> Â Â Â Â Â (USTERNOP_QUALIFIERS): New define.
> >>>>>> Â Â Â Â Â (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
> >>>>>> Â Â Â Â Â (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
> >>>>>> Â Â Â Â Â (arm_expand_builtin_args):
> >>>>>> Â Â Â Â Â Â Â Â Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
> >>>>>> Â Â Â Â Â (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
> >>>>>> Â Â Â Â Â * config/arm/arm_neon.h (vusdot_s32): New.
> >>>>>> Â Â Â Â Â (vusdot_lane_s32): New.
> >>>>>> Â Â Â Â Â (vusdotq_lane_s32): New.
> >>>>>> Â Â Â Â Â (vsudot_lane_s32): New.
> >>>>>> Â Â Â Â Â (vsudotq_lane_s32): New.
> >>>>>> Â Â Â Â Â * config/arm/arm_neon_builtins.def
> >>>>>> (usdot,usdot_lane,sudot_lane): New.
> >>>>>> Â Â Â Â Â * config/arm/iterators.md (DOTPROD_I8MM): New.
> >>>>>> Â Â Â Â Â Â Â Â (sup, opsuffix): Add <us/su>.
> >>>>>> Â Â Â Â Â Â Â * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New.
> >>>>>> Â Â Â Â Â * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
> >>>>>>
> >>>>>>
> >>>>>> gcc/testsuite/ChangeLog:
> >>>>>>
> >>>>>> 2019-12-12Â Stam Markianos-Wright <stam.markianos-wright@arm.com>
> >>>>>>
> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-1.c: New test.
> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-2.c: New test.
> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-3.c: New test.
> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-4.c: New test.
> >>>>>>
> >>>>>>
> >>>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension
2020-02-11 10:26 ` Kyrill Tkachov
@ 2020-02-11 11:19 ` Stam Markianos-Wright
0 siblings, 0 replies; 9+ messages in thread
From: Stam Markianos-Wright @ 2020-02-11 11:19 UTC (permalink / raw)
To: Kyrill Tkachov, gcc-patches; +Cc: Richard Earnshaw, nickc, Ramana Radhakrishnan
On 2/11/20 10:25 AM, Kyrill Tkachov wrote:
> Hi Stam,
>
> On 2/10/20 1:35 PM, Stam Markianos-Wright wrote:
>>
>>
>> On 2/3/20 11:20 AM, Stam Markianos-Wright wrote:
>> >
>> >
>> > On 1/27/20 3:54 PM, Stam Markianos-Wright wrote:
>> >>
>> >> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:
>> >>>
>> >>>
>> >>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:
>> >>>>
>> >>>>
>> >>>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>> >>>>>
>> >>>>>
>> >>>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>> >>>>>> Hi all,
>> >>>>>>
>> >>>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>> >>>>>> operations (vector/by element) to the ARM back-end.
>> >>>>>>
>> >>>>>> These are:
>> >>>>>> usdot (vector), <us/su>dot (by element).
>> >>>>>>
>> >>>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>> >>>>>> for ARM they remain optional as of ARMv8.6-a.
>> >>>>>>
>> >>>>>> The functions are declared in arm_neon.h, RTL patterns are defined to
>> >>>>>> generate assembler and tests are added to verify and perform adequate
>> checks.
>> >>>>>>
>> >>>>>> Regression testing on arm-none-eabi passed successfully.
>> >>>>>>
>> >>>>>> This patch depends on:
>> >>>>>>
>> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>> >>>>>>
>> >>>>>> for ARM CLI updates, and on:
>> >>>>>>
>> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>> >>>>>>
>> >>>>>> for testsuite effective_target update.
>> >>>>>>
>> >>>>>> Ok for trunk?
>> >>>>>
>> >>>>
>> >>>> New diff addressing review comments from Aarch64 version of the patch.
>> >>>>
>> >>>> _Change of order of operands in RTL patterns.
>> >>>> _Change tests to use check-function-bodies, compile with optimisation and
>> >>>> check for exact registers.
>> >>>> _Rename tests to remove "-compile-" in filename.
>> >>>>
>> >>>
>> > .Ping!
>>
>> Ping :)
>>
>> Diff re-attached in this ping email is same as the one posted on 10/01
>>
>> Thank you!
>
>
> Sorry for the delay.
>
> This is ok.
No worries, thank you!
Committed as r10-6575.
Cheers,
Stam
>
> Thanks,
>
> Kyrill
>
>
>> > .
>> >>>
>> >>> Cheers,
>> >>> Stam
>> >>>
>> >>>>>>
>> >>>>>>
>> >>>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>> >>>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>> >>>>>>
>> >>>>>> PS. I don't have commit rights, so if someone could commit on my behalf,
>> >>>>>> that would be great :)
>> >>>>>>
>> >>>>>>
>> >>>>>> gcc/ChangeLog:
>> >>>>>>
>> >>>>>> 2019-11-28Â Stam Markianos-Wright <stam.markianos-wright@arm.com>
>> >>>>>>
>> >>>>>> Â Â Â Â Â * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>> >>>>>> Â Â Â Â Â (USTERNOP_QUALIFIERS): New define.
>> >>>>>> Â Â Â Â Â (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>> >>>>>> Â Â Â Â Â (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>> >>>>>> Â Â Â Â Â (arm_expand_builtin_args):
>> >>>>>> Â Â Â Â Â Â Â Â Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>> >>>>>> Â Â Â Â Â (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>> >>>>>> Â Â Â Â Â * config/arm/arm_neon.h (vusdot_s32): New.
>> >>>>>> Â Â Â Â Â (vusdot_lane_s32): New.
>> >>>>>> Â Â Â Â Â (vusdotq_lane_s32): New.
>> >>>>>> Â Â Â Â Â (vsudot_lane_s32): New.
>> >>>>>> Â Â Â Â Â (vsudotq_lane_s32): New.
>> >>>>>> Â Â Â Â Â * config/arm/arm_neon_builtins.def
>> >>>>>> (usdot,usdot_lane,sudot_lane): New.
>> >>>>>> Â Â Â Â Â * config/arm/iterators.md (DOTPROD_I8MM): New.
>> >>>>>> Â Â Â Â Â Â Â Â (sup, opsuffix): Add <us/su>.
>> >>>>>> Â Â Â Â Â Â Â * config/arm/neon.md (neon_usdot, <us/su>dot_lane: New.
>> >>>>>> Â Â Â Â Â * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>> >>>>>>
>> >>>>>>
>> >>>>>> gcc/testsuite/ChangeLog:
>> >>>>>>
>> >>>>>> 2019-12-12Â Stam Markianos-Wright <stam.markianos-wright@arm.com>
>> >>>>>>
>> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-1.c: New test.
>> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-2.c: New test.
>> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-3.c: New test.
>> >>>>>> Â Â Â Â Â * gcc.target/arm/simd/vdot-2-4.c: New test.
>> >>>>>>
>> >>>>>>
>> >>>>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-02-11 11:19 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-13 10:23 [GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, v<us/su>dot - by element) for AArch32 AdvSIMD ARMv8.6 Extension Stam Markianos-Wright
2019-12-18 13:26 ` [Ping][GCC][PATCH][ARM]Add " Stam Markianos-Wright
2020-01-10 19:25 ` Stam Markianos-Wright
2020-01-16 16:17 ` [Pingx2][GCC][PATCH][ARM]Add " Stam Markianos-Wright
2020-01-27 16:08 ` [Pingx3][GCC][PATCH][ARM]Add " Stam Markianos-Wright
2020-02-03 11:20 ` Stam Markianos-Wright
2020-02-10 13:36 ` Stam Markianos-Wright
2020-02-11 10:26 ` Kyrill Tkachov
2020-02-11 11:19 ` Stam Markianos-Wright
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).