[PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16

public inbox for gcc-patches@gcc.gnu.org

* [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16
@ 2019-11-22 14:33 Dennis Zhang
  2019-12-12 17:30 ` Dennis Zhang
  2020-03-12 12:05 ` [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature Dennis Zhang
  0 siblings, 2 replies; 41+ messages in thread
From: Dennis Zhang @ 2019-11-22 14:33 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 1480 bytes --]

Hi all,

This patch is part of a series adding support for Armv8.6-A features.
It enables the -march=armv8.6-a option and the +i8mm and +bf16 extensions.
The +i8mm and +bf16 extensions are optional from Armv8.2-A onward.
Documentation is available at https://developer.arm.com/docs/ddi0596/latest
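
A minimal sketch of how user code might test for the four macros that arm_cpu_builtins defines with this patch. Only the macro names come from the patch; the struct and the function names (armv86_features, detect_armv86_features, check_armv86_features, print_armv86_features) are hypothetical, and the consistency check is only applied when compiling for AArch32 (__arm__).

```c
#include <assert.h>
#include <stdio.h>

/* One flag per predefined macro added by this patch; a field is 1 when
   the macro is defined for the current compilation, 0 otherwise.  */
struct armv86_features
{
  int matmul_int8;      /* __ARM_FEATURE_MATMUL_INT8 */
  int bf16_scalar;      /* __ARM_FEATURE_BF16_SCALAR_ARITHMETIC */
  int bf16_vector;      /* __ARM_FEATURE_BF16_VECTOR_ARITHMETIC */
  int bf16_format_alt;  /* __ARM_BF16_FORMAT_ALTERNATIVE */
};

/* The tests happen at preprocessing time, so the result reflects the
   -march/-mfpu/-mfloat-abi options this file was compiled with.  */
static struct armv86_features
detect_armv86_features (void)
{
  struct armv86_features f = { 0, 0, 0, 0 };
#ifdef __ARM_FEATURE_MATMUL_INT8
  f.matmul_int8 = 1;
#endif
#ifdef __ARM_FEATURE_BF16_SCALAR_ARITHMETIC
  f.bf16_scalar = 1;
#endif
#ifdef __ARM_FEATURE_BF16_VECTOR_ARITHMETIC
  f.bf16_vector = 1;
#endif
#ifdef __ARM_BF16_FORMAT_ALTERNATIVE
  f.bf16_format_alt = 1;
#endif
  return f;
}

/* In this patch, __ARM_BF16_FORMAT_ALTERNATIVE is defined exactly when
   scalar or vector BF16 arithmetic is enabled.  Other targets may behave
   differently, so the check is limited to AArch32.  */
static void
check_armv86_features (struct armv86_features f)
{
#if defined (__arm__)
  assert (f.bf16_format_alt == (f.bf16_scalar || f.bf16_vector));
#else
  (void) f;
#endif
}

/* Print one line per macro: its name and whether it is defined.  */
static void
print_armv86_features (struct armv86_features f)
{
  printf ("__ARM_FEATURE_MATMUL_INT8            %d\n", f.matmul_int8);
  printf ("__ARM_FEATURE_BF16_SCALAR_ARITHMETIC %d\n", f.bf16_scalar);
  printf ("__ARM_FEATURE_BF16_VECTOR_ARITHMETIC %d\n", f.bf16_vector);
  printf ("__ARM_BF16_FORMAT_ALTERNATIVE        %d\n", f.bf16_format_alt);
}
```

Compiled with, for instance, -march=armv8.6-a+i8mm+bf16 -mfpu=auto -mfloat-abi=hard, all four fields would be expected to be 1. With plain -march=armv8.6-a they would be expected to stay 0, because the ARMv8_6a feature group in this patch does not include the i8mm or bf16 bits.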

Regtested for arm-none-linux-gnueabi-armv8-a.

Is it OK for trunk?

Many thanks!
Dennis

gcc/ChangeLog:

2019-11-15  Dennis Zhang  <dennis.zhang@arm.com>

	* config/arm/arm-c.c (arm_cpu_builtins): Define
	__ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC,
	__ARM_FEATURE_BF16_SCALAR_ARITHMETIC, and
	__ARM_BF16_FORMAT_ALTERNATIVE when enabled.
	* config/arm/arm-cpus.in (armv8_6, i8mm, bf16): New features.
	* config/arm/arm-tables.opt: Regenerated.
	* config/arm/arm.c (arm_option_reconfigure_globals): Init
	arm_arch_i8mm and arm_arch_bf16 to enable features.
	* config/arm/arm.h (TARGET_I8MM): New macro.
	(TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
	* config/arm/t-aprofile: Add matching rules for -march=armv8.6-a.
	* config/arm/t-arm-elf (all_v8_archs): Add armv8.6-a.
	* config/arm/t-multilib: Add matching rules for -march=armv8.6-a.
	(v8_6_a_simd_variants): New.
	(v8_*_a_simd_variants): Add i8mm and bf16.
	* doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.

gcc/testsuite/ChangeLog:

2019-11-15  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/multilib.exp: Add combination tests for armv8.6-a.

[-- Attachment #2: cli-arm-armv8.6-a+i8mm+bf16-20191119.patch --]
[-- Type: text/x-patch; name="cli-arm-armv8.6-a+i8mm+bf16-20191119.patch", Size: 14360 bytes --]

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index c4485ce7af1..b47e64c2151 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -225,6 +225,14 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
       builtin_define_with_int_value ("__ARM_FEATURE_COPROC", coproc_level);
     }
+
+  def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
+		      TARGET_BF16_FP);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC",
+		      TARGET_BF16_SIMD);
+  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
+		      TARGET_BF16_FP || TARGET_BF16_SIMD);
 }
 
 void
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 50379a0a10a..d373406649c 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -123,6 +123,9 @@ define feature armv8_4
 # Architecture rel 8.5.
 define feature armv8_5
 
+# Architecture rel 8.6.
+define feature armv8_6
+
 # M-Profile security extensions.
 define feature cmse
 
@@ -191,6 +194,12 @@ define feature sb
 # v8-A architectures, added by default from v8.5-A
 define feature predres
 
+# 8-bit Integer Matrix Multiply extension. Optional from v8.2-A.
+define feature i8mm
+
+# Brain half-precision floating-point extension. Optional from v8.2-A.
+define feature bf16
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -213,7 +222,7 @@ define fgroup ALL_CRYPTO	crypto
 # strip off 32 D-registers, but does not remove support for
 # double-precision FP.
 define fgroup ALL_SIMD_INTERNAL	fp_d32 neon ALL_CRYPTO
-define fgroup ALL_SIMD	ALL_SIMD_INTERNAL dotprod fp16fml
+define fgroup ALL_SIMD	ALL_SIMD_INTERNAL dotprod fp16fml i8mm
 
 # List of all FPU bits to strip out if -mfpu is used to override the
 # default.  fp16 is deliberately missing from this list.
@@ -253,6 +262,7 @@ define fgroup ARMv8_2a    ARMv8_1a armv8_2
 define fgroup ARMv8_3a    ARMv8_2a armv8_3
 define fgroup ARMv8_4a    ARMv8_3a armv8_4
 define fgroup ARMv8_5a    ARMv8_4a armv8_5 sb predres
+define fgroup ARMv8_6a    ARMv8_5a armv8_6
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r      ARMv8a
@@ -560,6 +570,8 @@ begin arch armv8.2-a
  option dotprod add FP_ARMv8 DOTPROD
  option sb add sb
  option predres add predres
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.2-a
 
 begin arch armv8.3-a
@@ -577,6 +589,8 @@ begin arch armv8.3-a
  option dotprod add FP_ARMv8 DOTPROD
  option sb add sb
  option predres add predres
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.3-a
 
 begin arch armv8.4-a
@@ -592,6 +606,8 @@ begin arch armv8.4-a
  option nofp remove ALL_FP
  option sb add sb
  option predres add predres
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.4-a
 
 begin arch armv8.5-a
@@ -605,8 +621,25 @@ begin arch armv8.5-a
  option crypto add FP_ARMv8 CRYPTO DOTPROD
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.5-a
 
+begin arch armv8.6-a
+ tune for cortex-a53
+ tune flags CO_PROC
+ base 8A
+ profile A
+ isa ARMv8_6a
+ option simd add FP_ARMv8 DOTPROD
+ option fp16 add fp16 fp16fml FP_ARMv8 DOTPROD
+ option crypto add FP_ARMv8 CRYPTO DOTPROD
+ option nocrypto remove ALL_CRYPTO
+ option nofp remove ALL_FP
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
+end arch armv8.6-a
+
 begin arch armv8-m.base
  tune for cortex-m23
  base 8M_BASE
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index aeb5b3fbf62..e509081678e 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -344,19 +344,22 @@ EnumValue
 Enum(arm_arch) String(armv8.5-a) Value(25)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.base) Value(26)
+Enum(arm_arch) String(armv8.6-a) Value(26)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.main) Value(27)
+Enum(arm_arch) String(armv8-m.base) Value(27)
 
 EnumValue
-Enum(arm_arch) String(armv8-r) Value(28)
+Enum(arm_arch) String(armv8-m.main) Value(28)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(29)
+Enum(arm_arch) String(armv8-r) Value(29)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(30)
+Enum(arm_arch) String(iwmmxt) Value(30)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(31)
 
 Enum
 Name(arm_fpu) Type(enum fpu_type)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1fd30c238cd..290db1129f2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -996,6 +996,12 @@ int arm_arch_cmse = 0;
 /* Nonzero if the core has a very small, high-latency, multiply unit.  */
 int arm_m_profile_small_mul = 0;
 
+/* Nonzero if chip supports the AdvSIMD I8MM instructions.  */
+int arm_arch_i8mm = 0;
+
+/* Nonzero if chip supports the BFloat16 instructions.  */
+int arm_arch_bf16 = 0;
+
 /* The condition codes of the ARM, and the inverse function.  */
 static const char * const arm_condition_codes[] =
 {
@@ -3649,8 +3655,11 @@ arm_option_reconfigure_globals (void)
   arm_arch_arm_hwdiv = bitmap_bit_p (arm_active_target.isa, isa_bit_adiv);
   arm_arch_crc = bitmap_bit_p (arm_active_target.isa, isa_bit_crc32);
   arm_arch_cmse = bitmap_bit_p (arm_active_target.isa, isa_bit_cmse);
-  arm_fp16_inst = bitmap_bit_p (arm_active_target.isa, isa_bit_fp16);
   arm_arch_lpae = bitmap_bit_p (arm_active_target.isa, isa_bit_lpae);
+  arm_arch_i8mm = bitmap_bit_p (arm_active_target.isa, isa_bit_i8mm);
+  arm_arch_bf16 = bitmap_bit_p (arm_active_target.isa, isa_bit_bf16);
+
+  arm_fp16_inst = bitmap_bit_p (arm_active_target.isa, isa_bit_fp16);
   if (arm_fp16_inst)
     {
       if (arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3a1ba8b9a57..6c8ff6637d2 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -246,6 +246,15 @@ emission of floating point pcs attributes.  */
 /* FPU supports the AdvSIMD FP16 instructions for ARMv8.2 and later.  */
 #define TARGET_NEON_FP16INST (TARGET_VFP_FP16INST && TARGET_NEON_RDMA)
 
+/* FPU supports 8-bit Integer Matrix Multiply (I8MM) AdvSIMD extensions.  */
+#define TARGET_I8MM (TARGET_NEON && arm_arch8_2 && arm_arch_i8mm)
+
+/* FPU supports Brain half-precision floating-point (BFloat16) extension.  */
+#define TARGET_BF16_FP (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP5 \
+			&& arm_arch8_2 && arm_arch_bf16)
+#define TARGET_BF16_SIMD (TARGET_NEON && TARGET_VFP5 \
+			  && arm_arch8_2 && arm_arch_bf16)
+
 /* Q-bit is present.  */
 #define TARGET_ARM_QBIT \
   (TARGET_32BIT && arm_arch5te && (arm_arch_notm || arm_arch7))
@@ -517,6 +526,12 @@ extern int arm_arch_crc;
 /* Nonzero if chip supports the ARMv8-M Security Extensions.  */
 extern int arm_arch_cmse;
 
+/* Nonzero if chip supports the ARMv8 I8MM instructions.  */
+extern int arm_arch_i8mm;
+
+/* Nonzero if chip supports the BFloat16 instructions.  */
+extern int arm_arch_bf16;
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index 1556f1b23e3..e5f3c3b42d6 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -122,6 +122,13 @@ MULTILIB_MATCHES	+= march?armv8-a=march?armv8.5-a
 MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_5_a_simd_variants), \
 			     march?armv8-a+simd=march?armv8.5-a$(ARCH))
 
+# Baseline v8.6-a: map down to baseline v8-a
+MULTILIB_MATCHES	+= march?armv8-a=march?armv8.6-a
+
+# Map all v8.6-a SIMD variants to v8-a+simd
+MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_6_a_simd_variants), \
+			     march?armv8-a+simd=march?armv8.6-a$(ARCH))
+
 # Use Thumb libraries for everything.
 
 MULTILIB_REUSE		+= mthumb/march.armv7-a/mfloat-abi.soft=marm/march.armv7-a/mfloat-abi.soft
diff --git a/gcc/config/arm/t-arm-elf b/gcc/config/arm/t-arm-elf
index 8911d489f14..970cc43a9e4 100644
--- a/gcc/config/arm/t-arm-elf
+++ b/gcc/config/arm/t-arm-elf
@@ -47,7 +47,7 @@ all_early_arch	:= armv5tej armv6 armv6j armv6k armv6z armv6kz \
 all_v7_a_r	:= armv7-a armv7ve armv7-r
 
 all_v8_archs	:= armv8-a armv8-a+crc armv8.1-a armv8.2-a armv8.3-a armv8.4-a \
-		   armv8.5-a
+		   armv8.5-a armv8.6-a
 
 # No floating point variants, require thumb1 softfp
 all_nofp_t	:= armv6-m armv6s-m armv8-m.base
diff --git a/gcc/config/arm/t-multilib b/gcc/config/arm/t-multilib
index dc97c8f09fb..fcf3b0b46e3 100644
--- a/gcc/config/arm/t-multilib
+++ b/gcc/config/arm/t-multilib
@@ -73,9 +73,10 @@ v7ve_vfpv4_simd_variants := +simd
 v8_a_nosimd_variants	:= +crc
 v8_a_simd_variants	:= $(call all_feat_combs, simd crypto)
 v8_1_a_simd_variants	:= $(call all_feat_combs, simd crypto)
-v8_2_a_simd_variants	:= $(call all_feat_combs, simd fp16 fp16fml crypto dotprod)
-v8_4_a_simd_variants	:= $(call all_feat_combs, simd fp16 crypto)
-v8_5_a_simd_variants	:= $(call all_feat_combs, simd fp16 crypto)
+v8_2_a_simd_variants	:= $(call all_feat_combs, simd fp16 fp16fml crypto dotprod i8mm bf16)
+v8_4_a_simd_variants	:= $(call all_feat_combs, simd fp16 crypto i8mm bf16)
+v8_5_a_simd_variants	:= $(call all_feat_combs, simd fp16 crypto i8mm bf16)
+v8_6_a_simd_variants	:= $(call all_feat_combs, simd fp16 crypto i8mm bf16)
 v8_r_nosimd_variants	:= +crc
 
 ifneq (,$(HAS_APROFILE))
@@ -185,6 +186,13 @@ MULTILIB_MATCHES	+= march?armv7=march?armv8.5-a
 MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_5_a_simd_variants), \
 			     march?armv7+fp=march?armv8.5-a$(ARCH))
 
+# Baseline v8.6-a: map down to baseline v8-a
+MULTILIB_MATCHES	+= march?armv7=march?armv8.6-a
+
+# Map all v8.6-a SIMD variants
+MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_6_a_simd_variants), \
+			     march?armv7+fp=march?armv8.6-a$(ARCH))
+
 # Use Thumb libraries for everything.
 
 MULTILIB_REUSE		+= mthumb/march.armv7/mfloat-abi.soft=marm/march.armv7/mfloat-abi.soft
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2897982705e..7a31bf0cf27 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17423,6 +17423,7 @@ Permissible names are:
 @samp{armv8-a}, @samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a},
 @samp{armv8.4-a},
 @samp{armv8.5-a},
+@samp{armv8.6-a},
 @samp{armv7-r},
 @samp{armv8-r},
 @samp{armv6-m}, @samp{armv6s-m},
@@ -17666,6 +17667,14 @@ Speculation Barrier Instruction.
 
 @item +predres
 Execution and Data Prediction Restriction Instructions.
+
+@item +i8mm
+8-bit Integer Matrix Multiply instructions.
+This also enables Advanced SIMD instructions.
+
+@item +bf16
+Brain half-precision floating-point instructions.
+This also enables Advanced SIMD and floating-point instructions.
 @end table
 
 @item armv8.4-a
@@ -17695,6 +17704,14 @@ Speculation Barrier Instruction.
 
 @item +predres
 Execution and Data Prediction Restriction Instructions.
+
+@item +i8mm
+8-bit Integer Matrix Multiply instructions.
+This also enables Advanced SIMD instructions.
+
+@item +bf16
+Brain half-precision floating-point instructions.
+This also enables Advanced SIMD and floating-point instructions.
 @end table
 
 @item armv8.5-a
@@ -17718,6 +17735,45 @@ Disable the cryptographic extension.
 
 @item +nofp
 Disable the floating-point, Advanced SIMD and cryptographic instructions.
+
+@item +i8mm
+8-bit Integer Matrix Multiply instructions.
+This also enables Advanced SIMD instructions.
+
+@item +bf16
+Brain half-precision floating-point instructions.
+This also enables Advanced SIMD and floating-point instructions.
+@end table
+
+@item armv8.6-a
+@table @samp
+@item +fp16
+The half-precision floating-point data processing instructions.
+This also enables the Advanced SIMD and floating-point instructions as well
+as the Dot Product extension and the half-precision floating-point fmla
+extension.
+
+@item +simd
+The ARMv8.3-A Advanced SIMD and floating-point instructions as well as the
+Dot Product extension.
+
+@item +crypto
+The cryptographic instructions.  This also enables the Advanced SIMD and
+floating-point instructions as well as the Dot Product extension.
+
+@item +nocrypto
+Disable the cryptographic extension.
+
+@item +nofp
+Disable the floating-point, Advanced SIMD and cryptographic instructions.
+
+@item +i8mm
+8-bit Integer Matrix Multiply instructions.
+This also enables Advanced SIMD instructions.
+
+@item +bf16
+Brain half-precision floating-point instructions.
+This also enables Advanced SIMD and floating-point instructions.
 @end table
 
 @item armv7-r
diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp b/gcc/testsuite/gcc.target/arm/multilib.exp
index dcea829965e..7807485352f 100644
--- a/gcc/testsuite/gcc.target/arm/multilib.exp
+++ b/gcc/testsuite/gcc.target/arm/multilib.exp
@@ -126,6 +126,14 @@ if {[multilib_config "aprofile"] } {
 	{-march=armv8.5-a+simd+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
 	{-march=armv8.5-a+simd+fp16+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
 	{-march=armv8.5-a+simd+nofp+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.6-a+crypto -mfloat-abi=soft} "thumb/v8-a/nofp"
+	{-march=armv8.6-a+simd+crypto -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.6-a+simd+crypto+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
+	{-march=armv8.6-a+simd+nofp+crypto -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.6-a+fp16 -mfloat-abi=soft} "thumb/v8-a/nofp"
+	{-march=armv8.6-a+simd+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.6-a+simd+fp16+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
+	{-march=armv8.6-a+simd+nofp+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
 	{-mcpu=cortex-a53+crypto -mfloat-abi=hard} "thumb/v8-a+simd/hard"
 	{-mcpu=cortex-a53+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
 	{-march=armv8-a+crc -mfloat-abi=hard -mfpu=vfp} "thumb/v8-a+simd/hard"

* Re: [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16
  2019-11-22 14:33 [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16 Dennis Zhang
@ 2019-12-12 17:30 ` Dennis Zhang
  2019-12-20 15:35   ` Kyrill Tkachov
  2020-03-12 12:05 ` [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature Dennis Zhang
  1 sibling, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2019-12-12 17:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 1691 bytes --]

Hi all,

On 22/11/2019 14:33, Dennis Zhang wrote:
> Hi all,
> 
> This patch is part of a series adding support for Armv8.6-A features.
> It enables options including -march=armv8.6-a, +i8mm and +bf16.
> The +i8mm and +bf16 features are optional for Armv8.2-a and onward.
> Documents are at https://developer.arm.com/docs/ddi0596/latest
> 
> Regtested for arm-none-linux-gnueabi-armv8-a.
> 

This update rebases the patch onto current trunk.
It also fixes some issues, in line with the recent CLI patch for AArch64.
The ChangeLog is updated as follows:

gcc/ChangeLog:

2019-12-12  Dennis Zhang  <dennis.zhang@arm.com>

	* config/arm/arm-c.c (arm_cpu_builtins): Define
	__ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC,
	__ARM_FEATURE_BF16_SCALAR_ARITHMETIC, and
	__ARM_BF16_FORMAT_ALTERNATIVE when enabled.
	* config/arm/arm-cpus.in (armv8_6, i8mm, bf16): New features.
	* config/arm/arm-tables.opt: Regenerated.
	* config/arm/arm.c (arm_option_reconfigure_globals): Initialize
	arm_arch_i8mm and arm_arch_bf16 when enabled.
	* config/arm/arm.h (TARGET_I8MM): New macro.
	(TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
	* config/arm/t-aprofile: Add matching rules for -march=armv8.6-a.
	* config/arm/t-arm-elf (all_v8_archs): Add armv8.6-a.
	* config/arm/t-multilib: Add matching rules for -march=armv8.6-a.
	(v8_6_a_simd_variants): New.
	(v8_*_a_simd_variants): Add i8mm and bf16.
	* doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.

gcc/testsuite/ChangeLog:

2019-12-12  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/multilib.exp: Add combination tests for armv8.6-a.
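
The option entries in arm-cpus.in can be read as bit-set updates: "add" ORs feature bits in and "remove" clears them, applied left to right as the multilib.exp cases (+simd+crypto+nofp versus +simd+nofp+crypto) suggest. A minimal sketch of that model, assuming one hypothetical bit per feature group rather than GCC's generated ISA bitmaps, illustrates one visible difference between the two revisions: the second adds bf16 to ALL_FPU_EXTERNAL, and therefore to ALL_FP, the group that +nofp removes. The names featset, F_*, ALL_FP_V1, ALL_FP_V2 and apply_extensions are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-bit stand-ins for feature groups in arm-cpus.in.
   +fp16 and fp16fml are left out for brevity.  */
typedef unsigned int featset;

enum
{
  F_FP      = 1u << 0,  /* FP_ARMv8 */
  F_NEON    = 1u << 1,  /* neon */
  F_DOTPROD = 1u << 2,  /* dotprod */
  F_CRYPTO  = 1u << 3,  /* crypto */
  F_I8MM    = 1u << 4,  /* i8mm */
  F_BF16    = 1u << 5   /* bf16 */
};

/* ALL_FP as removed by +nofp.  In the second revision bf16 reaches it
   through ALL_FPU_EXTERNAL; in the first, only i8mm (via ALL_SIMD) did.  */
static const featset ALL_FP_V1 = F_FP | F_NEON | F_DOTPROD | F_CRYPTO | F_I8MM;
static const featset ALL_FP_V2 = F_FP | F_NEON | F_DOTPROD | F_CRYPTO | F_I8MM | F_BF16;

struct ext_option
{
  const char *name;
  int remove;    /* 1 for "remove" entries, 0 for "add" entries.  */
  featset bits;
};

/* Apply an extension string such as "+simd+nofp" to BASE, left to right,
   using the armv8.6-a entries of the second revision.  Returns
   (featset) -1 if the string is malformed or names an unknown extension.  */
static featset
apply_extensions (featset base, const char *exts, featset all_fp)
{
  const featset fp_dotprod = F_FP | F_NEON | F_DOTPROD;
  const struct ext_option table[] = {
    { "simd",     0, fp_dotprod },             /* add FP_ARMv8 DOTPROD */
    { "crypto",   0, fp_dotprod | F_CRYPTO },  /* add FP_ARMv8 CRYPTO DOTPROD */
    { "nocrypto", 1, F_CRYPTO },               /* remove ALL_CRYPTO */
    { "nofp",     1, all_fp },                 /* remove ALL_FP */
    { "i8mm",     0, fp_dotprod | F_I8MM },    /* add i8mm FP_ARMv8 DOTPROD */
    { "bf16",     0, fp_dotprod | F_BF16 }     /* add bf16 FP_ARMv8 DOTPROD */
  };
  featset set = base;
  const char *p;

  assert (exts != NULL);
  p = exts;
  while (*p == '+')
    {
      const char *name = ++p;
      size_t len = strcspn (name, "+");  /* Length up to the next '+'.  */
      size_t i;
      int found = 0;

      for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strlen (table[i].name) == len
            && strncmp (table[i].name, name, len) == 0)
          {
            if (table[i].remove)
              set &= ~table[i].bits;
            else
              set |= table[i].bits;
            found = 1;
            break;
          }
      if (!found)
        return (featset) -1;
      p = name + len;
    }
  return *p == '\0' ? set : (featset) -1;
}
```

In this model, +simd+crypto+nofp ends with no FP or SIMD bits while +simd+nofp+crypto ends with crypto enabled, matching the thumb/v8-a/nofp versus thumb/v8-a+simd/softfp expectations in the multilib.exp hunk. +bf16+nofp leaves the bf16 bit set only with the first revision's ALL_FP.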

Is it OK for trunk?

Many thanks!
Dennis

[-- Attachment #2: cli-arm-armv8.6-a+i8mm+bf16-20191211.patch --]
[-- Type: text/x-patch; name="cli-arm-armv8.6-a+i8mm+bf16-20191211.patch", Size: 14841 bytes --]

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 546b35a5cbd..9cd1c5bdcba 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -226,6 +226,14 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
       builtin_define_with_int_value ("__ARM_FEATURE_COPROC", coproc_level);
     }
+
+  def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
+		      TARGET_BF16_FP);
+  def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC",
+		      TARGET_BF16_SIMD);
+  def_or_undef_macro (pfile, "__ARM_BF16_FORMAT_ALTERNATIVE",
+		      TARGET_BF16_FP || TARGET_BF16_SIMD);
 }
 
 void
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 7090775aa7e..a2f6ce00af4 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -123,6 +123,9 @@ define feature armv8_4
 # Architecture rel 8.5.
 define feature armv8_5
 
+# Architecture rel 8.6.
+define feature armv8_6
+
 # M-Profile security extensions.
 define feature cmse
 
@@ -191,6 +194,12 @@ define feature sb
 # v8-A architectures, added by default from v8.5-A
 define feature predres
 
+# 8-bit Integer Matrix Multiply extension. Optional from v8.2-A.
+define feature i8mm
+
+# Brain half-precision floating-point extension. Optional from v8.2-A.
+define feature bf16
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -213,7 +222,7 @@ define fgroup ALL_CRYPTO	crypto
 # strip off 32 D-registers, but does not remove support for
 # double-precision FP.
 define fgroup ALL_SIMD_INTERNAL	fp_d32 neon ALL_CRYPTO
-define fgroup ALL_SIMD_EXTERNAL dotprod fp16fml
+define fgroup ALL_SIMD_EXTERNAL dotprod fp16fml i8mm
 define fgroup ALL_SIMD	ALL_SIMD_INTERNAL ALL_SIMD_EXTERNAL
 
 # List of all FPU bits to strip out if -mfpu is used to override the
@@ -221,7 +230,7 @@ define fgroup ALL_SIMD	ALL_SIMD_INTERNAL ALL_SIMD_EXTERNAL
 define fgroup ALL_FPU_INTERNAL	vfpv2 vfpv3 vfpv4 fpv5 fp16conv fp_dbl ALL_SIMD_INTERNAL
 # Similarly, but including fp16 and other extensions that aren't part of
 # -mfpu support.
-define fgroup ALL_FPU_EXTERNAL fp16
+define fgroup ALL_FPU_EXTERNAL fp16 bf16
 
 # Everything related to the FPU extensions (FP or SIMD).
 define fgroup ALL_FP	ALL_FPU_EXTERNAL ALL_FPU_INTERNAL ALL_SIMD
@@ -256,6 +265,7 @@ define fgroup ARMv8_2a    ARMv8_1a armv8_2
 define fgroup ARMv8_3a    ARMv8_2a armv8_3
 define fgroup ARMv8_4a    ARMv8_3a armv8_4
 define fgroup ARMv8_5a    ARMv8_4a armv8_5 sb predres
+define fgroup ARMv8_6a    ARMv8_5a armv8_6
 define fgroup ARMv8m_base ARMv6m armv8 cmse tdiv
 define fgroup ARMv8m_main ARMv7m armv8 cmse
 define fgroup ARMv8r      ARMv8a
@@ -563,6 +573,8 @@ begin arch armv8.2-a
  option dotprod add FP_ARMv8 DOTPROD
  option sb add sb
  option predres add predres
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.2-a
 
 begin arch armv8.3-a
@@ -580,6 +592,8 @@ begin arch armv8.3-a
  option dotprod add FP_ARMv8 DOTPROD
  option sb add sb
  option predres add predres
+ option i8mm add i8mm FP_ARMv8 NEON
+ option bf16 add bf16 FP_ARMv8 NEON
 end arch armv8.3-a
 
 begin arch armv8.4-a
@@ -595,6 +609,8 @@ begin arch armv8.4-a
  option nofp remove ALL_FP
  option sb add sb
  option predres add predres
+ option i8mm add i8mm FP_ARMv8 DOTPROD
+ option bf16 add bf16 FP_ARMv8 DOTPROD
 end arch armv8.4-a
 
 begin arch armv8.5-a
@@ -608,8 +624,25 @@ begin arch armv8.5-a
  option crypto add FP_ARMv8 CRYPTO DOTPROD
  option nocrypto remove ALL_CRYPTO
  option nofp remove ALL_FP
+ option i8mm add i8mm FP_ARMv8 DOTPROD
+ option bf16 add bf16 FP_ARMv8 DOTPROD
 end arch armv8.5-a
 
+begin arch armv8.6-a
+ tune for cortex-a53
+ tune flags CO_PROC
+ base 8A
+ profile A
+ isa ARMv8_6a
+ option simd add FP_ARMv8 DOTPROD
+ option fp16 add fp16 fp16fml FP_ARMv8 DOTPROD
+ option crypto add FP_ARMv8 CRYPTO DOTPROD
+ option nocrypto remove ALL_CRYPTO
+ option nofp remove ALL_FP
+ option i8mm add i8mm FP_ARMv8 DOTPROD
+ option bf16 add bf16 FP_ARMv8 DOTPROD
+end arch armv8.6-a
+
 begin arch armv8-m.base
  tune for cortex-m23
  base 8M_BASE
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index aeb5b3fbf62..e509081678e 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -344,19 +344,22 @@ EnumValue
 Enum(arm_arch) String(armv8.5-a) Value(25)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.base) Value(26)
+Enum(arm_arch) String(armv8.6-a) Value(26)
 
 EnumValue
-Enum(arm_arch) String(armv8-m.main) Value(27)
+Enum(arm_arch) String(armv8-m.base) Value(27)
 
 EnumValue
-Enum(arm_arch) String(armv8-r) Value(28)
+Enum(arm_arch) String(armv8-m.main) Value(28)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(29)
+Enum(arm_arch) String(armv8-r) Value(29)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(30)
+Enum(arm_arch) String(iwmmxt) Value(30)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(31)
 
 Enum
 Name(arm_fpu) Type(enum fpu_type)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 983852cc4e3..fc4ba2b1d1b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -999,6 +999,12 @@ int arm_arch_cmse = 0;
 /* Nonzero if the core has a very small, high-latency, multiply unit.  */
 int arm_m_profile_small_mul = 0;
 
+/* Nonzero if chip supports the AdvSIMD I8MM instructions.  */
+int arm_arch_i8mm = 0;
+
+/* Nonzero if chip supports the BFloat16 instructions.  */
+int arm_arch_bf16 = 0;
+
 /* The condition codes of the ARM, and the inverse function.  */
 static const char * const arm_condition_codes[] =
 {
@@ -3672,8 +3678,11 @@ arm_option_reconfigure_globals (void)
   arm_arch_arm_hwdiv = bitmap_bit_p (arm_active_target.isa, isa_bit_adiv);
   arm_arch_crc = bitmap_bit_p (arm_active_target.isa, isa_bit_crc32);
   arm_arch_cmse = bitmap_bit_p (arm_active_target.isa, isa_bit_cmse);
-  arm_fp16_inst = bitmap_bit_p (arm_active_target.isa, isa_bit_fp16);
   arm_arch_lpae = bitmap_bit_p (arm_active_target.isa, isa_bit_lpae);
+  arm_arch_i8mm = bitmap_bit_p (arm_active_target.isa, isa_bit_i8mm);
+  arm_arch_bf16 = bitmap_bit_p (arm_active_target.isa, isa_bit_bf16);
+
+  arm_fp16_inst = bitmap_bit_p (arm_active_target.isa, isa_bit_fp16);
   if (arm_fp16_inst)
     {
       if (arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3a1ba8b9a57..23af941a4f6 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -246,6 +246,15 @@ emission of floating point pcs attributes.  */
 /* FPU supports the AdvSIMD FP16 instructions for ARMv8.2 and later.  */
 #define TARGET_NEON_FP16INST (TARGET_VFP_FP16INST && TARGET_NEON_RDMA)
 
+/* FPU supports 8-bit Integer Matrix Multiply (I8MM) AdvSIMD extensions.  */
+#define TARGET_I8MM (TARGET_NEON && arm_arch8_2 && arm_arch_i8mm)
+
+/* FPU supports Brain half-precision floating-point (BFloat16) extension.  */
+#define TARGET_BF16_FP (TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP5 \
+			&& arm_arch8_2 && arm_arch_bf16)
+#define TARGET_BF16_SIMD (TARGET_NEON && TARGET_VFP5 \
+			  && arm_arch8_2 && arm_arch_bf16)
+
 /* Q-bit is present.  */
 #define TARGET_ARM_QBIT \
   (TARGET_32BIT && arm_arch5te && (arm_arch_notm || arm_arch7))
@@ -517,6 +526,12 @@ extern int arm_arch_crc;
 /* Nonzero if chip supports the ARMv8-M Security Extensions.  */
 extern int arm_arch_cmse;
 
+/* Nonzero if chip supports the I8MM instructions.  */
+extern int arm_arch_i8mm;
+
+/* Nonzero if chip supports the BFloat16 instructions.  */
+extern int arm_arch_bf16;
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index 1556f1b23e3..e5f3c3b42d6 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -122,6 +122,13 @@ MULTILIB_MATCHES	+= march?armv8-a=march?armv8.5-a
 MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_5_a_simd_variants), \
 			     march?armv8-a+simd=march?armv8.5-a$(ARCH))
 
+# Baseline v8.6-a: map down to baseline v8-a
+MULTILIB_MATCHES	+= march?armv8-a=march?armv8.6-a
+
+# Map all v8.6-a SIMD variants to v8-a+simd
+MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_6_a_simd_variants), \
+			     march?armv8-a+simd=march?armv8.6-a$(ARCH))
+
 # Use Thumb libraries for everything.
 
 MULTILIB_REUSE		+= mthumb/march.armv7-a/mfloat-abi.soft=marm/march.armv7-a/mfloat-abi.soft
diff --git a/gcc/config/arm/t-arm-elf b/gcc/config/arm/t-arm-elf
index 8911d489f14..970cc43a9e4 100644
--- a/gcc/config/arm/t-arm-elf
+++ b/gcc/config/arm/t-arm-elf
@@ -47,7 +47,7 @@ all_early_arch	:= armv5tej armv6 armv6j armv6k armv6z armv6kz \
 all_v7_a_r	:= armv7-a armv7ve armv7-r
 
 all_v8_archs	:= armv8-a armv8-a+crc armv8.1-a armv8.2-a armv8.3-a armv8.4-a \
-		   armv8.5-a
+		   armv8.5-a armv8.6-a
 
 # No floating point variants, require thumb1 softfp
 all_nofp_t	:= armv6-m armv6s-m armv8-m.base
diff --git a/gcc/config/arm/t-multilib b/gcc/config/arm/t-multilib
index d5ee537193f..afe2e872d7d 100644
--- a/gcc/config/arm/t-multilib
+++ b/gcc/config/arm/t-multilib
@@ -73,9 +73,10 @@ v7ve_vfpv4_simd_variants := +simd
 v8_a_nosimd_variants	:= +crc
 v8_a_simd_variants	:= $(call all_feat_combs, simd crypto)
 v8_1_a_simd_variants	:= $(call all_feat_combs, simd crypto)
-v8_2_a_simd_variants	:= $(call all_feat_combs, simd fp16 fp16fml crypto dotprod)
-v8_4_a_simd_variants	:= $(call all_feat_combs, simd fp16 crypto)
-v8_5_a_simd_variants	:= $(call all_feat_combs, simd fp16 crypto)
+v8_2_a_simd_variants	:= $(call all_feat_combs, simd fp16 fp16fml crypto dotprod i8mm bf16)
+v8_4_a_simd_variants	:= $(call all_feat_combs, simd fp16 crypto i8mm bf16)
+v8_5_a_simd_variants	:= $(call all_feat_combs, simd fp16 crypto i8mm bf16)
+v8_6_a_simd_variants	:= $(call all_feat_combs, simd fp16 crypto i8mm bf16)
 v8_r_nosimd_variants	:= +crc
 
 ifneq (,$(HAS_APROFILE))
@@ -185,6 +186,13 @@ MULTILIB_MATCHES	+= march?armv7=march?armv8.5-a
 MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_5_a_simd_variants), \
 			     march?armv7+fp=march?armv8.5-a$(ARCH))
 
+# Baseline v8.6-a: map down to baseline v8-a
+MULTILIB_MATCHES	+= march?armv7=march?armv8.6-a
+
+# Map all v8.6-a SIMD variants
+MULTILIB_MATCHES	+= $(foreach ARCH, $(v8_6_a_simd_variants), \
+			     march?armv7+fp=march?armv8.6-a$(ARCH))
+
 endif		# Not APROFILE.
 
 # Use Thumb libraries for everything.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 236bed92724..eb49f418741 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17429,6 +17429,7 @@ Permissible names are:
 @samp{armv8-a}, @samp{armv8.1-a}, @samp{armv8.2-a}, @samp{armv8.3-a},
 @samp{armv8.4-a},
 @samp{armv8.5-a},
+@samp{armv8.6-a},
 @samp{armv7-r},
 @samp{armv8-r},
 @samp{armv6-m}, @samp{armv6s-m},
@@ -17672,6 +17673,14 @@ Speculation Barrier Instruction.
 
 @item +predres
 Execution and Data Prediction Restriction Instructions.
+
+@item +i8mm
+8-bit Integer Matrix Multiply instructions.
+This also enables Advanced SIMD and floating-point instructions.
+
+@item +bf16
+Brain half-precision floating-point instructions.
+This also enables Advanced SIMD and floating-point instructions.
 @end table
 
 @item armv8.4-a
@@ -17701,6 +17710,14 @@ Speculation Barrier Instruction.
 
 @item +predres
 Execution and Data Prediction Restriction Instructions.
+
+@item +i8mm
+8-bit Integer Matrix Multiply instructions.
+This also enables Advanced SIMD and floating-point instructions.
+
+@item +bf16
+Brain half-precision floating-point instructions.
+This also enables Advanced SIMD and floating-point instructions.
 @end table
 
 @item armv8.5-a
@@ -17724,6 +17741,45 @@ Disable the cryptographic extension.
 
 @item +nofp
 Disable the floating-point, Advanced SIMD and cryptographic instructions.
+
+@item +i8mm
+8-bit Integer Matrix Multiply instructions.
+This also enables Advanced SIMD and floating-point instructions.
+
+@item +bf16
+Brain half-precision floating-point instructions.
+This also enables Advanced SIMD and floating-point instructions.
+@end table
+
+@item armv8.6-a
+@table @samp
+@item +fp16
+The half-precision floating-point data processing instructions.
+This also enables the Advanced SIMD and floating-point instructions as well
+as the Dot Product extension and the half-precision floating-point fmla
+extension.
+
+@item +simd
+The ARMv8.3-A Advanced SIMD and floating-point instructions as well as the
+Dot Product extension.
+
+@item +crypto
+The cryptographic instructions.  This also enables the Advanced SIMD and
+floating-point instructions as well as the Dot Product extension.
+
+@item +nocrypto
+Disable the cryptographic extension.
+
+@item +nofp
+Disable the floating-point, Advanced SIMD and cryptographic instructions.
+
+@item +i8mm
+8-bit Integer Matrix Multiply instructions.
+This also enables Advanced SIMD and floating-point instructions.
+
+@item +bf16
+Brain half-precision floating-point instructions.
+This also enables Advanced SIMD and floating-point instructions.
 @end table
 
 @item armv7-r
diff --git a/gcc/testsuite/gcc.target/arm/multilib.exp b/gcc/testsuite/gcc.target/arm/multilib.exp
index dcea829965e..7807485352f 100644
--- a/gcc/testsuite/gcc.target/arm/multilib.exp
+++ b/gcc/testsuite/gcc.target/arm/multilib.exp
@@ -126,6 +126,14 @@ if {[multilib_config "aprofile"] } {
 	{-march=armv8.5-a+simd+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
 	{-march=armv8.5-a+simd+fp16+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
 	{-march=armv8.5-a+simd+nofp+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.6-a+crypto -mfloat-abi=soft} "thumb/v8-a/nofp"
+	{-march=armv8.6-a+simd+crypto -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.6-a+simd+crypto+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
+	{-march=armv8.6-a+simd+nofp+crypto -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.6-a+fp16 -mfloat-abi=soft} "thumb/v8-a/nofp"
+	{-march=armv8.6-a+simd+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
+	{-march=armv8.6-a+simd+fp16+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
+	{-march=armv8.6-a+simd+nofp+fp16 -mfloat-abi=softfp} "thumb/v8-a+simd/softfp"
 	{-mcpu=cortex-a53+crypto -mfloat-abi=hard} "thumb/v8-a+simd/hard"
 	{-mcpu=cortex-a53+nofp -mfloat-abi=softfp} "thumb/v8-a/nofp"
 	{-march=armv8-a+crc -mfloat-abi=hard -mfpu=vfp} "thumb/v8-a+simd/hard"

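The armv8.6-a expectations above follow a left-to-right rule. A later +crypto or +fp16 turns Advanced SIMD and floating point back on after +nofp, while a trailing +nofp wins. The C sketch below models that rule under simplifying assumptions. The extension semantics come from the invoke.texi entries in this patch. The directory choice is inferred from the multilib.exp expectations, not from t-aprofile. `armv86_multilib` and `apply_ext` are hypothetical helpers, not GCC functions.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified model of the armv8.6-a extension flags (hypothetical).  */
struct ext_state { bool fp, simd, crypto, fp16, dotprod; };

/* Apply one "+ext" modifier, following the invoke.texi descriptions:
   +simd, +crypto and +fp16 also enable FP and Advanced SIMD, and +nofp
   disables FP, SIMD and crypto (this sketch clears every flag).
   Returns false for modifiers the model does not handle.  */
static bool
apply_ext (struct ext_state *s, const char *ext)
{
  if (strcmp (ext, "simd") == 0)
    s->fp = s->simd = s->dotprod = true;
  else if (strcmp (ext, "crypto") == 0)
    s->fp = s->simd = s->dotprod = s->crypto = true;
  else if (strcmp (ext, "fp16") == 0)
    s->fp = s->simd = s->dotprod = s->fp16 = true;
  else if (strcmp (ext, "nofp") == 0)
    s->fp = s->simd = s->crypto = s->fp16 = s->dotprod = false;
  else if (strcmp (ext, "nocrypto") == 0)
    s->crypto = false;
  else
    return false;
  return true;
}

/* Guess the multilib directory for an armv8.6-a -march string and a
   -mfloat-abi value, mirroring the multilib.exp expectations.  */
const char *
armv86_multilib (const char *march, const char *float_abi)
{
  struct ext_state s = { false, false, false, false, false };
  const char *p = strchr (march, '+');
  char ext[32];

  /* Walk the "+ext" modifiers in order; later ones override earlier.  */
  while (p != NULL)
    {
      const char *start = p + 1;
      const char *end = strchr (start, '+');
      size_t len = end ? (size_t) (end - start) : strlen (start);
      if (len >= sizeof ext)
        len = sizeof ext - 1;
      memcpy (ext, start, len);
      ext[len] = '\0';
      apply_ext (&s, ext);
      p = end;
    }

  if (strcmp (float_abi, "soft") == 0 || !s.simd)
    return "thumb/v8-a/nofp";
  return strcmp (float_abi, "hard") == 0 ? "thumb/v8-a+simd/hard"
                                         : "thumb/v8-a+simd/softfp";
}
```

For example, `armv86_multilib ("armv8.6-a+simd+nofp+crypto", "softfp")` returns `thumb/v8-a+simd/softfp`, which matches the corresponding multilib.exp line.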

* Re: [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16
  2019-12-12 17:30 ` Dennis Zhang
@ 2019-12-20 15:35   ` Kyrill Tkachov
  2020-01-02 17:28     ` Dennis Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Kyrill Tkachov @ 2019-12-20 15:35 UTC (permalink / raw)
  To: Dennis Zhang, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Dennis,

On 12/12/19 5:30 PM, Dennis Zhang wrote:
> Hi all,
>
> On 22/11/2019 14:33, Dennis Zhang wrote:
> > Hi all,
> >
> > This patch is part of a series adding support for Armv8.6-A features.
> > It enables options including -march=armv8.6-a, +i8mm and +bf16.
> > The +i8mm and +bf16 features are optional for Armv8.2-a and onward.
> > Documents are at https://developer.arm.com/docs/ddi0596/latest
> >
> > Regtested for arm-none-linux-gnueabi-armv8-a.
> >
>
> This is an update that rebases the patch onto the current trunk.
> It also fixes some issues in line with the recent CLI patch for AArch64.
> The ChangeLog is updated as follows:
>
> gcc/ChangeLog:
>
> 2019-12-12  Dennis Zhang  <dennis.zhang@arm.com>
>
>         * config/arm/arm-c.c (arm_cpu_builtins): Define
>         __ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC,
>         __ARM_FEATURE_BF16_SCALAR_ARITHMETIC, and
>         __ARM_BF16_FORMAT_ALTERNATIVE when enabled.
>         * config/arm/arm-cpus.in (armv8_6, i8mm, bf16): New features.
>         * config/arm/arm-tables.opt: Regenerated.
>         * config/arm/arm.c (arm_option_reconfigure_globals): Initialize
>         arm_arch_i8mm and arm_arch_bf16 when enabled.
>         * config/arm/arm.h (TARGET_I8MM): New macro.
>         (TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
>         * config/arm/t-aprofile: Add matching rules for -march=armv8.6-a.
>         * config/arm/t-arm-elf (all_v8_archs): Add armv8.6-a.
>         * config/arm/t-multilib: Add matching rules for -march=armv8.6-a.
>         (v8_6_a_simd_variants): New.
>         (v8_*_a_simd_variants): Add i8mm and bf16.
>         * doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.
>
> gcc/testsuite/ChangeLog:
>
> 2019-12-12  Dennis Zhang  <dennis.zhang@arm.com>
>
>         * gcc.target/arm/multilib.exp: Add combination tests for
> armv8.6-a.
>
> Is it OK for trunk?


This is ok for trunk.

Please follow the steps at https://gcc.gnu.org/svnwrite.html to get 
write permission to the repo (listing me as approver).

You can then commit it yourself :)

Thanks,

Kyrill


>
> Many thanks!
> Dennis


* Re: [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16
  2019-12-20 15:35   ` Kyrill Tkachov
@ 2020-01-02 17:28     ` Dennis Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-01-02 17:28 UTC (permalink / raw)
  To: Kyrill Tkachov, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Kyrill,

On 20/12/2019 15:30, Kyrill Tkachov wrote:
> Hi Dennis,
> 
> On 12/12/19 5:30 PM, Dennis Zhang wrote:
>> Hi all,
>>
>> On 22/11/2019 14:33, Dennis Zhang wrote:
>> > Hi all,
>> >
>> > This patch is part of a series adding support for Armv8.6-A features.
>> > It enables options including -march=armv8.6-a, +i8mm and +bf16.
>> > The +i8mm and +bf16 features are optional for Armv8.2-a and onward.
>> > Documents are at https://developer.arm.com/docs/ddi0596/latest
>> >
>> > Regtested for arm-none-linux-gnueabi-armv8-a.
>> >
>>
>> This is an update that rebases the patch onto the current trunk.
>> It also fixes some issues in line with the recent CLI patch for AArch64.
>> The ChangeLog is updated as follows:
>>
>> gcc/ChangeLog:
>>
>> 2019-12-12  Dennis Zhang  <dennis.zhang@arm.com>
>>
>>         * config/arm/arm-c.c (arm_cpu_builtins): Define
>>         __ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC,
>>         __ARM_FEATURE_BF16_SCALAR_ARITHMETIC, and
>>         __ARM_BF16_FORMAT_ALTERNATIVE when enabled.
>>         * config/arm/arm-cpus.in (armv8_6, i8mm, bf16): New features.
>>         * config/arm/arm-tables.opt: Regenerated.
>>         * config/arm/arm.c (arm_option_reconfigure_globals): Initialize
>>         arm_arch_i8mm and arm_arch_bf16 when enabled.
>>         * config/arm/arm.h (TARGET_I8MM): New macro.
>>         (TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
>>         * config/arm/t-aprofile: Add matching rules for -march=armv8.6-a.
>>         * config/arm/t-arm-elf (all_v8_archs): Add armv8.6-a.
>>         * config/arm/t-multilib: Add matching rules for -march=armv8.6-a.
>>         (v8_6_a_simd_variants): New.
>>         (v8_*_a_simd_variants): Add i8mm and bf16.
>>         * doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-12-12  Dennis Zhang  <dennis.zhang@arm.com>
>>
>>         * gcc.target/arm/multilib.exp: Add combination tests for 
>> armv8.6-a.
>>
>> Is it OK for trunk?
> 
> 
> This is ok for trunk.
> 
> Please follow the steps at https://gcc.gnu.org/svnwrite.html to get 
> write permission to the repo (listing me as approver).
> 
> You can then commit it yourself :)

Thanks for the sponsorship. I now have write permission.

The patch is committed as r279839.
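With the change in trunk, source code can test for the new extensions through the macros that arm_cpu_builtins defines, as listed in the ChangeLog. A minimal sketch, assuming only those macro names; the FEAT_* bit values and the helpers `armv86_feature_mask` and `print_armv86_features` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical bit values, chosen for this sketch only.  */
#define FEAT_I8MM        0x1u  /* __ARM_FEATURE_MATMUL_INT8 */
#define FEAT_BF16_SCALAR 0x2u  /* __ARM_FEATURE_BF16_SCALAR_ARITHMETIC */
#define FEAT_BF16_VECTOR 0x4u  /* __ARM_FEATURE_BF16_VECTOR_ARITHMETIC */
#define FEAT_BF16_ALT    0x8u  /* __ARM_BF16_FORMAT_ALTERNATIVE */

/* Collect the extension macros predefined for this translation unit
   into a bitmask.  */
unsigned
armv86_feature_mask (void)
{
  unsigned mask = 0;
#ifdef __ARM_FEATURE_MATMUL_INT8
  mask |= FEAT_I8MM;
#endif
#ifdef __ARM_FEATURE_BF16_SCALAR_ARITHMETIC
  mask |= FEAT_BF16_SCALAR;
#endif
#ifdef __ARM_FEATURE_BF16_VECTOR_ARITHMETIC
  mask |= FEAT_BF16_VECTOR;
#endif
#ifdef __ARM_BF16_FORMAT_ALTERNATIVE
  mask |= FEAT_BF16_ALT;
#endif
  return mask;
}

/* Report whether each extension macro was predefined.  */
void
print_armv86_features (void)
{
  unsigned mask = armv86_feature_mask ();
  printf ("i8mm:        %s\n", (mask & FEAT_I8MM) ? "yes" : "no");
  printf ("bf16 scalar: %s\n", (mask & FEAT_BF16_SCALAR) ? "yes" : "no");
  printf ("bf16 vector: %s\n", (mask & FEAT_BF16_VECTOR) ? "yes" : "no");
  printf ("bf16 alt:    %s\n", (mask & FEAT_BF16_ALT) ? "yes" : "no");
}
```

On a host or -march string without these extensions the mask is 0. Adding +i8mm would be expected to set FEAT_I8MM, provided the selected float ABI and FPU leave Advanced SIMD enabled.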

Cheers
Dennis




* [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature
  2019-11-22 14:33 [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16 Dennis Zhang
  2019-12-12 17:30 ` Dennis Zhang
@ 2020-03-12 12:05 ` Dennis Zhang
  2020-03-13 19:31   ` [PATCH][Arm][2/4] Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers Dennis Zhang
  2020-03-18  9:04   ` [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature Kyrylo Tkachov
  1 sibling, 2 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-03-12 12:05 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 1246 bytes --]

Hi all,

This patch is part of a series that adds support for the Armv8-M Custom Datapath Extension (CDE).
It defines the command-line options +cdecp0 to +cdecp7, which enable the CDE on the corresponding coprocessors 0 to 7.
It also adds new effective-target checks for the CDE feature.
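The coprocessor selection is recorded as a bitmask: +cdecpN sets bit N of __ARM_FEATURE_CDE_COPROC, so +cdecp0+cdecp1 yields 0x3, which is what pragma_cde.c checks. The patch only defines the macros when TARGET_CDE holds (arm_arch_cde && arm_arch8 && !arm_arch_notm), i.e. for Armv8-M targets. The sketch below mirrors the accumulation in arm_option_reconfigure_globals outside GCC. The `enabled` array stands in for the isa_bit_cdecp0..7 bits, and `cde_coproc_mask` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>

#define CDE_NUM_COPROCS 8

/* Same values as arm_arch_cde_coproc_bits[] in this patch:
   entry N is the bit for coprocessor N.  */
static const int cde_coproc_bits[CDE_NUM_COPROCS] = {
  0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
};

/* Fold the per-coprocessor flags into the value expected for
   __ARM_FEATURE_CDE_COPROC.  *ANY_ENABLED plays the role of
   arm_arch_cde.  */
int
cde_coproc_mask (const bool enabled[CDE_NUM_COPROCS], bool *any_enabled)
{
  int mask = 0;
  *any_enabled = false;
  for (size_t i = 0; i < CDE_NUM_COPROCS; i++)
    if (enabled[i])
      {
        *any_enabled = true;
        mask |= cde_coproc_bits[i];
      }
  return mask;
}
```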

The ISA has been announced at https://developer.arm.com/architectures/instruction-sets/custom-instructions

Regtested and bootstrapped.

Is it OK to commit please?

Cheers
Dennis

gcc/ChangeLog:

2020-03-11  Dennis Zhang  <dennis.zhang@arm.com>

	* config.gcc: Add arm_cde.h.
	* config/arm/arm-c.c (arm_cpu_builtins): Define or undefine
	__ARM_FEATURE_CDE and __ARM_FEATURE_CDE_COPROC.
	* config/arm/arm-cpus.in (cdecp0, cdecp1, ..., cdecp7): New options.
	* config/arm/arm.c (arm_option_reconfigure_globals): Configure
	arm_arch_cde and arm_arch_cde_coproc to store the feature bits.
	* config/arm/arm.h (TARGET_CDE): New macro.
	* config/arm/arm_cde.h: New file.
	* doc/invoke.texi: Document cdecp[0-7] options.

gcc/testsuite/ChangeLog:

2020-03-11  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/pragma_cde.c: New test.
	* lib/target-supports.exp (arm_v8m_main_cde): New check effective.
	(arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.

[-- Attachment #2: arm-m-cde-cli-20200306.patch --]
[-- Type: text/x-patch; name="arm-m-cde-cli-20200306.patch", Size: 12792 bytes --]

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2df4b36d190..43967b7d1ff 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -346,7 +346,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_cde.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 38edaff17a2..77753015b34 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -227,6 +227,12 @@ arm_cpu_builtins (struct cpp_reader* pfile)
       builtin_define_with_int_value ("__ARM_FEATURE_COPROC", coproc_level);
     }
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_CDE", TARGET_CDE);
+  cpp_undef (pfile, "__ARM_FEATURE_CDE_COPROC");
+  if (TARGET_CDE)
+    builtin_define_with_int_value ("__ARM_FEATURE_CDE_COPROC",
+				   arm_arch_cde_coproc);
+
   def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
 		      TARGET_BF16_FP);
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 96f584da325..5a7498e18db 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -207,6 +207,16 @@ define feature i8mm
 # Brain half-precision floating-point extension. Optional from v8.2-A.
 define feature bf16
 
+# Arm Custom Datapath Extension (CDE).
+define feature cdecp0
+define feature cdecp1
+define feature cdecp2
+define feature cdecp3
+define feature cdecp4
+define feature cdecp5
+define feature cdecp6
+define feature cdecp7
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -670,6 +680,14 @@ begin arch armv8-m.main
  option fp.dp add FPv5 FP_DBL
  option nofp remove ALL_FP
  option nodsp remove armv7em
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
 end arch armv8-m.main
 
 begin arch armv8-r
@@ -701,6 +719,14 @@ begin arch armv8.1-m.main
  option nofp remove ALL_FP
  option mve add mve armv7em
  option mve.fp add mve FPv5 fp16 mve_float armv7em
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
 end arch armv8.1-m.main
 
 begin arch iwmmxt
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9cc7bc0e562..9f1e1ec5c88 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1021,6 +1021,13 @@ int arm_arch_i8mm = 0;
 /* Nonzero if chip supports the BFloat16 instructions.  */
 int arm_arch_bf16 = 0;
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+int arm_arch_cde = 0;
+int arm_arch_cde_coproc = 0;
+const int arm_arch_cde_coproc_bits[] = {
+  0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
+};
+
 /* The condition codes of the ARM, and the inverse function.  */
 static const char * const arm_condition_codes[] =
 {
@@ -3740,6 +3747,21 @@ arm_option_reconfigure_globals (void)
       arm_fp16_format = ARM_FP16_FORMAT_IEEE;
     }
 
+  arm_arch_cde = 0;
+  arm_arch_cde_coproc = 0;
+  int cde_bits[] = {isa_bit_cdecp0, isa_bit_cdecp1, isa_bit_cdecp2,
+		    isa_bit_cdecp3, isa_bit_cdecp4, isa_bit_cdecp5,
+		    isa_bit_cdecp6, isa_bit_cdecp7};
+  for (int i = 0, e = ARRAY_SIZE (cde_bits); i < e; i++)
+    {
+      int cde_bit = bitmap_bit_p (arm_active_target.isa, cde_bits[i]);
+      if (cde_bit)
+	{
+	  arm_arch_cde |= cde_bit;
+	  arm_arch_cde_coproc |= arm_arch_cde_coproc_bits[i];
+	}
+    }
+
   /* And finally, set up some quirks.  */
   arm_arch_no_volatile_ce
     = bitmap_bit_p (arm_active_target.isa, isa_bit_quirk_no_volatile_ce);
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index e07cf03538c..218ded1c015 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -337,6 +337,9 @@ emission of floating point pcs attributes.  */
 /* Nonzero if disallow volatile memory access in IT block.  */
 #define TARGET_NO_VOLATILE_CE		(arm_arch_no_volatile_ce)
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+#define TARGET_CDE	(arm_arch_cde && arm_arch8 && !arm_arch_notm)
+
 /* Should constant I be slplit for OP.  */
 #define DONT_EARLY_SPLIT_CONSTANT(i, op) \
 				((optimize >= 2) \
@@ -551,6 +554,11 @@ extern int arm_arch_i8mm;
 /* Nonzero if chip supports the BFloat16 instructions.  */
 extern int arm_arch_bf16;
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+extern int arm_arch_cde;
+extern int arm_arch_cde_coproc;
+extern const int arm_arch_cde_coproc_bits[];
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
diff --git a/gcc/config/arm/arm_cde.h b/gcc/config/arm/arm_cde.h
new file mode 100644
index 00000000000..f975754632f
--- /dev/null
+++ b/gcc/config/arm/arm_cde.h
@@ -0,0 +1,40 @@
+/* Arm Custom Datapath Extension (CDE) intrinsics include file.
+
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_ARM_CDE_H
+#define _GCC_ARM_CDE_H 1
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index af28015234c..1cbac766e7f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -18654,6 +18654,10 @@ The single- and double-precision floating-point instructions.
 
 @item +nofp
 Disable the floating-point extension.
+
+@item +cdecp0, +cdecp1, ... , +cdecp7
+Enable the Custom Datapath Extension (CDE) on selected coprocessors according
+to the numbers given in the options in the range 0 to 7.
 @end table
 
 @item  armv8-m.main
@@ -18672,6 +18676,10 @@ The single- and double-precision floating-point instructions.
 
 @item +nofp
 Disable the floating-point extension.
+
+@item +cdecp0, +cdecp1, ... , +cdecp7
+Enable the Custom Datapath Extension (CDE) on selected coprocessors according
+to the numbers given in the options in the range 0 to 7.
 @end table
 
 @item armv8-r
diff --git a/gcc/testsuite/gcc.target/arm/pragma_cde.c b/gcc/testsuite/gcc.target/arm/pragma_cde.c
new file mode 100644
index 00000000000..97643a08405
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pragma_cde.c
@@ -0,0 +1,98 @@
+/* Test for CDE #pragma target macros.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8m_main_cde_ok } */
+/* { dg-add-options arm_v8m_main_cde } */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main")
+#ifdef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is defined but should not be"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x1
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp1")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x2
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp2")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x4
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp3")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x8
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp4")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x10
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp5")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x20
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp6")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x40
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp7")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x80
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0+cdecp1")
+#if __ARM_FEATURE_CDE_COPROC != 0x3
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ca3895c2269..35e57beb410 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5010,6 +5010,65 @@ proc add_options_for_arm_v8_2a_bf16_neon { flags } {
     return "$flags $et_arm_v8_2a_bf16_neon_flags"
 }
 
+# A series of routines are created to 1) check if a given architecture is
+# effective (check_effective_target_*_ok) and then 2) give the corresponding
+# flags that enable the architecture (add_options_for_*).
+# The series includes:
+#   arm_v8m_main_cde: Armv8-m CDE (Custom Datapath Extension).
+#   arm_v8m_main_cde_fp: Armv8-m CDE with FP registers.
+#   arm_v8_1m_main_cde_mve: Armv8.1-m CDE with MVE.
+# Usage:
+#   /* { dg-require-effective-target arm_v8m_main_cde_ok } */
+#   /* { dg-add-options arm_v8m_main_cde } */
+# The tests are valid for Arm.
+
+foreach { armfunc armflag armdef } {
+	arm_v8m_main_cde
+		"-march=armv8-m.main+cdecp0 -mthumb"
+		"defined (__ARM_FEATURE_CDE)"
+	arm_v8m_main_cde_fp
+		"-march=armv8-m.main+fp+cdecp0 -mthumb"
+		"defined (__ARM_FEATURE_CDE) && defined (__ARM_FP)"
+	arm_v8_1m_main_cde_mve
+		"-march=armv8.1-m.main+mve+cdecp0 -mthumb"
+		"defined (__ARM_FEATURE_CDE) && defined (__ARM_FEATURE_MVE)"
+	} {
+    eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
+	proc check_effective_target_FUNC_ok_nocache { } {
+	    global et_FUNC_flags
+	    set et_FUNC_flags ""
+
+	    if { ![istarget arm*-*-*] } {
+		return 0;
+	    }
+
+	    if { [check_no_compiler_messages_nocache FUNC_ok assembly {
+		#if !(DEF)
+		#error "DEF failed"
+		#endif
+	    } "FLAG"] } {
+		    set et_FUNC_flags "FLAG"
+		    return 1
+	    }
+
+	    return 0;
+	}
+
+	proc check_effective_target_FUNC_ok { } {
+	    return [check_cached_effective_target FUNC_ok \
+		    check_effective_target_FUNC_ok_nocache]
+	}
+
+	proc add_options_for_FUNC { flags } {
+	    if { ! [check_effective_target_FUNC_ok] } {
+		return "$flags"
+	    }
+	    global et_FUNC_flags
+	    return "$flags $et_FUNC_flags"
+	}
+    }]
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 


* [PATCH][Arm][2/4] Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers
  2020-03-12 12:05 ` [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature Dennis Zhang
@ 2020-03-13 19:31   ` Dennis Zhang
  2020-03-20 15:18     ` Dennis Zhang
  2020-03-18  9:04   ` [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature Kyrylo Tkachov
  1 sibling, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2020-03-13 19:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 2358 bytes --]

Hi all,

This patch is part of a series that adds support for the Armv8-M Custom Datapath Extension (CDE).
It enables the ACLE intrinsics for the VCX1<A>, VCX2<A>, and VCX3<A> instructions, which operate on the 32-bit and 64-bit FPU/MVE registers.
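In arm_cde.h, these intrinsics are only defined when __ARM_FEATURE_CDE is defined and either __ARM_FP or __ARM_FEATURE_MVE is available. Portable code can therefore guard calls on those macros and on the relevant bit of __ARM_FEATURE_CDE_COPROC. A minimal sketch, assuming coprocessor 0 and an arbitrary immediate of 42. The effect of a custom instruction is defined by the hardware implementer, and `fallback_op` and `custom_step` are hypothetical names.

```c
#include <stdint.h>

#if defined (__ARM_FEATURE_CDE) \
    && (defined (__ARM_FP) || defined (__ARM_FEATURE_MVE))
#include <arm_cde.h>
#endif

/* Hypothetical software path used when CDE on coprocessor 0 is not
   available.  */
uint32_t
fallback_op (uint32_t acc)
{
  return acc + 1;
}

/* Issue VCX1A on coprocessor 0 when the compiler reports it as enabled,
   otherwise run plain C.  The coprocessor number and the immediate must
   be compile-time constants.  */
uint32_t
custom_step (uint32_t acc)
{
#if defined (__ARM_FEATURE_CDE_COPROC) && (__ARM_FEATURE_CDE_COPROC & 0x1) \
    && (defined (__ARM_FP) || defined (__ARM_FEATURE_MVE))
  return __arm_vcx1a_u32 (0, acc, 42);
#else
  return fallback_op (acc);
#endif
}
```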

This patch depends on the CDE feature patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541921.html
It also depends on the MVE framework patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html
The ISA has been announced at https://developer.arm.com/architectures/instruction-sets/custom-instructions

Regtested and bootstrapped for arm-none-linux-gnueabi-armv8-m.main.

Is it OK to commit, please?

Cheers
Dennis

gcc/ChangeLog:

2020-03-12  Dennis Zhang  <dennis.zhang@arm.com>
	     Matthew Malcomson <matthew.malcomson@arm.com>

	* config/arm/arm-builtins.c (CX_IMM_QUALIFIERS): New macro.
	(CX_UNARY_QUALIFIERS, CX_BINARY_QUALIFIERS): Likewise.
	(CX_TERNARY_QUALIFIERS): Likewise.
	(ARM_BUILTIN_CDE_PATTERN_START): Likewise.
	(ARM_BUILTIN_CDE_PATTERN_END): Likewise.
	(arm_init_acle_builtins): Initialize CDE builtins.
	(arm_expand_acle_builtin): Check CDE constant operands.
	* config/arm/arm.h (ARM_CDE_CONST_COPROC): New macro to set the range
	of CDE constant operand.
	(ARM_VCDE_CONST_1, ARM_VCDE_CONST_2, ARM_VCDE_CONST_3): Likewise.
	* config/arm/arm_cde.h (__arm_vcx1_u32): New macro of ACLE interface.
	(__arm_vcx1a_u32, __arm_vcx2_u32, __arm_vcx2a_u32): Likewise.
	(__arm_vcx3_u32, __arm_vcx3a_u32, __arm_vcx1d_u64): Likewise.
	(__arm_vcx1da_u64, __arm_vcx2d_u64, __arm_vcx2da_u64): Likewise.
	(__arm_vcx3d_u64, __arm_vcx3da_u64): Likewise.
	* config/arm/arm_cde_builtins.def: New file.
	* config/arm/iterators.md (V_reg): New attribute of SI.
	* config/arm/predicates.md (const_int_coproc_operand): New.
	(const_int_vcde1_operand, const_int_vcde2_operand): New.
	(const_int_vcde3_operand): New.
	* config/arm/unspecs.md (UNSPEC_VCDE, UNSPEC_VCDEA): New.
	* config/arm/vfp.md (arm_vcx1<mode>): New entry.
	(arm_vcx1a<mode>, arm_vcx2<mode>, arm_vcx2a<mode>): Likewise.
	(arm_vcx3<mode>, arm_vcx3a<mode>): Likewise.

gcc/testsuite/ChangeLog:

2020-03-12  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/acle/cde_v_1.c: New test.
	* gcc.target/arm/acle/cde_v_1_err.c: New test.
	* gcc.target/arm/acle/cde_v_1_mve.c: New test.

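arm-builtins.c includes arm_cde_builtins.def twice with different VAR1 definitions. The first expansion builds cde_builtin_data[], which carries each builtin's imm_max. The second appends enumerators after ARM_BUILTIN_CDE_BASE. The standalone sketch below reproduces that X-macro pattern and the range checks behind the new diagnostics. Three assumptions apply. The builtin list is inlined as a macro rather than a .def file. The names are shortened to vcx1/vcx2/vcx3. Pairing ARM_VCDE_CONST_1/2/3 with the one-, two- and three-operand forms is inferred from the constant names, so check arm_cde_builtins.def for the real mapping. `check_coproc` and `check_imm` are hypothetical helpers.

```c
#include <stdio.h>

/* Limits as defined in arm.h by this patch.  */
#define ARM_CDE_CONST_COPROC 7
#define ARM_VCDE_CONST_1 ((1 << 11) - 1)
#define ARM_VCDE_CONST_2 ((1 << 6) - 1)
#define ARM_VCDE_CONST_3 ((1 << 3) - 1)

/* Stand-in for arm_cde_builtins.def (assumed name/limit pairing).  */
#define CDE_BUILTINS(VAR1)        \
  VAR1 (vcx1, ARM_VCDE_CONST_1)   \
  VAR1 (vcx2, ARM_VCDE_CONST_2)   \
  VAR1 (vcx3, ARM_VCDE_CONST_3)

/* First expansion: enumerators numbered after a base marker.  */
#define AS_ENUM(N, IMM_MAX) CDE_BUILTIN_##N,
enum cde_builtin { CDE_BUILTIN_BASE, CDE_BUILTINS (AS_ENUM) CDE_BUILTIN_MAX };
#undef AS_ENUM

/* Second expansion: data table carrying each builtin's imm_max.  */
struct cde_datum { const char *name; unsigned imm_max; };
#define AS_DATUM(N, IMM_MAX) { #N, IMM_MAX },
static const struct cde_datum cde_data[] = { CDE_BUILTINS (AS_DATUM) };
#undef AS_DATUM

#define CDE_PATTERN_START (CDE_BUILTIN_BASE + 1)

/* Coprocessor operand: in range but not enabled, or out of range.  */
int
check_coproc (int cp, unsigned enabled_mask)
{
  if (cp < 0 || cp > ARM_CDE_CONST_COPROC)
    {
      fprintf (stderr, "coproc must be a constant immediate in range "
               "[0-%d] enabled with +cdecp<N>\n", ARM_CDE_CONST_COPROC);
      return -1;
    }
  if (!(enabled_mask & (1u << cp)))
    {
      fprintf (stderr, "coprocessor %d is not enabled with +cdecp%d\n",
               cp, cp);
      return -1;
    }
  return 0;
}

/* Trailing immediate of builtin FCODE, checked against its imm_max.  */
int
check_imm (enum cde_builtin fcode, long imm)
{
  if (fcode < CDE_PATTERN_START || fcode >= CDE_BUILTIN_MAX)
    return -1;
  const struct cde_datum *d = &cde_data[fcode - CDE_PATTERN_START];
  if (imm < 0 || (unsigned long) imm > d->imm_max)
    {
      fprintf (stderr, "%s: argument must be a constant immediate "
               "in range [0-%u]\n", d->name, d->imm_max);
      return -1;
    }
  return 0;
}
```

Keeping one list and expanding it twice guarantees that the enum and the data table stay in the same order. The patch relies on this when it indexes cde_builtin_data with fcode - ARM_BUILTIN_CDE_PATTERN_START.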
[-- Attachment #2: arm-m-cde-vcxsidi-20200312.patch --]
[-- Type: text/x-patch; name="arm-m-cde-vcxsidi-20200312.patch", Size: 29130 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 4d31405cf6e09e3a61faa3e8142940bbdb23c60a..89142a276b071b069cddabb5170ad0d4ca213d20 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -305,6 +305,35 @@ arm_mrrc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define MRRC_QUALIFIERS \
   (arm_mrrc_qualifiers)
 
+/* T (immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate, qualifier_unsigned_immediate };
+#define CX_IMM_QUALIFIERS (arm_cx_imm_qualifiers)
+
+/* T (immediate, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_unary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate, qualifier_none,
+      qualifier_unsigned_immediate };
+#define CX_UNARY_QUALIFIERS (arm_cx_unary_qualifiers)
+
+/* T (immediate, T, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_binary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate,
+      qualifier_none, qualifier_none,
+      qualifier_unsigned_immediate };
+#define CX_BINARY_QUALIFIERS (arm_cx_binary_qualifiers)
+
+/* T (immediate, T, T, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_ternary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate,
+      qualifier_none, qualifier_none, qualifier_none,
+      qualifier_unsigned_immediate };
+#define CX_TERNARY_QUALIFIERS (arm_cx_ternary_qualifiers)
+
 /* The first argument (return type) of a store should be void type,
    which we represent with qualifier_void.  Their first operand will be
    a DImode pointer to the location to store to, so we must use
@@ -438,7 +467,23 @@ static arm_builtin_datum acle_builtin_data[] =
 };
 
 #undef VAR1
+/* IMM_MAX sets the maximum valid value of the CDE immediate operand.
+   ECF_FLAG sets the flag used for set_call_expr_flags.  */
+#define VAR1(T, N, A, IMM_MAX, ECF_FLAG) \
+  {{#N #A, UP (A), CODE_FOR_arm_##N##A, 0, T##_QUALIFIERS}, IMM_MAX, ECF_FLAG},
+
+typedef struct {
+  arm_builtin_datum base;
+  unsigned int imm_max;
+  int ecf_flag;
+} arm_builtin_cde_datum;
+
+static arm_builtin_cde_datum cde_builtin_data[] =
+{
+#include "arm_cde_builtins.def"
+};
 
+#undef VAR1
 #define VAR1(T, N, X) \
   ARM_BUILTIN_NEON_##N##X,
 
@@ -732,6 +777,14 @@ enum arm_builtins
 
 #include "arm_acle_builtins.def"
 
+#undef VAR1
+#define VAR1(T, N, X, ... ) \
+  ARM_BUILTIN_##N##X,
+
+  ARM_BUILTIN_CDE_BASE,
+
+#include "arm_cde_builtins.def"
+
   ARM_BUILTIN_MAX
 };
 
@@ -744,6 +797,12 @@ enum arm_builtins
 #define ARM_BUILTIN_ACLE_PATTERN_START \
   (ARM_BUILTIN_ACLE_BASE + 1)
 
+#define ARM_BUILTIN_CDE_PATTERN_START \
+  (ARM_BUILTIN_CDE_BASE + 1)
+
+#define ARM_BUILTIN_CDE_PATTERN_END \
+  (ARM_BUILTIN_CDE_BASE + ARRAY_SIZE (cde_builtin_data))
+
 #undef CF
 #undef VAR1
 #undef VAR2
@@ -1263,6 +1322,15 @@ arm_init_acle_builtins (void)
       arm_builtin_datum *d = &acle_builtin_data[i];
       arm_init_builtin (fcode, d, "__builtin_arm");
     }
+
+  fcode = ARM_BUILTIN_CDE_PATTERN_START;
+  for (i = 0; i < ARRAY_SIZE (cde_builtin_data); i++, fcode++)
+    {
+      arm_builtin_cde_datum *cde = &cde_builtin_data[i];
+      arm_builtin_datum *d = &cde->base;
+      arm_init_builtin (fcode, d, "__builtin_arm");
+      set_call_expr_flags (arm_builtin_decls[fcode], cde->ecf_flag);
+    }
 }
 
 /* Set up all the NEON builtins, even builtins for instructions that are not
@@ -2373,8 +2441,29 @@ constant_arg:
 	      if (!(*insn_data[icode].operand[opno].predicate)
 		  (op[argc], mode[argc]))
 		{
-		  error ("%Kargument %d must be a constant immediate",
-			 exp, argc + 1);
+		  if (IN_RANGE (fcode, ARM_BUILTIN_CDE_PATTERN_START,
+				ARM_BUILTIN_CDE_PATTERN_END))
+		    {
+		      if (argc == 0)
+			{
+			  unsigned int cp_bit = UINTVAL (op[argc]);
+			  if (IN_RANGE (cp_bit, 0, ARM_CDE_CONST_COPROC))
+			    error ("%Kcoprocessor %d is not enabled "
+				   "with +cdecp%d", exp, cp_bit, cp_bit);
+			  else
+			    error ("%Kcoproc must be a constant immediate in "
+				   "range [0-%d] enabled with +cdecp<N>", exp,
+				   ARM_CDE_CONST_COPROC);
+			}
+		      else
+			error ("%Kargument %d must be a constant immediate "
+			       "in range [0-%d]", exp, argc + 1,
+			       cde_builtin_data[fcode -
+			       ARM_BUILTIN_CDE_PATTERN_START].imm_max);
+		    }
+		  else
+		    error ("%Kargument %d must be a constant immediate",
+			   exp, argc + 1);
 		  /* We have failed to expand the pattern, and are safely
 		     in to invalid code.  But the mid-end will still try to
 		     build an assignment for this node while it expands,
@@ -2595,8 +2684,12 @@ arm_expand_acle_builtin (int fcode, tree exp, rtx target)
       /* Don't generate any RTL.  */
       return const0_rtx;
     }
+
+  gcc_assert (fcode != ARM_BUILTIN_CDE_BASE);
   arm_builtin_datum *d
-    = &acle_builtin_data[fcode - ARM_BUILTIN_ACLE_PATTERN_START];
+    = (fcode < ARM_BUILTIN_CDE_BASE)
+      ? &acle_builtin_data[fcode - ARM_BUILTIN_ACLE_PATTERN_START]
+      : &cde_builtin_data[fcode - ARM_BUILTIN_CDE_PATTERN_START].base;
 
   return arm_expand_builtin_1 (fcode, exp, target, d);
 }
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 218ded1c015389214a647e346dba09c1f30ed407..27baa006c1867f66c38009137968c4e365034cbe 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -558,6 +558,10 @@ extern int arm_arch_bf16;
 extern int arm_arch_cde;
 extern int arm_arch_cde_coproc;
 extern const int arm_arch_cde_coproc_bits[];
+#define ARM_CDE_CONST_COPROC	7
+#define ARM_VCDE_CONST_1	((1 << 11) - 1)
+#define ARM_VCDE_CONST_2	((1 << 6 ) - 1)
+#define ARM_VCDE_CONST_3	((1 << 3 ) - 1)
 
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
diff --git a/gcc/config/arm/arm_cde.h b/gcc/config/arm/arm_cde.h
index f975754632f6e87da331a19a63300c4de3c1f033..4c9f7ebeed4e2abf532f53040f5891da8b1aadac 100644
--- a/gcc/config/arm/arm_cde.h
+++ b/gcc/config/arm/arm_cde.h
@@ -33,6 +33,77 @@ extern "C" {
 
 #include <stdint.h>
 
+#if defined (__ARM_FEATURE_CDE)
+
+#if defined (__ARM_FP) || defined (__ARM_FEATURE_MVE)
+
+/* CDE builtins using FPU/MVE registers.  */
+
+/* uint32_t
+   __arm_vcx1_u32(int coproc, uint32_t imm);  */
+#define __arm_vcx1_u32(coproc, imm) \
+	__builtin_arm_vcx1si(coproc, imm)
+
+/* uint32_t
+   __arm_vcx1a_u32(int coproc, uint32_t acc, uint32_t imm);  */
+#define __arm_vcx1a_u32(coproc, acc, imm) \
+	__builtin_arm_vcx1asi(coproc, acc, imm)
+
+/* uint32_t
+   __arm_vcx2_u32(int coproc, uint32_t n, uint32_t imm);  */
+#define __arm_vcx2_u32(coproc, n, imm) \
+	__builtin_arm_vcx2si(coproc, n, imm)
+
+/* uint32_t
+   __arm_vcx2a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t imm);  */
+#define __arm_vcx2a_u32(coproc, acc, n, imm) \
+	__builtin_arm_vcx2asi(coproc, acc, n, imm)
+
+/* uint32_t
+   __arm_vcx3_u32(int coproc, uint32_t n, uint32_t m, uint32_t imm);  */
+#define __arm_vcx3_u32(coproc, n, m, imm) \
+	__builtin_arm_vcx3si(coproc, n, m, imm)
+
+/* uint32_t
+   __arm_vcx3a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t m,
+		   uint32_t imm);  */
+#define __arm_vcx3a_u32(coproc, acc, n, m, imm) \
+	__builtin_arm_vcx3asi(coproc, acc, n, m, imm)
+
+/* uint64_t
+   __arm_vcx1d_u64(int coproc, uint32_t imm);  */
+#define __arm_vcx1d_u64(coproc, imm) \
+	__builtin_arm_vcx1di(coproc, imm)
+
+/* uint64_t
+   __arm_vcx1da_u64(int coproc, uint64_t acc, uint32_t imm);  */
+#define __arm_vcx1da_u64(coproc, acc, imm) \
+	__builtin_arm_vcx1adi(coproc, acc, imm)
+
+/* uint64_t
+   __arm_vcx2d_u64(int coproc, uint64_t m, uint32_t imm);  */
+#define __arm_vcx2d_u64(coproc, m, imm) \
+	__builtin_arm_vcx2di(coproc, m, imm)
+
+/* uint64_t
+   __arm_vcx2da_u64(int coproc, uint64_t acc, uint64_t m, uint32_t imm);  */
+#define __arm_vcx2da_u64(coproc, acc, m, imm) \
+	__builtin_arm_vcx2adi(coproc, acc, m, imm)
+
+/* uint64_t
+   __arm_vcx3d_u64(int coproc, uint64_t n, uint64_t m, uint32_t imm);  */
+#define __arm_vcx3d_u64(coproc, n, m, imm) \
+	__builtin_arm_vcx3di(coproc, n, m, imm)
+
+/* uint64_t
+   __arm_vcx3da_u64(int coproc, uint64_t acc, uint64_t n, uint64_t m,
+		    uint32_t imm);  */
+#define __arm_vcx3da_u64(coproc, acc, n, m, imm) \
+	__builtin_arm_vcx3adi(coproc, acc, n, m, imm)
+
+#endif /* __ARM_FP || __ARM_FEATURE_MVE.  */
+#endif /* __ARM_FEATURE_CDE.  */
+
 #ifdef __cplusplus
 }
 #endif
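The intrinsics above are only defined when `__ARM_FEATURE_CDE` is set and FP or MVE registers are available. Portable code would therefore typically guard each use. The sketch below assumes that coprocessor 0 implements some custom operation reached with `vcx2` and immediate 5. `custom_op`, the immediate, and the software fallback are all hypothetical, because what a CDE instruction computes is defined by the hardware vendor. The guard tests bit 0 of `__ARM_FEATURE_CDE_COPROC`, which this series defines as the mask of enabled coprocessors.

```c
#include <stdint.h>

/* Use CDE only when coprocessor 0 is enabled (bit 0 of the mask) and the
   FPU/MVE register forms of the intrinsics are available.  */
#if defined (__ARM_FEATURE_CDE) && (__ARM_FEATURE_CDE_COPROC & 0x1) \
    && (defined (__ARM_FP) || defined (__ARM_FEATURE_MVE))
#include <arm_cde.h>
#define HAVE_CDE_CP0 1
#else
#define HAVE_CDE_CP0 0
#endif

/* Hypothetical operation: the result of vcx2 on coprocessor 0 with
   immediate 5 depends entirely on the custom hardware.  */
static uint32_t
custom_op (uint32_t x)
{
#if HAVE_CDE_CP0
  return __arm_vcx2_u32 (0, x, 5);
#else
  /* Placeholder fallback; a real one must reproduce the hardware result.  */
  return x ^ 5u;
#endif
}
```

On a host without CDE only the fallback path is compiled.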
diff --git a/gcc/config/arm/arm_cde_builtins.def b/gcc/config/arm/arm_cde_builtins.def
new file mode 100644
index 0000000000000000000000000000000000000000..a9fea937b9650f21a26d8183572b550e39b0fe7d
--- /dev/null
+++ b/gcc/config/arm/arm_cde_builtins.def
@@ -0,0 +1,33 @@
+/* Arm Custom Datapath Extension (CDE) builtin definitions.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#undef CDE_VAR2
+#define CDE_VAR2(T, N, A, B, IMM_MAX, ECF_FLAG) \
+  VAR1 (T, N, A, IMM_MAX, ECF_FLAG) \
+  VAR1 (T, N, B, IMM_MAX, ECF_FLAG)
+
+CDE_VAR2 (CX_IMM, vcx1, si, di, ARM_VCDE_CONST_1, ECF_CONST)
+CDE_VAR2 (CX_UNARY, vcx1a, si, di, ARM_VCDE_CONST_1, ECF_CONST)
+CDE_VAR2 (CX_UNARY, vcx2, si, di, ARM_VCDE_CONST_2, ECF_CONST)
+CDE_VAR2 (CX_BINARY, vcx2a, si, di, ARM_VCDE_CONST_2, ECF_CONST)
+CDE_VAR2 (CX_BINARY, vcx3, si, di, ARM_VCDE_CONST_3, ECF_CONST)
+CDE_VAR2 (CX_TERNARY, vcx3a, si, di, ARM_VCDE_CONST_3, ECF_CONST)
+
+#undef CDE_VAR2
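Each `CDE_VAR2` line above expands to two `VAR1` entries, one per mode. The file therefore describes twelve builtins (vcx1si, vcx1di, ..., vcx3adi), and their names match the `__builtin_arm_*` calls in arm_cde.h. The X-macro pattern can be tried outside GCC by supplying a stand-in `VAR1`. The row type and table below are hypothetical and differ from GCC's real builtin tables.

```c
#include <assert.h>
#include <string.h>

#define ARM_VCDE_CONST_1 ((1 << 11) - 1)
#define ARM_VCDE_CONST_2 ((1 << 6) - 1)
#define ARM_VCDE_CONST_3 ((1 << 3) - 1)

/* Hypothetical row: qualifier kind, builtin name, immediate bound.  */
struct cde_builtin
{
  const char *kind;
  const char *name;
  int imm_max;
};

/* Stand-in VAR1: stringize the kind and paste name and mode into one
   string literal ("vcx1" "si" -> "vcx1si").  ECF_FLAG is ignored.  */
#define VAR1(T, N, A, IMM_MAX, ECF_FLAG) { #T, #N #A, IMM_MAX },

/* Same definition as in arm_cde_builtins.def.  */
#define CDE_VAR2(T, N, A, B, IMM_MAX, ECF_FLAG) \
  VAR1 (T, N, A, IMM_MAX, ECF_FLAG) \
  VAR1 (T, N, B, IMM_MAX, ECF_FLAG)

static const struct cde_builtin cde_builtins[] = {
  CDE_VAR2 (CX_IMM, vcx1, si, di, ARM_VCDE_CONST_1, ECF_CONST)
  CDE_VAR2 (CX_UNARY, vcx1a, si, di, ARM_VCDE_CONST_1, ECF_CONST)
  CDE_VAR2 (CX_UNARY, vcx2, si, di, ARM_VCDE_CONST_2, ECF_CONST)
  CDE_VAR2 (CX_BINARY, vcx2a, si, di, ARM_VCDE_CONST_2, ECF_CONST)
  CDE_VAR2 (CX_BINARY, vcx3, si, di, ARM_VCDE_CONST_3, ECF_CONST)
  CDE_VAR2 (CX_TERNARY, vcx3a, si, di, ARM_VCDE_CONST_3, ECF_CONST)
};

#define N_CDE_BUILTINS (sizeof cde_builtins / sizeof cde_builtins[0])

/* Return the immediate bound for a name such as "vcx2adi", or -1.  */
static int
cde_imm_max (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < N_CDE_BUILTINS; i++)
    if (strcmp (cde_builtins[i].name, name) == 0)
      return cde_builtins[i].imm_max;
  return -1;
}
```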
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 5f4e3d1235813ab81c176505f9a98d702359f7ec..66ef724945bdf4c53e7363fb7c371ccc6208d387 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -596,7 +596,7 @@
 			 (V2SI "P") (V4SI  "q")
 			 (V2SF "P") (V4SF  "q")
 			 (DI   "P") (V2DI  "q")
-			 (V2HF "") (SF   "")
+			 (V2HF "") (SF   "") (SI "")
 			 (DF    "P") (HF   "")])
 
 ;; Output template to select the high VFP register of a mult-register value.
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 3a3941e22462c435d1bcff74b2db08d6f00ea61c..39ac7181323e0ace4c703710aa21c3aafc355955 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -182,6 +182,23 @@
   (and (match_operand 0 "const_int_operand")
        (match_test "satisfies_constraint_M (op)")))
 
+(define_predicate "const_int_coproc_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_CDE_CONST_COPROC)")
+       (match_test "arm_arch_cde_coproc_bits[UINTVAL (op)] & arm_arch_cde_coproc")))
+
+(define_predicate "const_int_vcde1_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_VCDE_CONST_1)")))
+
+(define_predicate "const_int_vcde2_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_VCDE_CONST_2)")))
+
+(define_predicate "const_int_vcde3_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_VCDE_CONST_3)")))
+
 ;; This doesn't have to do much because the constant is already checked
 ;; in the shift_operator predicate.
 (define_predicate "shift_amount_operand"
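`const_int_coproc_operand` accepts a coprocessor number only if two conditions hold: the number lies in [0, ARM_CDE_CONST_COPROC], and its bit is set in `arm_arch_cde_coproc`, meaning the matching +cdecpN option is active. The sketch below renders the same test in C. The function name and the explicit mask parameter are hypothetical, since GCC reads the global instead. Evaluating the range check first, in the order the predicate lists it, keeps the table lookup in bounds.

```c
#include <assert.h>
#include <stdbool.h>

#define ARM_CDE_CONST_COPROC 7

/* Same contents as arm_arch_cde_coproc_bits in arm.c: entry I is 1 << I.  */
static const int arm_arch_cde_coproc_bits[] = {
  0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
};

/* Hypothetical C equivalent of const_int_coproc_operand.  OP is the
   constant operand; ENABLED_MASK plays the role of arm_arch_cde_coproc.  */
static bool
coproc_operand_ok (unsigned long op, int enabled_mask)
{
  assert (enabled_mask >= 0 && enabled_mask <= 0xff);
  return op <= ARM_CDE_CONST_COPROC
	 && (arm_arch_cde_coproc_bits[op] & enabled_mask) != 0;
}
```

For example, with only +cdecp1 (mask 0x2), coprocessor 0 is rejected. This is the situation that test_coproc_match_1 in cde_v_1_err.c exercises.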
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b36ae512a6ebcf231b46a24e127c62e22e71a34f..7ee24f63fcc1309ae43c8569eb9c072c3c9e6876 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -154,6 +154,8 @@
   UNSPEC_SMUADX		; Represent the SMUADX operation.
   UNSPEC_SSAT16		; Represent the SSAT16 operation.
   UNSPEC_USAT16		; Represent the USAT16 operation.
+  UNSPEC_VCDE		; Custom Datapath Extension instruction.
+  UNSPEC_VCDEA		; Custom Datapath Extension instruction.
 ])
 
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 99d6be4a94210d05a877a0cf38c02a73cc8cb1d6..fd4db6fadbcb5bce3d2f3ec091e214be43a70312 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -2133,3 +2133,74 @@
   DONE;
 }
 )
+
+;; CDE instructions using FPU/MVE S/D registers
+
+(define_insn "arm_vcx1<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SI 2 "const_int_vcde1_operand" "i")]
+	 UNSPEC_VCDE))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx1\\tp%c1, %<V_reg>0, #%c2"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx1a<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "0")
+		      (match_operand:SI 3 "const_int_vcde1_operand" "i")]
+	 UNSPEC_VCDEA))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx1a\\tp%c1, %<V_reg>0, #%c3"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx2<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "t")
+		      (match_operand:SI 3 "const_int_vcde2_operand" "i")]
+	 UNSPEC_VCDE))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx2\\tp%c1, %<V_reg>0, %<V_reg>2, #%c3"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx2a<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "0")
+		      (match_operand:SIDI 3 "register_operand" "t")
+		      (match_operand:SI 4 "const_int_vcde2_operand" "i")]
+	 UNSPEC_VCDEA))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx2a\\tp%c1, %<V_reg>0, %<V_reg>3, #%c4"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx3<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "t")
+		      (match_operand:SIDI 3 "register_operand" "t")
+		      (match_operand:SI 4 "const_int_vcde3_operand" "i")]
+	 UNSPEC_VCDE))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx3\\tp%c1, %<V_reg>0, %<V_reg>2, %<V_reg>3, #%c4"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx3a<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "0")
+		      (match_operand:SIDI 3 "register_operand" "t")
+		      (match_operand:SIDI 4 "register_operand" "t")
+		      (match_operand:SI 5 "const_int_vcde3_operand" "i")]
+	 UNSPEC_VCDEA))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx3a\\tp%c1, %<V_reg>0, %<V_reg>3, %<V_reg>4, #%c5"
+  [(set_attr "type" "coproc")]
+)
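In these patterns, SImode operands live in S registers and DImode operands in D registers. The `V_reg` attribute is empty for SI (the iterators.md hunk adds that entry) and "P" for DI, so `%<V_reg>0` prints an `s` or a `d` register. The accumulating forms tie their accumulator input to operand 0 through the "0" constraint, so it is not printed separately. The sketch below is a hypothetical C model of the vcx2/vcx2a output templates. It shows the text the scan-assembler patterns look for. The register numbers are chosen by the caller rather than by a register allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical renderer for "vcx2\tp%c1, %<V_reg>0, %<V_reg>2, #%c3" and
   "vcx2a\tp%c1, %<V_reg>0, %<V_reg>3, #%c4".  DIMODE selects D registers
   (V_reg = "P"); otherwise S registers are printed (V_reg = "").  */
static int
render_vcx2 (char *buf, size_t len, bool accumulate, bool dimode,
	     int coproc, int rd, int rm, int imm)
{
  assert (coproc >= 0 && coproc <= 7);
  assert (imm >= 0 && imm <= 63);	/* const_int_vcde2_operand.  */
  char reg = dimode ? 'd' : 's';
  return snprintf (buf, len, "vcx2%s\tp%d, %c%d, %c%d, #%d",
		   accumulate ? "a" : "", coproc, reg, rd, reg, rm, imm);
}
```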
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde_v_1.c b/gcc/testsuite/gcc.target/arm/acle/cde_v_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..3104db4ae608365667f4b617c5a4d58c90f5f5aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cde_v_1.c
@@ -0,0 +1,94 @@
+/* Test the CDE ACLE intrinsic.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8m_main_cde_fp_ok } */
+/* { dg-options "-save-temps -O2" } */
+/* { dg-add-options arm_v8m_main_cde_fp } */
+
+#include "arm_cde.h"
+
+#define TEST0(T, N, C, I) \
+T test_arm_##N##_##C##_##I () { \
+  return __arm_##N (C, I); \
+}
+
+#define TEST1(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, I); \
+}
+
+#define TEST2(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, I); \
+}
+
+#define TEST3(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, a, I); \
+}
+
+#define TEST_ALL(C) \
+TEST0 (uint32_t, vcx1_u32,	C, 0) \
+TEST1 (uint32_t, vcx1a_u32,	C, 0) \
+TEST1 (uint32_t, vcx2_u32,	C, 0) \
+TEST2 (uint32_t, vcx2a_u32,	C, 0) \
+TEST2 (uint32_t, vcx3_u32,	C, 0) \
+TEST3 (uint32_t, vcx3a_u32,	C, 0) \
+TEST0 (uint64_t, vcx1d_u64,	C, 0) \
+TEST1 (uint64_t, vcx1da_u64,	C, 0) \
+TEST1 (uint64_t, vcx2d_u64,	C, 0) \
+TEST2 (uint64_t, vcx2da_u64,	C, 0) \
+TEST2 (uint64_t, vcx3d_u64,	C, 0) \
+TEST3 (uint64_t, vcx3da_u64,	C, 0) \
+TEST0 (uint32_t, vcx1_u32,	C, 2047) \
+TEST1 (uint32_t, vcx1a_u32,	C, 2047) \
+TEST1 (uint32_t, vcx2_u32,	C, 63) \
+TEST2 (uint32_t, vcx2a_u32,	C, 63) \
+TEST2 (uint32_t, vcx3_u32,	C, 7) \
+TEST3 (uint32_t, vcx3a_u32,	C, 7) \
+TEST0 (uint64_t, vcx1d_u64,	C, 2047) \
+TEST1 (uint64_t, vcx1da_u64,	C, 2047) \
+TEST1 (uint64_t, vcx2d_u64,	C, 63) \
+TEST2 (uint64_t, vcx2da_u64,	C, 63) \
+TEST2 (uint64_t, vcx3d_u64,	C, 7) \
+TEST3 (uint64_t, vcx3da_u64,	C, 7)
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0+fp")
+TEST_ALL (0)
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp1+fp")
+TEST_ALL (1)
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp2+cdecp3+cdecp4+cdecp5+cdecp6+cdecp7+fp")
+TEST_ALL (2)
+TEST_ALL (3)
+TEST_ALL (4)
+TEST_ALL (5)
+TEST_ALL (6)
+TEST_ALL (7)
+#pragma GCC pop_options
+
+/* { dg-final { scan-assembler-times {\tvcx1\tp0, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp1, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp2, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp3, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp4, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp5, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp6, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp7, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp[0-7], s[0-9]+, #2047} 8 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp[0-7], s[0-9]+, #(?:0|2047)} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp[0-7], s[0-9]+, s[0-9]+, #(?:0|63)} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp[0-7], s[0-9]+, s[0-9]+, #(?:0|63)} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp[0-7], s[0-9]+, s[0-9]+, s[0-9]+, #(?:0|7)} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp[0-7], s[0-9]+, s[0-9]+, s[0-9]+, #(?:0|7)} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp[0-7], d[0-9]+, #(?:0|2047)} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp[0-7], d[0-9]+, #(?:0|2047)} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp[0-7], d[0-9]+, d[0-9]+, #(?:0|63)} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp[0-7], d[0-9]+, d[0-9]+, #(?:0|63)} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp[0-7], d[0-9]+, d[0-9]+, d[0-9]+, #(?:0|7)} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp[0-7], d[0-9]+, d[0-9]+, d[0-9]+, #(?:0|7)} 16 } } */
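In a scan pattern, a bracket expression such as `[0,2047]` matches a single character from the set {0, ',', 2, 4, 7}, not "0 or 2047". It only appears to work because the match is unanchored, so `#2` is found inside `#2047`. An alternation such as `(0|2047)` expresses the intended choice. scan-assembler uses Tcl regular expressions, so the sketch below uses POSIX `regcomp`/`regexec` as a stand-in. The two dialects behave the same for these constructs. `matches` is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX extended regex PATTERN matches anywhere in
   TEXT (unanchored, like scan-assembler).  */
static bool
matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

With the bracket form, an unintended immediate such as `#42` would also be counted.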
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde_v_1_err.c b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_err.c
new file mode 100644
index 0000000000000000000000000000000000000000..023fab4ef9bf46dbf630d4698c2a0570bd2e4d14
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_err.c
@@ -0,0 +1,127 @@
+/* Test the CDE ACLE intrinsic.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8m_main_cde_fp_ok } */
+/* { dg-add-options arm_v8m_main_cde_fp } */
+
+#include "arm_cde.h"
+
+uint64_t test_coproc_range (uint32_t a, uint64_t b)
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (8, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1a_u32 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2_u32 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2a_u32 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3_u32 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3a_u32 (8, a, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1d_u64 (8, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1da_u64 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2d_u64 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2da_u64 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3d_u64 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3da_u64 (8, a, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  return res;
+}
+
+uint64_t test_imm_range (uint32_t a, uint64_t b)
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (0, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx1a_u32 (0, a, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx2_u32 (0, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx2a_u32 (0, a, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx3_u32 (0, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3a_u32 (0, a, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1d_u64 (0, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx1da_u64 (0, a, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx2d_u64 (0, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx2da_u64 (0, a, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx3d_u64 (0, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3da_u64 (0, a, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  return res;
+}
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp1+fp")
+uint64_t test_coproc_match_1 (uint32_t a, uint64_t b)
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx1a_u32 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2_u32 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2a_u32 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3_u32 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3a_u32 (0, a, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx1d_u64 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx1da_u64 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2d_u64 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2da_u64 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3d_u64 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3da_u64 (0, a, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  return res;
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp2+fp")
+uint32_t test_coproc_match_2 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp3+fp")
+uint32_t test_coproc_match_3 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp4+fp")
+uint32_t test_coproc_match_4 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp5+fp")
+uint32_t test_coproc_match_5 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp6+fp")
+uint32_t test_coproc_match_6 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp7+fp")
+uint32_t test_coproc_match_7 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0+fp")
+uint32_t test_coproc_match_0 ()
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (1, 0); /* { dg-error {coprocessor 1 is not enabled with \+cdecp1} } */
+  res += __arm_vcx1_u32 (2, 0); /* { dg-error {coprocessor 2 is not enabled with \+cdecp2} } */
+  res += __arm_vcx1_u32 (3, 0); /* { dg-error {coprocessor 3 is not enabled with \+cdecp3} } */
+  res += __arm_vcx1_u32 (4, 0); /* { dg-error {coprocessor 4 is not enabled with \+cdecp4} } */
+  res += __arm_vcx1_u32 (5, 0); /* { dg-error {coprocessor 5 is not enabled with \+cdecp5} } */
+  res += __arm_vcx1_u32 (6, 0); /* { dg-error {coprocessor 6 is not enabled with \+cdecp6} } */
+  res += __arm_vcx1_u32 (7, 0); /* { dg-error {coprocessor 7 is not enabled with \+cdecp7} } */
+  return res;
+}
+#pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde_v_1_mve.c b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_mve.c
new file mode 100644
index 0000000000000000000000000000000000000000..5140c3f521a628c4ccc4ca670876a0b0468efa37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_mve.c
@@ -0,0 +1,56 @@
+/* Test the CDE ACLE intrinsic.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_main_cde_mve_ok } */
+/* { dg-options "-save-temps -O2" } */
+/* { dg-add-options arm_v8_1m_main_cde_mve } */
+
+#include "arm_cde.h"
+
+#define TEST0(T, N, C, I) \
+T test_arm_##N##_##C##_##I () { \
+  return __arm_##N (C, I); \
+}
+
+#define TEST1(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, I); \
+}
+
+#define TEST2(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, I); \
+}
+
+#define TEST3(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, a, I); \
+}
+
+#define TEST_ALL(C) \
+TEST0 (uint32_t, vcx1_u32,	C, 0) \
+TEST1 (uint32_t, vcx1a_u32,	C, 0) \
+TEST1 (uint32_t, vcx2_u32,	C, 0) \
+TEST2 (uint32_t, vcx2a_u32,	C, 0) \
+TEST2 (uint32_t, vcx3_u32,	C, 0) \
+TEST3 (uint32_t, vcx3a_u32,	C, 0) \
+TEST0 (uint64_t, vcx1d_u64,	C, 0) \
+TEST1 (uint64_t, vcx1da_u64,	C, 0) \
+TEST1 (uint64_t, vcx2d_u64,	C, 0) \
+TEST2 (uint64_t, vcx2da_u64,	C, 0) \
+TEST2 (uint64_t, vcx3d_u64,	C, 0) \
+TEST3 (uint64_t, vcx3da_u64,	C, 0)
+
+TEST_ALL (0)
+
+/* { dg-final { scan-assembler-times {\tvcx1\tp0, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp0, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp0, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp0, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp0, s[0-9]+, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp0, s[0-9]+, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp0, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp0, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp0, d[0-9]+, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp0, d[0-9]+, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp0, d[0-9]+, d[0-9]+, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp0, d[0-9]+, d[0-9]+, d[0-9]+, #0} 1 } } */

* RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature
  2020-03-12 12:05 ` [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature Dennis Zhang
  2020-03-13 19:31   ` [PATCH][Arm][2/4] Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers Dennis Zhang
@ 2020-03-18  9:04   ` Kyrylo Tkachov
  2020-03-19 14:02     ` Dennis Zhang
  1 sibling, 1 reply; 41+ messages in thread
From: Kyrylo Tkachov @ 2020-03-18  9:04 UTC (permalink / raw)
  To: Dennis Zhang, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Dennis,

> -----Original Message-----
> From: Dennis Zhang <Dennis.Zhang@arm.com>
> Sent: 12 March 2020 12:06
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>; 
> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo Tkachov 
> <Kyrylo.Tkachov@arm.com>
> Subject: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> (CDE): enable the feature
> 
> Hi all,
> 
> This patch is part of a series that adds support for the Armv8-M
> Custom Datapath Extension (CDE).
> It defines the command-line options +cdecp0 to +cdecp7, which enable
> CDE on the corresponding coprocessors 0 to 7.
> It also adds new effective-target checks for the CDE feature.
> 
> The ISA has been announced at
> https://developer.arm.com/architectures/instruction-sets/custom-instructions
> 
> Regtested and bootstrapped.
> 
> Is it OK to commit please?

Can you please rebase this patch on top of the recent MVE commits?
It currently doesn't apply cleanly to trunk.
Thanks,
Kyrill

> 
> Cheers
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-03-11  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * config.gcc: Add arm_cde.h.
> * config/arm/arm-c.c (arm_cpu_builtins): Define or undefine 
> __ARM_FEATURE_CDE and __ARM_FEATURE_CDE_COPROC.
> * config/arm/arm-cpus.in (cdecp0, cdecp1, ..., cdecp7): New options.
> * config/arm/arm.c (arm_option_reconfigure_globals): Configure 
> arm_arch_cde and arm_arch_cde_coproc to store the feature bits.
> * config/arm/arm.h (TARGET_CDE): New macro.
> * config/arm/arm_cde.h: New file.
> * doc/invoke.texi: Document cdecp[0-7] options.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-03-11  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * gcc.target/arm/pragma_cde.c: New test.
> * lib/target-supports.exp (arm_v8m_main_cde): New effective-target check.
> (arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.

* Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature
  2020-03-18  9:04   ` [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature Kyrylo Tkachov
@ 2020-03-19 14:02     ` Dennis Zhang
  2020-03-19 17:48       ` Kyrylo Tkachov
  0 siblings, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2020-03-19 14:02 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1466 bytes --]

Hi Kyrylo,

>________________________________________
>From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>Sent: Wednesday, March 18, 2020 9:04 AM
>To: Dennis Zhang; gcc-patches@gcc.gnu.org
>Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
>Subject: RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature
>
>Hi Dennis,
>
>> -----Original Message-----
>> From: Dennis Zhang <Dennis.Zhang@arm.com>
>> Sent: 12 March 2020 12:06
>> To: gcc-patches@gcc.gnu.org
>> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
>> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo Tkachov
>> <Kyrylo.Tkachov@arm.com>
>> Subject: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
>> (CDE): enable the feature
>>
>> Hi all,
>>
>> This patch is part of a series that adds support for the Armv8-M
>> Custom Datapath Extension (CDE).
>> It defines the command-line options +cdecp0 to +cdecp7, which enable
>> CDE on the corresponding coprocessors 0 to 7.
>> It also adds new effective-target checks for the CDE feature.
>>
>> The ISA has been announced at
>> https://developer.arm.com/architectures/instruction-sets/custom-instructions
>>
>> Regtested and bootstrapped.
>>
>> Is it OK to commit please?
>
>Can you please rebase this patch on top of the recent MVE commits?
>It currently doesn't apply cleanly to trunk.
>Thanks,
>Kyrill

The rebased patch is attached.
Is it OK to commit?

Thanks
Dennis

[-- Attachment #2: arm-m-cde-cli-20200318.patch --]
[-- Type: application/octet-stream, Size: 12779 bytes --]

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 13e3cb753e2..7624c654c51 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -346,7 +346,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve.h arm_cde.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 73bdb9cfae0..7e92e8a83ae 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -237,6 +237,12 @@ arm_cpu_builtins (struct cpp_reader* pfile)
       builtin_define_with_int_value ("__ARM_FEATURE_COPROC", coproc_level);
     }
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_CDE", TARGET_CDE);
+  cpp_undef (pfile, "__ARM_FEATURE_CDE_COPROC");
+  if (TARGET_CDE)
+    builtin_define_with_int_value ("__ARM_FEATURE_CDE_COPROC",
+				   arm_arch_cde_coproc);
+
   def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
 		      TARGET_BF16_FP);
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 77b43090d69..fba34e556fb 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -211,6 +211,16 @@ define feature i8mm
 # Brain half-precision floating-point extension. Optional from v8.2-A.
 define feature bf16
 
+# Arm Custom Datapath Extension (CDE).
+define feature cdecp0
+define feature cdecp1
+define feature cdecp2
+define feature cdecp3
+define feature cdecp4
+define feature cdecp5
+define feature cdecp6
+define feature cdecp7
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -676,6 +686,14 @@ begin arch armv8-m.main
  option fp.dp add FPv5 FP_DBL
  option nofp remove ALL_FP
  option nodsp remove armv7em
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
 end arch armv8-m.main
 
 begin arch armv8-r
@@ -707,6 +725,14 @@ begin arch armv8.1-m.main
  option nofp remove ALL_FP
  option mve add MVE
  option mve.fp add MVE_FP
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
 end arch armv8.1-m.main
 
 begin arch iwmmxt
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b3dfa285f01..55a4ebf5147 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1021,6 +1021,13 @@ int arm_arch_i8mm = 0;
 /* Nonzero if chip supports the BFloat16 instructions.  */
 int arm_arch_bf16 = 0;
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+int arm_arch_cde = 0;
+int arm_arch_cde_coproc = 0;
+const int arm_arch_cde_coproc_bits[] = {
+  0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
+};
+
 /* The condition codes of the ARM, and the inverse function.  */
 static const char * const arm_condition_codes[] =
 {
@@ -3740,6 +3747,21 @@ arm_option_reconfigure_globals (void)
       arm_fp16_format = ARM_FP16_FORMAT_IEEE;
     }
 
+  arm_arch_cde = 0;
+  arm_arch_cde_coproc = 0;
+  int cde_bits[] = {isa_bit_cdecp0, isa_bit_cdecp1, isa_bit_cdecp2,
+		    isa_bit_cdecp3, isa_bit_cdecp4, isa_bit_cdecp5,
+		    isa_bit_cdecp6, isa_bit_cdecp7};
+  for (int i = 0, e = ARRAY_SIZE (cde_bits); i < e; i++)
+    {
+      int cde_bit = bitmap_bit_p (arm_active_target.isa, cde_bits[i]);
+      if (cde_bit)
+	{
+	  arm_arch_cde |= cde_bit;
+	  arm_arch_cde_coproc |= arm_arch_cde_coproc_bits[i];
+	}
+    }
+
   /* And finally, set up some quirks.  */
   arm_arch_no_volatile_ce
     = bitmap_bit_p (arm_active_target.isa, isa_bit_quirk_no_volatile_ce);
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index fb55f73c62b..343235d0cbc 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -354,6 +354,9 @@ emission of floating point pcs attributes.  */
 /* Nonzero if disallow volatile memory access in IT block.  */
 #define TARGET_NO_VOLATILE_CE		(arm_arch_no_volatile_ce)
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+#define TARGET_CDE	(arm_arch_cde && arm_arch8 && !arm_arch_notm)
+
 /* Should constant I be slplit for OP.  */
 #define DONT_EARLY_SPLIT_CONSTANT(i, op) \
 				((optimize >= 2) \
@@ -568,6 +571,11 @@ extern int arm_arch_i8mm;
 /* Nonzero if chip supports the BFloat16 instructions.  */
 extern int arm_arch_bf16;
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+extern int arm_arch_cde;
+extern int arm_arch_cde_coproc;
+extern const int arm_arch_cde_coproc_bits[];
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
diff --git a/gcc/config/arm/arm_cde.h b/gcc/config/arm/arm_cde.h
new file mode 100644
index 00000000000..f975754632f
--- /dev/null
+++ b/gcc/config/arm/arm_cde.h
@@ -0,0 +1,40 @@
+/* Arm Custom Datapath Extension (CDE) intrinsics include file.
+
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_ARM_CDE_H
+#define _GCC_ARM_CDE_H 1
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 96a95162696..79ca005b858 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -18665,6 +18665,10 @@ The single- and double-precision floating-point instructions.
 
 @item +nofp
 Disable the floating-point extension.
+
+@item +cdecp0, +cdecp1, ... , +cdecp7
+Enable the Custom Datapath Extension (CDE) on selected coprocessors according
+to the numbers given in the options in the range 0 to 7.
 @end table
 
 @item  armv8-m.main
@@ -18683,6 +18687,10 @@ The single- and double-precision floating-point instructions.
 
 @item +nofp
 Disable the floating-point extension.
+
+@item +cdecp0, +cdecp1, ... , +cdecp7
+Enable the Custom Datapath Extension (CDE) on selected coprocessors according
+to the numbers given in the options in the range 0 to 7.
 @end table
 
 @item armv8-r
diff --git a/gcc/testsuite/gcc.target/arm/pragma_cde.c b/gcc/testsuite/gcc.target/arm/pragma_cde.c
new file mode 100644
index 00000000000..97643a08405
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pragma_cde.c
@@ -0,0 +1,98 @@
+/* Test for CDE #prama target macros.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8m_main_cde_ok } */
+/* { dg-add-options arm_v8m_main_cde } */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main")
+#ifdef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is defined but should not be"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x1
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp1")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x2
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp2")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x4
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp3")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x8
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp4")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x10
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp5")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x20
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp6")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x40
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp7")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x80
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0+cdecp1")
+#if __ARM_FEATURE_CDE_COPROC != 0x3
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4413c26fbc9..a32a56ea511 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5053,6 +5053,65 @@ proc add_options_for_arm_v8_2a_bf16_neon { flags } {
     return "$flags $et_arm_v8_2a_bf16_neon_flags"
 }
 
+# A series of routines are created to 1) check if a given architecture is
+# effective (check_effective_target_*_ok) and then 2) give the corresponding
+# flags that enable the architecture (add_options_for_*).
+# The series includes:
+#   arm_v8m_main_cde: Armv8-m CDE (Custom Datapath Extension).
+#   arm_v8m_main_cde_fp: Armv8-m CDE with FP registers.
+#   arm_v8_1m_main_cde_mve: Armv8.1-m CDE with MVE.
+# Usage:
+#   /* { dg-require-effective-target arm_v8m_main_cde_ok } */
+#   /* { dg-add-options arm_v8m_main_cde } */
+# The tests are valid for Arm.
+
+foreach { armfunc armflag armdef } {
+	arm_v8m_main_cde
+		"-march=armv8-m.main+cdecp0 -mthumb"
+		"defined (__ARM_FEATURE_CDE)"
+	arm_v8m_main_cde_fp
+		"-march=armv8-m.main+fp+cdecp0 -mthumb"
+		"defined (__ARM_FEATURE_CDE) && defined (__ARM_FP)"
+	arm_v8_1m_main_cde_mve
+		"-march=armv8.1-m.main+mve+cdecp0 -mthumb"
+		"defined (__ARM_FEATURE_CDE) && defined (__ARM_FEATURE_MVE)"
+	} {
+    eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
+	proc check_effective_target_FUNC_ok_nocache { } {
+	    global et_FUNC_flags
+	    set et_FUNC_flags ""
+
+	    if { ![istarget arm*-*-*] } {
+		return 0;
+	    }
+
+	    if { [check_no_compiler_messages_nocache FUNC_ok assembly {
+		#if !(DEF)
+		#error "DEF failed"
+		#endif
+	    } "FLAG"] } {
+		    set et_FUNC_flags "FLAG"
+		    return 1
+	    }
+
+	    return 0;
+	}
+
+	proc check_effective_target_FUNC_ok { } {
+	    return [check_cached_effective_target FUNC_ok \
+		    check_effective_target_FUNC_ok_nocache]
+	}
+
+	proc add_options_for_FUNC { flags } {
+	    if { ! [check_effective_target_FUNC_ok] } {
+		return "$flags"
+	    }
+	    global et_FUNC_flags
+	    return "$flags $et_FUNC_flags"
+	}
+    }]
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 


* RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature
  2020-03-19 14:02     ` Dennis Zhang
@ 2020-03-19 17:48       ` Kyrylo Tkachov
  2020-04-08 11:33         ` Dennis Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Kyrylo Tkachov @ 2020-03-19 17:48 UTC (permalink / raw)
  To: Dennis Zhang, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Dennis,

> -----Original Message-----
> From: Dennis Zhang <Dennis.Zhang@arm.com>
> Sent: 19 March 2020 14:03
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
> Subject: Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> (CDE): enable the feature
> 
> Hi Kyrylo,
> 
> >________________________________________
> >From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> >Sent: Wednesday, March 18, 2020 9:04 AM
> >To: Dennis Zhang; gcc-patches@gcc.gnu.org
> >Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> >Subject: RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath
> >Extension (CDE): enable the feature
> >
> >Hi Dennis,
> >
> >> -----Original Message-----
> >> From: Dennis Zhang <Dennis.Zhang@arm.com>
> >> Sent: 12 March 2020 12:06
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> >> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo
> Tkachov
> >> <Kyrylo.Tkachov@arm.com>
> >> Subject: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> >> (CDE): enable the feature
> >>
> >> Hi all,
> >>
> >> This patch is part of a series that adds support for the Armv8-M
> >> Custom Datapath Extension (CDE).
> >> It defines the command-line options +cdecp0 to +cdecp7, which enable
> >> the CDE on the corresponding coprocessors 0-7.
> >> It also adds new effective-target checks for the CDE feature.
> >>
> >> ISA has been announced at
> >> https://developer.arm.com/architectures/instruction-sets/custom-
> >> instructions
> >>
> >> Regtested and bootstrapped.
> >>
> >> Is it OK to commit please?
> >
> >Can you please rebase this patch on top of the recent MVE commits?
> >It currently doesn't apply cleanly to trunk.
> >Thanks,
> >Kyrill
> 
> The rebased patch is attached.
> Is it OK to commit?

Ok, with a few fixes...

diff --git a/gcc/testsuite/gcc.target/arm/pragma_cde.c b/gcc/testsuite/gcc.target/arm/pragma_cde.c
new file mode 100644
index 00000000000..97643a08405
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pragma_cde.c
@@ -0,0 +1,98 @@
+/* Test for CDE #prama target macros.  */
+/* { dg-do compile } */

Typo in "pragma" in the comment.


+# A series of routines are created to 1) check if a given architecture is
+# effective (check_effective_target_*_ok) and then 2) give the corresponding
+# flags that enable the architecture (add_options_for_*).
+# The series includes:
+#   arm_v8m_main_cde: Armv8-m CDE (Custom Datapath Extension).
+#   arm_v8m_main_cde_fp: Armv8-m CDE with FP registers.
+#   arm_v8_1m_main_cde_mve: Armv8.1-m CDE with MVE.
+# Usage:
+#   /* { dg-require-effective-target arm_v8m_main_cde_ok } */
+#   /* { dg-add-options arm_v8m_main_cde } */
+# The tests are valid for Arm.
+
+foreach { armfunc armflag armdef } {

New effective target checks need to be documented in doc/sourcebuild.texi

Ok with those changes.
Kyrill

> 
> Thanks
> Dennis



* Re: [PATCH][Arm][2/4]  Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers
  2020-03-13 19:31   ` [PATCH][Arm][2/4] Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers Dennis Zhang
@ 2020-03-20 15:18     ` Dennis Zhang
  2020-04-07 12:31       ` Dennis Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2020-03-20 15:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 2776 bytes --]

Hi all,

The updated patch is attached.
It has been rebased onto the current trunk. Is it OK to commit?

Cheers
Dennis

> Hi all,
>
> This patch is part of a series that adds support for the Armv8-M Custom Datapath Extension (CDE).
> It enables the ACLE intrinsics that emit the VCX1{A}, VCX2{A} and VCX3{A} instructions, which operate on FPU/MVE 32-bit (S) and 64-bit (D) registers.
>
> This patch depends on the CDE feature patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541921.html
> It also depends on the MVE framework patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html
> ISA has been announced at https://developer.arm.com/architectures/instruction-sets/custom-instructions
>
> Regtested and bootstrapped for arm-none-linux-gnueabi-armv8-m.main.
>
> Is it OK for commit please?
>
> Cheers
> Dennis
>
> gcc/ChangeLog:
>
> 2020-03-12  Dennis Zhang  <dennis.zhang@arm.com>
>              Matthew Malcomson <matthew.malcomson@arm.com>
>
>         * config/arm/arm-builtins.c (CX_IMM_QUALIFIERS): New macro.
>         (CX_UNARY_QUALIFIERS, CX_BINARY_QUALIFIERS): Likewise.
>         (CX_TERNARY_QUALIFIERS): Likewise.
>         (ARM_BUILTIN_CDE_PATTERN_START): Likewise.
>         (ARM_BUILTIN_CDE_PATTERN_END): Likewise.
>         (arm_init_acle_builtins): Initialize CDE builtins.
>         (arm_expand_acle_builtin): Check CDE constant operands.
>         * config/arm/arm.h (ARM_CDE_CONST_COPROC): New macro to set the range
>         of CDE constant operand.
>         (ARM_VCDE_CONST_1, ARM_VCDE_CONST_2, ARM_VCDE_CONST_3): Likewise.
>         * config/arm/arm_cde.h (__arm_vcx1_u32): New macro of ACLE interface.
>         (__arm_vcx1a_u32, __arm_vcx2_u32, __arm_vcx2a_u32): Likewise.
>         (__arm_vcx3_u32, __arm_vcx3a_u32, __arm_vcx1d_u64): Likewise.
>         (__arm_vcx1da_u64, __arm_vcx2d_u64, __arm_vcx2da_u64): Likewise.
>         (__arm_vcx3d_u64, __arm_vcx3da_u64): Likewise.
>         * config/arm/arm_cde_builtins.def: New file.
>         * config/arm/iterators.md (V_reg): New attribute of SI.
>         * config/arm/predicates.md (const_int_coproc_operand): New.
>         (const_int_vcde1_operand, const_int_vcde2_operand): New.
>         (const_int_vcde3_operand): New.
>         * config/arm/unspecs.md (UNSPEC_VCDE, UNSPEC_VCDEA): New.
>         * config/arm/vfp.md (arm_vcx1<mode>): New entry.
>         (arm_vcx1a<mode>, arm_vcx2<mode>, arm_vcx2a<mode>): Likewise.
>         (arm_vcx3<mode>, arm_vcx3a<mode>): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2020-03-12  Dennis Zhang  <dennis.zhang@arm.com>
>
>         * gcc.target/arm/acle/cde_v_1.c: New test.
>         * gcc.target/arm/acle/cde_v_1_err.c: New test.
>         * gcc.target/arm/acle/cde_v_1_mve.c: New test.
>

[-- Attachment #2: arm-m-cde-vcxsidi-final-20200319.patch --]
[-- Type: application/octet-stream, Size: 29147 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index c3deb9efc8849019141b6430543e93605fda4af4..74b50c7adc1c03d1a77516f94fbc16c78a02e556 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -305,6 +305,35 @@ arm_mrrc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define MRRC_QUALIFIERS \
   (arm_mrrc_qualifiers)
 
+/* T (immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate, qualifier_unsigned_immediate };
+#define CX_IMM_QUALIFIERS (arm_cx_imm_qualifiers)
+
+/* T (immediate, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_unary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate, qualifier_none,
+      qualifier_unsigned_immediate };
+#define CX_UNARY_QUALIFIERS (arm_cx_unary_qualifiers)
+
+/* T (immediate, T, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_binary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate,
+      qualifier_none, qualifier_none,
+      qualifier_unsigned_immediate };
+#define CX_BINARY_QUALIFIERS (arm_cx_binary_qualifiers)
+
+/* T (immediate, T, T, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_ternary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate,
+      qualifier_none, qualifier_none, qualifier_none,
+      qualifier_unsigned_immediate };
+#define CX_TERNARY_QUALIFIERS (arm_cx_ternary_qualifiers)
+
 /* The first argument (return type) of a store should be void type,
    which we represent with qualifier_void.  Their first operand will be
    a DImode pointer to the location to store to, so we must use
@@ -846,7 +875,23 @@ static arm_builtin_datum acle_builtin_data[] =
 };
 
 #undef VAR1
+/* IMM_MAX sets the maximum valid value of the CDE immediate operand.
+   ECF_FLAG sets the flag used for set_call_expr_flags.  */
+#define VAR1(T, N, A, IMM_MAX, ECF_FLAG) \
+  {{#N #A, UP (A), CODE_FOR_arm_##N##A, 0, T##_QUALIFIERS}, IMM_MAX, ECF_FLAG},
+
+typedef struct {
+  arm_builtin_datum base;
+  unsigned int imm_max;
+  int ecf_flag;
+} arm_builtin_cde_datum;
+
+static arm_builtin_cde_datum cde_builtin_data[] =
+{
+#include "arm_cde_builtins.def"
+};
 
+#undef VAR1
 #define VAR1(T, N, X) \
   ARM_BUILTIN_NEON_##N##X,
 
@@ -1140,6 +1185,14 @@ enum arm_builtins
 
 #include "arm_acle_builtins.def"
 
+#undef VAR1
+#define VAR1(T, N, X, ... ) \
+  ARM_BUILTIN_##N##X,
+
+  ARM_BUILTIN_CDE_BASE,
+
+#include "arm_cde_builtins.def"
+
   ARM_BUILTIN_MVE_BASE,
 
 #undef VAR1
@@ -1162,6 +1215,12 @@ enum arm_builtins
 #define ARM_BUILTIN_ACLE_PATTERN_START \
   (ARM_BUILTIN_ACLE_BASE + 1)
 
+#define ARM_BUILTIN_CDE_PATTERN_START \
+  (ARM_BUILTIN_CDE_BASE + 1)
+
+#define ARM_BUILTIN_CDE_PATTERN_END \
+  (ARM_BUILTIN_CDE_BASE + ARRAY_SIZE (cde_builtin_data))
+
 #undef CF
 #undef VAR1
 #undef VAR2
@@ -1690,6 +1749,15 @@ arm_init_acle_builtins (void)
       arm_builtin_datum *d = &acle_builtin_data[i];
       arm_init_builtin (fcode, d, "__builtin_arm");
     }
+
+  fcode = ARM_BUILTIN_CDE_PATTERN_START;
+  for (i = 0; i < ARRAY_SIZE (cde_builtin_data); i++, fcode++)
+    {
+      arm_builtin_cde_datum *cde = &cde_builtin_data[i];
+      arm_builtin_datum *d = &cde->base;
+      arm_init_builtin (fcode, d, "__builtin_arm");
+      set_call_expr_flags (arm_builtin_decls[fcode], cde->ecf_flag);
+    }
 }
 
 /* Set up all the MVE builtins mentioned in arm_mve_builtins.def file.  */
@@ -2866,8 +2934,29 @@ constant_arg:
 	      if (!(*insn_data[icode].operand[opno].predicate)
 		  (op[argc], mode[argc]))
 		{
-		  error ("%Kargument %d must be a constant immediate",
-			 exp, argc + 1);
+		  if (IN_RANGE (fcode, ARM_BUILTIN_CDE_PATTERN_START,
+				ARM_BUILTIN_CDE_PATTERN_END))
+		    {
+		      if (argc == 0)
+			{
+			  unsigned int cp_bit = UINTVAL (op[argc]);
+			  if (IN_RANGE (cp_bit, 0, ARM_CDE_CONST_COPROC))
+			    error ("%Kcoprocessor %d is not enabled "
+				   "with +cdecp%d", exp, cp_bit, cp_bit);
+			  else
+			    error ("%Kcoproc must be a constant immediate in "
+				   "range [0-%d] enabled with +cdecp<N>", exp,
+				   ARM_CDE_CONST_COPROC);
+			}
+		      else
+			error ("%Kargument %d must be a constant immediate "
+			       "in range [0-%d]", exp, argc + 1,
+			       cde_builtin_data[fcode -
+			       ARM_BUILTIN_CDE_PATTERN_START].imm_max);
+		    }
+		  else
+		    error ("%Kargument %d must be a constant immediate",
+			   exp, argc + 1);
 		  /* We have failed to expand the pattern, and are safely
 		     in to invalid code.  But the mid-end will still try to
 		     build an assignment for this node while it expands,
@@ -3092,8 +3181,12 @@ arm_expand_acle_builtin (int fcode, tree exp, rtx target)
       /* Don't generate any RTL.  */
       return const0_rtx;
     }
+
+  gcc_assert (fcode != ARM_BUILTIN_CDE_BASE);
   arm_builtin_datum *d
-    = &acle_builtin_data[fcode - ARM_BUILTIN_ACLE_PATTERN_START];
+    = (fcode < ARM_BUILTIN_CDE_BASE)
+      ? &acle_builtin_data[fcode - ARM_BUILTIN_ACLE_PATTERN_START]
+      : &cde_builtin_data[fcode - ARM_BUILTIN_CDE_PATTERN_START].base;
 
   return arm_expand_builtin_1 (fcode, exp, target, d);
 }
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 343235d0cbc0be4fa7c773da71567d4ae267494b..ca36a74cd1fa161c388961588fa0f96030b7888e 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -575,6 +575,10 @@ extern int arm_arch_bf16;
 extern int arm_arch_cde;
 extern int arm_arch_cde_coproc;
 extern const int arm_arch_cde_coproc_bits[];
+#define ARM_CDE_CONST_COPROC	7
+#define ARM_VCDE_CONST_1	((1 << 11) - 1)
+#define ARM_VCDE_CONST_2	((1 << 6 ) - 1)
+#define ARM_VCDE_CONST_3	((1 << 3 ) - 1)
 
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
diff --git a/gcc/config/arm/arm_cde.h b/gcc/config/arm/arm_cde.h
index f975754632f6e87da331a19a63300c4de3c1f033..4c9f7ebeed4e2abf532f53040f5891da8b1aadac 100644
--- a/gcc/config/arm/arm_cde.h
+++ b/gcc/config/arm/arm_cde.h
@@ -33,6 +33,77 @@ extern "C" {
 
 #include <stdint.h>
 
+#if defined (__ARM_FEATURE_CDE)
+
+#if defined (__ARM_FP) || defined (__ARM_FEATURE_MVE)
+
+/* CDE builtins using FPU/MVE registers.  */
+
+/* uint32_t
+   __arm_vcx1_u32(int coproc, uint32_t imm);  */
+#define __arm_vcx1_u32(coproc, imm) \
+	__builtin_arm_vcx1si(coproc, imm)
+
+/* uint32_t
+   __arm_vcx1a_u32(int coproc, uint32_t acc, uint32_t imm);  */
+#define __arm_vcx1a_u32(coproc, acc, imm) \
+	__builtin_arm_vcx1asi(coproc, acc, imm)
+
+/* uint32_t
+   __arm_vcx2_u32(int coproc, uint32_t n, uint32_t imm);  */
+#define __arm_vcx2_u32(coproc, n, imm) \
+	__builtin_arm_vcx2si(coproc, n, imm)
+
+/* uint32_t
+   __arm_vcx2a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t imm);  */
+#define __arm_vcx2a_u32(coproc, acc, n, imm) \
+	__builtin_arm_vcx2asi(coproc, acc, n, imm)
+
+/* uint32_t
+   __arm_vcx3_u32(int coproc, uint32_t n, uint32_t m, uint32_t imm);  */
+#define __arm_vcx3_u32(coproc, n, m, imm) \
+	__builtin_arm_vcx3si(coproc, n, m, imm)
+
+/* uint32_t
+   __arm_vcx3a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t m,
+		   uint32_t imm);  */
+#define __arm_vcx3a_u32(coproc, acc, n, m, imm) \
+	__builtin_arm_vcx3asi(coproc, acc, n, m, imm)
+
+/* uint64_t
+   __arm_vcx1d_u64(int coproc, uint32_t imm);  */
+#define __arm_vcx1d_u64(coproc, imm) \
+	__builtin_arm_vcx1di(coproc, imm)
+
+/* uint64_t
+   __arm_vcx1da_u64(int coproc, uint64_t acc, uint32_t imm);  */
+#define __arm_vcx1da_u64(coproc, acc, imm) \
+	__builtin_arm_vcx1adi(coproc, acc, imm)
+
+/* uint64_t
+   __arm_vcx2d_u64(int coproc, uint64_t m, uint32_t imm);  */
+#define __arm_vcx2d_u64(coproc, m, imm) \
+	__builtin_arm_vcx2di(coproc, m, imm)
+
+/* uint64_t
+   __arm_vcx2da_u64(int coproc, uint64_t acc, uint64_t m, uint32_t imm);  */
+#define __arm_vcx2da_u64(coproc, acc, m, imm) \
+	__builtin_arm_vcx2adi(coproc, acc, m, imm)
+
+/* uint64_t
+   __arm_vcx3d_u64(int coproc, uint64_t n, uint64_t m, uint32_t imm);  */
+#define __arm_vcx3d_u64(coproc, n, m, imm) \
+	__builtin_arm_vcx3di(coproc, n, m, imm)
+
+/* uint64_t
+   __arm_vcx3da_u64(int coproc, uint64_t acc, uint64_t n, uint64_t m,
+		    uint32_t imm);  */
+#define __arm_vcx3da_u64(coproc, acc, n, m, imm) \
+	__builtin_arm_vcx3adi(coproc, acc, n, m, imm)
+
+#endif /* __ARM_FP || __ARM_FEATURE_MVE.  */
+#endif /* __ARM_FEATURE_CDE.  */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_cde_builtins.def b/gcc/config/arm/arm_cde_builtins.def
new file mode 100644
index 0000000000000000000000000000000000000000..a9fea937b9650f21a26d8183572b550e39b0fe7d
--- /dev/null
+++ b/gcc/config/arm/arm_cde_builtins.def
@@ -0,0 +1,33 @@
+/* Arm Custom Datapath Extension (CDE) builtin definitions.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#undef CDE_VAR2
+#define CDE_VAR2(T, N, A, B, IMM_MAX, ECF_FLAG) \
+  VAR1 (T, N, A, IMM_MAX, ECF_FLAG) \
+  VAR1 (T, N, B, IMM_MAX, ECF_FLAG)
+
+CDE_VAR2 (CX_IMM, vcx1, si, di, ARM_VCDE_CONST_1, ECF_CONST)
+CDE_VAR2 (CX_UNARY, vcx1a, si, di, ARM_VCDE_CONST_1, ECF_CONST)
+CDE_VAR2 (CX_UNARY, vcx2, si, di, ARM_VCDE_CONST_2, ECF_CONST)
+CDE_VAR2 (CX_BINARY, vcx2a, si, di, ARM_VCDE_CONST_2, ECF_CONST)
+CDE_VAR2 (CX_BINARY, vcx3, si, di, ARM_VCDE_CONST_3, ECF_CONST)
+CDE_VAR2 (CX_TERNARY, vcx3a, si, di, ARM_VCDE_CONST_3, ECF_CONST)
+
+#undef CDE_VAR2
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 5c1a11bf7dee7590d668e7ec5e3b068789b3b3db..183d0347666f56f051e0fc060148f5064def9f83 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -621,7 +621,7 @@
 			 (V2SI "P") (V4SI  "q")
 			 (V2SF "P") (V4SF  "q")
 			 (DI   "P") (V2DI  "q")
-			 (V2HF "") (SF   "")
+			 (V2HF "") (SF   "") (SI "")
 			 (DF    "P") (HF   "")])
 
 ;; Output template to select the high VFP register of a mult-register value.
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index bb302ed5b42208310822b6df770a0f00455cdfba..59cf5b67f8a0a8ac56a664711090d682a5a93ad5 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -226,6 +226,23 @@
   (and (match_operand 0 "const_int_operand")
        (match_test "satisfies_constraint_M (op)")))
 
+(define_predicate "const_int_coproc_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_CDE_CONST_COPROC)")
+       (match_test "arm_arch_cde_coproc_bits[UINTVAL (op)] & arm_arch_cde_coproc")))
+
+(define_predicate "const_int_vcde1_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_VCDE_CONST_1)")))
+
+(define_predicate "const_int_vcde2_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_VCDE_CONST_2)")))
+
+(define_predicate "const_int_vcde3_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_VCDE_CONST_3)")))
+
 ;; This doesn't have to do much because the constant is already checked
 ;; in the shift_operator predicate.
 (define_predicate "shift_amount_operand"
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index e76609f79418af38b70746336dd43592a1dc8713..b50e127eadc21ad6fc91015f59f2dcd47160eabf 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -154,6 +154,8 @@
   UNSPEC_SMUADX		; Represent the SMUADX operation.
   UNSPEC_SSAT16		; Represent the SSAT16 operation.
   UNSPEC_USAT16		; Represent the USAT16 operation.
+  UNSPEC_VCDE		; Custom Datapath Extension instruction.
+  UNSPEC_VCDEA		; Custom Datapath Extension instruction.
 ])
 
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index eb6ae7bea7927c666f36219797d54c0127001bc1..01b53443ac1ffe6b63592b44eacde3a6ff84abc0 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -2166,3 +2166,74 @@
   DONE;
 }
 )
+
+;; CDE instructions using FPU/MVE S/D registers
+
+(define_insn "arm_vcx1<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SI 2 "const_int_vcde1_operand" "i")]
+	 UNSPEC_VCDE))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx1\\tp%c1, %<V_reg>0, #%c2"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx1a<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "0")
+		      (match_operand:SI 3 "const_int_vcde1_operand" "i")]
+	 UNSPEC_VCDEA))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx1a\\tp%c1, %<V_reg>0, #%c3"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx2<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "t")
+		      (match_operand:SI 3 "const_int_vcde2_operand" "i")]
+	 UNSPEC_VCDE))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx2\\tp%c1, %<V_reg>0, %<V_reg>2, #%c3"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx2a<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "0")
+		      (match_operand:SIDI 3 "register_operand" "t")
+		      (match_operand:SI 4 "const_int_vcde2_operand" "i")]
+	 UNSPEC_VCDEA))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx2a\\tp%c1, %<V_reg>0, %<V_reg>3, #%c4"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx3<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "t")
+		      (match_operand:SIDI 3 "register_operand" "t")
+		      (match_operand:SI 4 "const_int_vcde3_operand" "i")]
+	 UNSPEC_VCDE))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx3\\tp%c1, %<V_reg>0, %<V_reg>2, %<V_reg>3, #%c4"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx3a<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "0")
+		      (match_operand:SIDI 3 "register_operand" "t")
+		      (match_operand:SIDI 4 "register_operand" "t")
+		      (match_operand:SI 5 "const_int_vcde3_operand" "i")]
+	 UNSPEC_VCDEA))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx3a\\tp%c1, %<V_reg>0, %<V_reg>3, %<V_reg>4, #%c5"
+  [(set_attr "type" "coproc")]
+)
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde_v_1.c b/gcc/testsuite/gcc.target/arm/acle/cde_v_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..3104db4ae608365667f4b617c5a4d58c90f5f5aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cde_v_1.c
@@ -0,0 +1,94 @@
+/* Test the CDE ACLE intrinsic.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8m_main_cde_fp_ok } */
+/* { dg-options "-save-temps -O2" } */
+/* { dg-add-options arm_v8m_main_cde_fp } */
+
+#include "arm_cde.h"
+
+#define TEST0(T, N, C, I) \
+T test_arm_##N##_##C##_##I () { \
+  return __arm_##N (C, I); \
+}
+
+#define TEST1(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, I); \
+}
+
+#define TEST2(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, I); \
+}
+
+#define TEST3(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, a, I); \
+}
+
+#define TEST_ALL(C) \
+TEST0 (uint32_t, vcx1_u32,	C, 0) \
+TEST1 (uint32_t, vcx1a_u32,	C, 0) \
+TEST1 (uint32_t, vcx2_u32,	C, 0) \
+TEST2 (uint32_t, vcx2a_u32,	C, 0) \
+TEST2 (uint32_t, vcx3_u32,	C, 0) \
+TEST3 (uint32_t, vcx3a_u32,	C, 0) \
+TEST0 (uint64_t, vcx1d_u64,	C, 0) \
+TEST1 (uint64_t, vcx1da_u64,	C, 0) \
+TEST1 (uint64_t, vcx2d_u64,	C, 0) \
+TEST2 (uint64_t, vcx2da_u64,	C, 0) \
+TEST2 (uint64_t, vcx3d_u64,	C, 0) \
+TEST3 (uint64_t, vcx3da_u64,	C, 0) \
+TEST0 (uint32_t, vcx1_u32,	C, 2047) \
+TEST1 (uint32_t, vcx1a_u32,	C, 2047) \
+TEST1 (uint32_t, vcx2_u32,	C, 63) \
+TEST2 (uint32_t, vcx2a_u32,	C, 63) \
+TEST2 (uint32_t, vcx3_u32,	C, 7) \
+TEST3 (uint32_t, vcx3a_u32,	C, 7) \
+TEST0 (uint64_t, vcx1d_u64,	C, 2047) \
+TEST1 (uint64_t, vcx1da_u64,	C, 2047) \
+TEST1 (uint64_t, vcx2d_u64,	C, 63) \
+TEST2 (uint64_t, vcx2da_u64,	C, 63) \
+TEST2 (uint64_t, vcx3d_u64,	C, 7) \
+TEST3 (uint64_t, vcx3da_u64,	C, 7)
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0+fp")
+TEST_ALL (0)
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp1+fp")
+TEST_ALL (1)
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp2+cdecp3+cdecp4+cdecp5+cdecp6+cdecp7+fp")
+TEST_ALL (2)
+TEST_ALL (3)
+TEST_ALL (4)
+TEST_ALL (5)
+TEST_ALL (6)
+TEST_ALL (7)
+#pragma GCC pop_options
+
+/* { dg-final { scan-assembler-times {\tvcx1\tp0, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp1, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp2, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp3, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp4, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp5, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp6, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp7, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp[0-7], s[0-9]+, #2047} 8 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp[0-7], s[0-9]+, #[0,2047]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp[0-7], s[0-9]+, s[0-9]+, #[0,63]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp[0-7], s[0-9]+, s[0-9]+, #[0,63]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp[0-7], s[0-9]+, s[0-9]+, s[0-9]+, #[0,7]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp[0-7], s[0-9]+, s[0-9]+, s[0-9]+, #[0,7]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp[0-7], d[0-9]+, #[0,2047]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp[0-7], d[0-9]+, #[0,2047]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp[0-7], d[0-9]+, d[0-9]+, #[0,63]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp[0-7], d[0-9]+, d[0-9]+, #[0,63]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp[0-7], d[0-9]+, d[0-9]+, d[0-9]+, #[0,7]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp[0-7], d[0-9]+, d[0-9]+, d[0-9]+, #[0,7]} 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde_v_1_err.c b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_err.c
new file mode 100644
index 0000000000000000000000000000000000000000..023fab4ef9bf46dbf630d4698c2a0570bd2e4d14
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_err.c
@@ -0,0 +1,127 @@
+/* Test the CDE ACLE intrinsic.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8m_main_cde_fp_ok } */
+/* { dg-add-options arm_v8m_main_cde_fp } */
+
+#include "arm_cde.h"
+
+uint64_t test_coproc_range (uint32_t a, uint64_t b)
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (8, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1a_u32 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2_u32 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2a_u32 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3_u32 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3a_u32 (8, a, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1d_u64 (8, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1da_u64 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2d_u64 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2da_u64 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3d_u64 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3da_u64 (8, a, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  return res;
+}
+
+uint64_t test_imm_range (uint32_t a, uint64_t b)
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (0, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx1a_u32 (0, a, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx2_u32 (0, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx2a_u32 (0, a, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx3_u32 (0, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3a_u32 (0, a, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1d_u64 (0, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx1da_u64 (0, a, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx2d_u64 (0, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx2da_u64 (0, a, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx3d_u64 (0, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3da_u64 (0, a, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  return res;
+}
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp1+fp")
+uint64_t test_coproc_match_1 (uint32_t a, uint64_t b)
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx1a_u32 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2_u32 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2a_u32 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3_u32 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3a_u32 (0, a, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx1d_u64 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx1da_u64 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2d_u64 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2da_u64 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3d_u64 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3da_u64 (0, a, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  return res;
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp2+fp")
+uint32_t test_coproc_match_2 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp3+fp")
+uint32_t test_coproc_match_3 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp4+fp")
+uint32_t test_coproc_match_4 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp5+fp")
+uint32_t test_coproc_match_5 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp6+fp")
+uint32_t test_coproc_match_6 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp7+fp")
+uint32_t test_coproc_match_7 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0+fp")
+uint32_t test_coproc_match_0 ()
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (1, 0); /* { dg-error {coprocessor 1 is not enabled with \+cdecp1} } */
+  res += __arm_vcx1_u32 (2, 0); /* { dg-error {coprocessor 2 is not enabled with \+cdecp2} } */
+  res += __arm_vcx1_u32 (3, 0); /* { dg-error {coprocessor 3 is not enabled with \+cdecp3} } */
+  res += __arm_vcx1_u32 (4, 0); /* { dg-error {coprocessor 4 is not enabled with \+cdecp4} } */
+  res += __arm_vcx1_u32 (5, 0); /* { dg-error {coprocessor 5 is not enabled with \+cdecp5} } */
+  res += __arm_vcx1_u32 (6, 0); /* { dg-error {coprocessor 6 is not enabled with \+cdecp6} } */
+  res += __arm_vcx1_u32 (7, 0); /* { dg-error {coprocessor 7 is not enabled with \+cdecp7} } */
+  return res;
+}
+#pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde_v_1_mve.c b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_mve.c
new file mode 100644
index 0000000000000000000000000000000000000000..5140c3f521a628c4ccc4ca670876a0b0468efa37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_mve.c
@@ -0,0 +1,56 @@
+/* Test the CDE ACLE intrinsic.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_main_cde_mve_ok } */
+/* { dg-options "-save-temps -O2" } */
+/* { dg-add-options arm_v8_1m_main_cde_mve } */
+
+#include "arm_cde.h"
+
+#define TEST0(T, N, C, I) \
+T test_arm_##N##_##C##_##I () { \
+  return __arm_##N (C, I); \
+}
+
+#define TEST1(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, I); \
+}
+
+#define TEST2(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, I); \
+}
+
+#define TEST3(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, a, I); \
+}
+
+#define TEST_ALL(C) \
+TEST0 (uint32_t, vcx1_u32,	C, 0) \
+TEST1 (uint32_t, vcx1a_u32,	C, 0) \
+TEST1 (uint32_t, vcx2_u32,	C, 0) \
+TEST2 (uint32_t, vcx2a_u32,	C, 0) \
+TEST2 (uint32_t, vcx3_u32,	C, 0) \
+TEST3 (uint32_t, vcx3a_u32,	C, 0) \
+TEST0 (uint64_t, vcx1d_u64,	C, 0) \
+TEST1 (uint64_t, vcx1da_u64,	C, 0) \
+TEST1 (uint64_t, vcx2d_u64,	C, 0) \
+TEST2 (uint64_t, vcx2da_u64,	C, 0) \
+TEST2 (uint64_t, vcx3d_u64,	C, 0) \
+TEST3 (uint64_t, vcx3da_u64,	C, 0)
+
+TEST_ALL (0)
+
+/* { dg-final { scan-assembler-times {\tvcx1\tp0, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp0, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp0, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp0, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp0, s[0-9]+, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp0, s[0-9]+, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp0, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp0, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp0, d[0-9]+, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp0, d[0-9]+, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp0, d[0-9]+, d[0-9]+, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp0, d[0-9]+, d[0-9]+, d[0-9]+, #0} 1 } } */


* Re: [PATCH][Arm][2/4]  Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers
  2020-03-20 15:18     ` Dennis Zhang
@ 2020-04-07 12:31       ` Dennis Zhang
  2020-04-07 14:07         ` Kyrylo Tkachov
  0 siblings, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2020-04-07 12:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 2818 bytes --]

Hi all,

This patch has been updated to support DImode on VFP targets, as required by CDE.
The ChangeLog has been updated as follows.

Is this ready for commit please?

Cheers
Dennis

gcc/ChangeLog:

2020-04-07  Dennis Zhang  <dennis.zhang@arm.com>
	    Matthew Malcomson <matthew.malcomson@arm.com>

	* config/arm/arm-builtins.c (CX_IMM_QUALIFIERS): New macro.
	(CX_UNARY_QUALIFIERS, CX_BINARY_QUALIFIERS): Likewise.
	(CX_TERNARY_QUALIFIERS): Likewise.
	(ARM_BUILTIN_CDE_PATTERN_START): Likewise.
	(ARM_BUILTIN_CDE_PATTERN_END): Likewise.
	(arm_init_acle_builtins): Initialize CDE builtins.
	(arm_expand_acle_builtin): Check CDE constant operands.
	* config/arm/arm.h (ARM_CDE_CONST_COPROC): New macro for the range
	of the CDE coprocessor operand.
	(ARM_VCDE_CONST_1, ARM_VCDE_CONST_2, ARM_VCDE_CONST_3): New macros
	for the ranges of the CDE immediate operands.
	* config/arm/arm.c (arm_hard_regno_mode_ok): Support DImode for
	TARGET_VFP_BASE.
	* config/arm/arm_cde.h (__arm_vcx1_u32): New macro for the ACLE interface.
	(__arm_vcx1a_u32, __arm_vcx2_u32, __arm_vcx2a_u32): Likewise.
	(__arm_vcx3_u32, __arm_vcx3a_u32, __arm_vcx1d_u64): Likewise.
	(__arm_vcx1da_u64, __arm_vcx2d_u64, __arm_vcx2da_u64): Likewise.
	(__arm_vcx3d_u64, __arm_vcx3da_u64): Likewise.
	* config/arm/arm_cde_builtins.def: New file.
	* config/arm/iterators.md (V_reg): Add SI entry.
	* config/arm/predicates.md (const_int_coproc_operand): New.
	(const_int_vcde1_operand, const_int_vcde2_operand): New.
	(const_int_vcde3_operand): New.
	* config/arm/unspecs.md (UNSPEC_VCDE, UNSPEC_VCDEA): New.
	* config/arm/vfp.md (arm_vcx1<mode>): New entry.
	(arm_vcx1a<mode>, arm_vcx2<mode>, arm_vcx2a<mode>): Likewise.
	(arm_vcx3<mode>, arm_vcx3a<mode>): Likewise.

gcc/testsuite/ChangeLog:

2020-04-07  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/acle/cde_v_1.c: New test.
	* gcc.target/arm/acle/cde_v_1_err.c: New test.
	* gcc.target/arm/acle/cde_v_1_mve.c: New test.

> Hi all,
>
> This patch is updated as attached.
> It has been rebased onto the current trunk. Is it ready for commit, please?
>
> Cheers
> Dennis
>
> > Hi all,
> >
> > This patch is part of a series that adds support for the Armv8-M Custom Datapath Extension (CDE).
> > It enables the ACLE intrinsics for the VCX1<A>, VCX2<A>, and VCX3<A> instructions, which operate on 32-bit and 64-bit FPU/MVE registers.
> >
> > This patch depends on the CDE feature patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541921.html
> > It also depends on the MVE framework patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html
> > The ISA has been announced at https://developer.arm.com/architectures/instruction-sets/custom-instructions
> >
> > Regtested and bootstrapped for arm-none-linux-gnueabi-armv8-m.main.
> >
> > Is it OK for commit please?
> >
> > Cheers
> > Dennis
> >

[-- Attachment #2: arm-m-cde-vcxsidi-final-20200407-rb12663.patch --]
[-- Type: application/octet-stream, Size: 29677 bytes --]

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 832b9107424fd9a4a0ee272b773b3d0929172370..a8bad7b1ae5a102616656cf4cf35a6c570fbe349 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -305,6 +305,35 @@ arm_mrrc_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define MRRC_QUALIFIERS \
   (arm_mrrc_qualifiers)
 
+/* T (immediate, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate, qualifier_unsigned_immediate };
+#define CX_IMM_QUALIFIERS (arm_cx_imm_qualifiers)
+
+/* T (immediate, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_unary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate, qualifier_none,
+      qualifier_unsigned_immediate };
+#define CX_UNARY_QUALIFIERS (arm_cx_unary_qualifiers)
+
+/* T (immediate, T, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_binary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate,
+      qualifier_none, qualifier_none,
+      qualifier_unsigned_immediate };
+#define CX_BINARY_QUALIFIERS (arm_cx_binary_qualifiers)
+
+/* T (immediate, T, T, T, unsigned immediate).  */
+static enum arm_type_qualifiers
+arm_cx_ternary_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_immediate,
+      qualifier_none, qualifier_none, qualifier_none,
+      qualifier_unsigned_immediate };
+#define CX_TERNARY_QUALIFIERS (arm_cx_ternary_qualifiers)
+
 /* The first argument (return type) of a store should be void type,
    which we represent with qualifier_void.  Their first operand will be
    a DImode pointer to the location to store to, so we must use
@@ -928,7 +957,23 @@ static arm_builtin_datum acle_builtin_data[] =
 };
 
 #undef VAR1
+/* IMM_MAX sets the maximum valid value of the CDE immediate operand.
+   ECF_FLAG sets the flag used for set_call_expr_flags.  */
+#define VAR1(T, N, A, IMM_MAX, ECF_FLAG) \
+  {{#N #A, UP (A), CODE_FOR_arm_##N##A, 0, T##_QUALIFIERS}, IMM_MAX, ECF_FLAG},
+
+typedef struct {
+  arm_builtin_datum base;
+  unsigned int imm_max;
+  int ecf_flag;
+} arm_builtin_cde_datum;
+
+static arm_builtin_cde_datum cde_builtin_data[] =
+{
+#include "arm_cde_builtins.def"
+};
 
+#undef VAR1
 #define VAR1(T, N, X) \
   ARM_BUILTIN_NEON_##N##X,
 
@@ -1224,6 +1269,14 @@ enum arm_builtins
 
 #include "arm_acle_builtins.def"
 
+#undef VAR1
+#define VAR1(T, N, X, ... ) \
+  ARM_BUILTIN_##N##X,
+
+  ARM_BUILTIN_CDE_BASE,
+
+#include "arm_cde_builtins.def"
+
   ARM_BUILTIN_MVE_BASE,
 
 #undef VAR1
@@ -1246,6 +1299,12 @@ enum arm_builtins
 #define ARM_BUILTIN_ACLE_PATTERN_START \
   (ARM_BUILTIN_ACLE_BASE + 1)
 
+#define ARM_BUILTIN_CDE_PATTERN_START \
+  (ARM_BUILTIN_CDE_BASE + 1)
+
+#define ARM_BUILTIN_CDE_PATTERN_END \
+  (ARM_BUILTIN_CDE_BASE + ARRAY_SIZE (cde_builtin_data))
+
 #undef CF
 #undef VAR1
 #undef VAR2
@@ -1774,6 +1833,15 @@ arm_init_acle_builtins (void)
       arm_builtin_datum *d = &acle_builtin_data[i];
       arm_init_builtin (fcode, d, "__builtin_arm");
     }
+
+  fcode = ARM_BUILTIN_CDE_PATTERN_START;
+  for (i = 0; i < ARRAY_SIZE (cde_builtin_data); i++, fcode++)
+    {
+      arm_builtin_cde_datum *cde = &cde_builtin_data[i];
+      arm_builtin_datum *d = &cde->base;
+      arm_init_builtin (fcode, d, "__builtin_arm");
+      set_call_expr_flags (arm_builtin_decls[fcode], cde->ecf_flag);
+    }
 }
 
 /* Set up all the MVE builtins mentioned in arm_mve_builtins.def file.  */
@@ -2966,8 +3034,29 @@ constant_arg:
 	      if (!(*insn_data[icode].operand[opno].predicate)
 		  (op[argc], mode[argc]))
 		{
-		  error ("%Kargument %d must be a constant immediate",
-			 exp, argc + 1);
+		  if (IN_RANGE (fcode, ARM_BUILTIN_CDE_PATTERN_START,
+				ARM_BUILTIN_CDE_PATTERN_END))
+		    {
+		      if (argc == 0)
+			{
+			  unsigned int cp_bit = UINTVAL (op[argc]);
+			  if (IN_RANGE (cp_bit, 0, ARM_CDE_CONST_COPROC))
+			    error ("%Kcoprocessor %d is not enabled "
+				   "with +cdecp%d", exp, cp_bit, cp_bit);
+			  else
+			    error ("%Kcoproc must be a constant immediate in "
+				   "range [0-%d] enabled with +cdecp<N>", exp,
+				   ARM_CDE_CONST_COPROC);
+			}
+		      else
+			error ("%Kargument %d must be a constant immediate "
+			       "in range [0-%d]", exp, argc + 1,
+			       cde_builtin_data[fcode -
+			       ARM_BUILTIN_CDE_PATTERN_START].imm_max);
+		    }
+		  else
+		    error ("%Kargument %d must be a constant immediate",
+			   exp, argc + 1);
 		  /* We have failed to expand the pattern, and are safely
 		     in to invalid code.  But the mid-end will still try to
 		     build an assignment for this node while it expands,
@@ -3192,8 +3281,12 @@ arm_expand_acle_builtin (int fcode, tree exp, rtx target)
       /* Don't generate any RTL.  */
       return const0_rtx;
     }
+
+  gcc_assert (fcode != ARM_BUILTIN_CDE_BASE);
   arm_builtin_datum *d
-    = &acle_builtin_data[fcode - ARM_BUILTIN_ACLE_PATTERN_START];
+    = (fcode < ARM_BUILTIN_CDE_BASE)
+      ? &acle_builtin_data[fcode - ARM_BUILTIN_ACLE_PATTERN_START]
+      : &cde_builtin_data[fcode - ARM_BUILTIN_CDE_PATTERN_START].base;
 
   return arm_expand_builtin_1 (fcode, exp, target, d);
 }
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 343235d0cbc0be4fa7c773da71567d4ae267494b..ca36a74cd1fa161c388961588fa0f96030b7888e 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -575,6 +575,10 @@ extern int arm_arch_bf16;
 extern int arm_arch_cde;
 extern int arm_arch_cde_coproc;
 extern const int arm_arch_cde_coproc_bits[];
+#define ARM_CDE_CONST_COPROC	7
+#define ARM_VCDE_CONST_1	((1 << 11) - 1)
+#define ARM_VCDE_CONST_2	((1 << 6 ) - 1)
+#define ARM_VCDE_CONST_3	((1 << 3 ) - 1)
 
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1d3974bf832ffd5b0cefd8a4f0c0dbc97e54772c..81526ce72dbd14d5cac298a88acca34e4a9087a5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25022,7 +25022,7 @@ arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 
   if (TARGET_VFP_BASE && IS_VFP_REGNUM (regno))
     {
-      if (mode == DFmode)
+      if (mode == DFmode || mode == DImode)
 	return VFP_REGNO_OK_FOR_DOUBLE (regno);
 
       if (mode == HFmode || mode == BFmode || mode == HImode
diff --git a/gcc/config/arm/arm_cde.h b/gcc/config/arm/arm_cde.h
index f975754632f6e87da331a19a63300c4de3c1f033..4c9f7ebeed4e2abf532f53040f5891da8b1aadac 100644
--- a/gcc/config/arm/arm_cde.h
+++ b/gcc/config/arm/arm_cde.h
@@ -33,6 +33,77 @@ extern "C" {
 
 #include <stdint.h>
 
+#if defined (__ARM_FEATURE_CDE)
+
+#if defined (__ARM_FP) || defined (__ARM_FEATURE_MVE)
+
+/* CDE builtins using FPU/MVE registers.  */
+
+/* uint32_t
+   __arm_vcx1_u32(int coproc, uint32_t imm);  */
+#define __arm_vcx1_u32(coproc, imm) \
+	__builtin_arm_vcx1si(coproc, imm)
+
+/* uint32_t
+   __arm_vcx1a_u32(int coproc, uint32_t acc, uint32_t imm);  */
+#define __arm_vcx1a_u32(coproc, acc, imm) \
+	__builtin_arm_vcx1asi(coproc, acc, imm)
+
+/* uint32_t
+   __arm_vcx2_u32(int coproc, uint32_t n, uint32_t imm);  */
+#define __arm_vcx2_u32(coproc, n, imm) \
+	__builtin_arm_vcx2si(coproc, n, imm)
+
+/* uint32_t
+   __arm_vcx2a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t imm);  */
+#define __arm_vcx2a_u32(coproc, acc, n, imm) \
+	__builtin_arm_vcx2asi(coproc, acc, n, imm)
+
+/* uint32_t
+   __arm_vcx3_u32(int coproc, uint32_t n, uint32_t m, uint32_t imm);  */
+#define __arm_vcx3_u32(coproc, n, m, imm) \
+	__builtin_arm_vcx3si(coproc, n, m, imm)
+
+/* uint32_t
+   __arm_vcx3a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t m,
+		   uint32_t imm);  */
+#define __arm_vcx3a_u32(coproc, acc, n, m, imm) \
+	__builtin_arm_vcx3asi(coproc, acc, n, m, imm)
+
+/* uint64_t
+   __arm_vcx1d_u64(int coproc, uint32_t imm);  */
+#define __arm_vcx1d_u64(coproc, imm) \
+	__builtin_arm_vcx1di(coproc, imm)
+
+/* uint64_t
+   __arm_vcx1da_u64(int coproc, uint64_t acc, uint32_t imm);  */
+#define __arm_vcx1da_u64(coproc, acc, imm) \
+	__builtin_arm_vcx1adi(coproc, acc, imm)
+
+/* uint64_t
+   __arm_vcx2d_u64(int coproc, uint64_t m, uint32_t imm);  */
+#define __arm_vcx2d_u64(coproc, m, imm) \
+	__builtin_arm_vcx2di(coproc, m, imm)
+
+/* uint64_t
+   __arm_vcx2da_u64(int coproc, uint64_t acc, uint64_t m, uint32_t imm);  */
+#define __arm_vcx2da_u64(coproc, acc, m, imm) \
+	__builtin_arm_vcx2adi(coproc, acc, m, imm)
+
+/* uint64_t
+   __arm_vcx3d_u64(int coproc, uint64_t n, uint64_t m, uint32_t imm);  */
+#define __arm_vcx3d_u64(coproc, n, m, imm) \
+	__builtin_arm_vcx3di(coproc, n, m, imm)
+
+/* uint64_t
+   __arm_vcx3da_u64(int coproc, uint64_t acc, uint64_t n, uint64_t m,
+		    uint32_t imm);  */
+#define __arm_vcx3da_u64(coproc, acc, n, m, imm) \
+	__builtin_arm_vcx3adi(coproc, acc, n, m, imm)
+
+#endif /* __ARM_FP || __ARM_FEATURE_MVE.  */
+#endif /* __ARM_FEATURE_CDE.  */
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/gcc/config/arm/arm_cde_builtins.def b/gcc/config/arm/arm_cde_builtins.def
new file mode 100644
index 0000000000000000000000000000000000000000..a9fea937b9650f21a26d8183572b550e39b0fe7d
--- /dev/null
+++ b/gcc/config/arm/arm_cde_builtins.def
@@ -0,0 +1,33 @@
+/* Arm Custom Datapath Extension (CDE) builtin definitions.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#undef CDE_VAR2
+#define CDE_VAR2(T, N, A, B, IMM_MAX, ECF_FLAG) \
+  VAR1 (T, N, A, IMM_MAX, ECF_FLAG) \
+  VAR1 (T, N, B, IMM_MAX, ECF_FLAG)
+
+CDE_VAR2 (CX_IMM, vcx1, si, di, ARM_VCDE_CONST_1, ECF_CONST)
+CDE_VAR2 (CX_UNARY, vcx1a, si, di, ARM_VCDE_CONST_1, ECF_CONST)
+CDE_VAR2 (CX_UNARY, vcx2, si, di, ARM_VCDE_CONST_2, ECF_CONST)
+CDE_VAR2 (CX_BINARY, vcx2a, si, di, ARM_VCDE_CONST_2, ECF_CONST)
+CDE_VAR2 (CX_BINARY, vcx3, si, di, ARM_VCDE_CONST_3, ECF_CONST)
+CDE_VAR2 (CX_TERNARY, vcx3a, si, di, ARM_VCDE_CONST_3, ECF_CONST)
+
+#undef CDE_VAR2
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index e6b66eef3728122c87bd6ea68b8a643dd4552b00..c94198772f27dfda62886fecd37393960456c3c0 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -632,7 +632,7 @@
 			 (V2SI "P") (V4SI  "q")
 			 (V2SF "P") (V4SF  "q")
 			 (DI   "P") (V2DI  "q")
-			 (V2HF "") (SF   "")
+			 (V2HF "") (SF   "") (SI "")
 			 (DF    "P") (HF   "")])
 
 ;; Output template to select the high VFP register of a mult-register value.
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index bb302ed5b42208310822b6df770a0f00455cdfba..59cf5b67f8a0a8ac56a664711090d682a5a93ad5 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -226,6 +226,23 @@
   (and (match_operand 0 "const_int_operand")
        (match_test "satisfies_constraint_M (op)")))
 
+(define_predicate "const_int_coproc_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_CDE_CONST_COPROC)")
+       (match_test "arm_arch_cde_coproc_bits[UINTVAL (op)] & arm_arch_cde_coproc")))
+
+(define_predicate "const_int_vcde1_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_VCDE_CONST_1)")))
+
+(define_predicate "const_int_vcde2_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_VCDE_CONST_2)")))
+
+(define_predicate "const_int_vcde3_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "IN_RANGE (UINTVAL (op), 0, ARM_VCDE_CONST_3)")))
+
 ;; This doesn't have to do much because the constant is already checked
 ;; in the shift_operator predicate.
 (define_predicate "shift_amount_operand"
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index a7575871da7bf123f9e2d693815147fa60e1e914..1645c32dfb2a43dde6ee947637edbca2df8f2309 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -154,6 +154,8 @@
   UNSPEC_SMUADX		; Represent the SMUADX operation.
   UNSPEC_SSAT16		; Represent the SSAT16 operation.
   UNSPEC_USAT16		; Represent the USAT16 operation.
+  UNSPEC_VCDE		; Custom Datapath Extension instruction.
+  UNSPEC_VCDEA		; Custom Datapath Extension instruction.
 ])
 
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index dfb1031431af3ec87d9cccdee35db04e0adffe04..ef83b504ff6b3e6cec4c2c81b7bf97785f4a5492 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -2165,3 +2165,74 @@
   DONE;
 }
 )
+
+;; CDE instructions using FPU/MVE S/D registers
+
+(define_insn "arm_vcx1<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SI 2 "const_int_vcde1_operand" "i")]
+	 UNSPEC_VCDE))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx1\\tp%c1, %<V_reg>0, #%c2"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx1a<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "0")
+		      (match_operand:SI 3 "const_int_vcde1_operand" "i")]
+	 UNSPEC_VCDEA))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx1a\\tp%c1, %<V_reg>0, #%c3"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx2<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "t")
+		      (match_operand:SI 3 "const_int_vcde2_operand" "i")]
+	 UNSPEC_VCDE))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx2\\tp%c1, %<V_reg>0, %<V_reg>2, #%c3"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx2a<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "0")
+		      (match_operand:SIDI 3 "register_operand" "t")
+		      (match_operand:SI 4 "const_int_vcde2_operand" "i")]
+	 UNSPEC_VCDEA))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx2a\\tp%c1, %<V_reg>0, %<V_reg>3, #%c4"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx3<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "t")
+		      (match_operand:SIDI 3 "register_operand" "t")
+		      (match_operand:SI 4 "const_int_vcde3_operand" "i")]
+	 UNSPEC_VCDE))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx3\\tp%c1, %<V_reg>0, %<V_reg>2, %<V_reg>3, #%c4"
+  [(set_attr "type" "coproc")]
+)
+
+(define_insn "arm_vcx3a<mode>"
+  [(set (match_operand:SIDI 0 "register_operand" "=t")
+	(unspec:SIDI [(match_operand:SI 1 "const_int_coproc_operand" "i")
+		      (match_operand:SIDI 2 "register_operand" "0")
+		      (match_operand:SIDI 3 "register_operand" "t")
+		      (match_operand:SIDI 4 "register_operand" "t")
+		      (match_operand:SI 5 "const_int_vcde3_operand" "i")]
+	 UNSPEC_VCDEA))]
+  "TARGET_CDE && (TARGET_ARM_FP || TARGET_HAVE_MVE)"
+  "vcx3a\\tp%c1, %<V_reg>0, %<V_reg>3, %<V_reg>4, #%c5"
+  [(set_attr "type" "coproc")]
+)
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde_v_1.c b/gcc/testsuite/gcc.target/arm/acle/cde_v_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..3104db4ae608365667f4b617c5a4d58c90f5f5aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cde_v_1.c
@@ -0,0 +1,94 @@
+/* Test the CDE ACLE intrinsic.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8m_main_cde_fp_ok } */
+/* { dg-options "-save-temps -O2" } */
+/* { dg-add-options arm_v8m_main_cde_fp } */
+
+#include "arm_cde.h"
+
+#define TEST0(T, N, C, I) \
+T test_arm_##N##_##C##_##I () { \
+  return __arm_##N (C, I); \
+}
+
+#define TEST1(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, I); \
+}
+
+#define TEST2(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, I); \
+}
+
+#define TEST3(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, a, I); \
+}
+
+#define TEST_ALL(C) \
+TEST0 (uint32_t, vcx1_u32,	C, 0) \
+TEST1 (uint32_t, vcx1a_u32,	C, 0) \
+TEST1 (uint32_t, vcx2_u32,	C, 0) \
+TEST2 (uint32_t, vcx2a_u32,	C, 0) \
+TEST2 (uint32_t, vcx3_u32,	C, 0) \
+TEST3 (uint32_t, vcx3a_u32,	C, 0) \
+TEST0 (uint64_t, vcx1d_u64,	C, 0) \
+TEST1 (uint64_t, vcx1da_u64,	C, 0) \
+TEST1 (uint64_t, vcx2d_u64,	C, 0) \
+TEST2 (uint64_t, vcx2da_u64,	C, 0) \
+TEST2 (uint64_t, vcx3d_u64,	C, 0) \
+TEST3 (uint64_t, vcx3da_u64,	C, 0) \
+TEST0 (uint32_t, vcx1_u32,	C, 2047) \
+TEST1 (uint32_t, vcx1a_u32,	C, 2047) \
+TEST1 (uint32_t, vcx2_u32,	C, 63) \
+TEST2 (uint32_t, vcx2a_u32,	C, 63) \
+TEST2 (uint32_t, vcx3_u32,	C, 7) \
+TEST3 (uint32_t, vcx3a_u32,	C, 7) \
+TEST0 (uint64_t, vcx1d_u64,	C, 2047) \
+TEST1 (uint64_t, vcx1da_u64,	C, 2047) \
+TEST1 (uint64_t, vcx2d_u64,	C, 63) \
+TEST2 (uint64_t, vcx2da_u64,	C, 63) \
+TEST2 (uint64_t, vcx3d_u64,	C, 7) \
+TEST3 (uint64_t, vcx3da_u64,	C, 7)
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0+fp")
+TEST_ALL (0)
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp1+fp")
+TEST_ALL (1)
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp2+cdecp3+cdecp4+cdecp5+cdecp6+cdecp7+fp")
+TEST_ALL (2)
+TEST_ALL (3)
+TEST_ALL (4)
+TEST_ALL (5)
+TEST_ALL (6)
+TEST_ALL (7)
+#pragma GCC pop_options
+
+/* { dg-final { scan-assembler-times {\tvcx1\tp0, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp1, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp2, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp3, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp4, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp5, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp6, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp7, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp[0-7], s[0-9]+, #2047} 8 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp[0-7], s[0-9]+, #[0,2047]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp[0-7], s[0-9]+, s[0-9]+, #[0,63]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp[0-7], s[0-9]+, s[0-9]+, #[0,63]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp[0-7], s[0-9]+, s[0-9]+, s[0-9]+, #[0,7]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp[0-7], s[0-9]+, s[0-9]+, s[0-9]+, #[0,7]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp[0-7], d[0-9]+, #[0,2047]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp[0-7], d[0-9]+, #[0,2047]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp[0-7], d[0-9]+, d[0-9]+, #[0,63]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp[0-7], d[0-9]+, d[0-9]+, #[0,63]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp[0-7], d[0-9]+, d[0-9]+, d[0-9]+, #[0,7]} 16 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp[0-7], d[0-9]+, d[0-9]+, d[0-9]+, #[0,7]} 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde_v_1_err.c b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_err.c
new file mode 100644
index 0000000000000000000000000000000000000000..023fab4ef9bf46dbf630d4698c2a0570bd2e4d14
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_err.c
@@ -0,0 +1,127 @@
+/* Test the CDE ACLE intrinsic.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8m_main_cde_fp_ok } */
+/* { dg-add-options arm_v8m_main_cde_fp } */
+
+#include "arm_cde.h"
+
+uint64_t test_coproc_range (uint32_t a, uint64_t b)
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (8, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1a_u32 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2_u32 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2a_u32 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3_u32 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3a_u32 (8, a, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1d_u64 (8, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1da_u64 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2d_u64 (8, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx2da_u64 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3d_u64 (8, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3da_u64 (8, a, a, a, 0); /* { dg-error {coproc must be a constant immediate in range \[0-7\]} } */
+  return res;
+}
+
+uint64_t test_imm_range (uint32_t a, uint64_t b)
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (0, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx1a_u32 (0, a, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx2_u32 (0, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx2a_u32 (0, a, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx3_u32 (0, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3a_u32 (0, a, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx1d_u64 (0, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx1da_u64 (0, a, 2048); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-2047\]} } */
+  res += __arm_vcx2d_u64 (0, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx2da_u64 (0, a, a, 64); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-63\]} } */
+  res += __arm_vcx3d_u64 (0, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  res += __arm_vcx3da_u64 (0, a, a, a, 8); /* { dg-error {argument [2-5] must be a constant immediate in range \[0-7\]} } */
+  return res;
+}
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp1+fp")
+uint64_t test_coproc_match_1 (uint32_t a, uint64_t b)
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx1a_u32 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2_u32 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2a_u32 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3_u32 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3a_u32 (0, a, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx1d_u64 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx1da_u64 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2d_u64 (0, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx2da_u64 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3d_u64 (0, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  res += __arm_vcx3da_u64 (0, a, a, a, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+  return res;
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp2+fp")
+uint32_t test_coproc_match_2 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp3+fp")
+uint32_t test_coproc_match_3 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp4+fp")
+uint32_t test_coproc_match_4 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp5+fp")
+uint32_t test_coproc_match_5 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp6+fp")
+uint32_t test_coproc_match_6 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp7+fp")
+uint32_t test_coproc_match_7 ()
+{
+  return __arm_vcx1_u32 (0, 0); /* { dg-error {coprocessor 0 is not enabled with \+cdecp0} } */
+}
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0+fp")
+uint32_t test_coproc_match_0 ()
+{
+  uint64_t res = 0;
+  res += __arm_vcx1_u32 (1, 0); /* { dg-error {coprocessor 1 is not enabled with \+cdecp1} } */
+  res += __arm_vcx1_u32 (2, 0); /* { dg-error {coprocessor 2 is not enabled with \+cdecp2} } */
+  res += __arm_vcx1_u32 (3, 0); /* { dg-error {coprocessor 3 is not enabled with \+cdecp3} } */
+  res += __arm_vcx1_u32 (4, 0); /* { dg-error {coprocessor 4 is not enabled with \+cdecp4} } */
+  res += __arm_vcx1_u32 (5, 0); /* { dg-error {coprocessor 5 is not enabled with \+cdecp5} } */
+  res += __arm_vcx1_u32 (6, 0); /* { dg-error {coprocessor 6 is not enabled with \+cdecp6} } */
+  res += __arm_vcx1_u32 (7, 0); /* { dg-error {coprocessor 7 is not enabled with \+cdecp7} } */
+  return res;
+}
+#pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde_v_1_mve.c b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_mve.c
new file mode 100644
index 0000000000000000000000000000000000000000..5140c3f521a628c4ccc4ca670876a0b0468efa37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/acle/cde_v_1_mve.c
@@ -0,0 +1,56 @@
+/* Test the CDE ACLE intrinsic.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_main_cde_mve_ok } */
+/* { dg-options "-save-temps -O2" } */
+/* { dg-add-options arm_v8_1m_main_cde_mve } */
+
+#include "arm_cde.h"
+
+#define TEST0(T, N, C, I) \
+T test_arm_##N##_##C##_##I () { \
+  return __arm_##N (C, I); \
+}
+
+#define TEST1(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, I); \
+}
+
+#define TEST2(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, I); \
+}
+
+#define TEST3(T, N, C, I) \
+T test_arm_##N##_##C##_##I (T a) { \
+  return __arm_##N (C, a, a, a, I); \
+}
+
+#define TEST_ALL(C) \
+TEST0 (uint32_t, vcx1_u32,	C, 0) \
+TEST1 (uint32_t, vcx1a_u32,	C, 0) \
+TEST1 (uint32_t, vcx2_u32,	C, 0) \
+TEST2 (uint32_t, vcx2a_u32,	C, 0) \
+TEST2 (uint32_t, vcx3_u32,	C, 0) \
+TEST3 (uint32_t, vcx3a_u32,	C, 0) \
+TEST0 (uint64_t, vcx1d_u64,	C, 0) \
+TEST1 (uint64_t, vcx1da_u64,	C, 0) \
+TEST1 (uint64_t, vcx2d_u64,	C, 0) \
+TEST2 (uint64_t, vcx2da_u64,	C, 0) \
+TEST2 (uint64_t, vcx3d_u64,	C, 0) \
+TEST3 (uint64_t, vcx3da_u64,	C, 0)
+
+TEST_ALL (0)
+
+/* { dg-final { scan-assembler-times {\tvcx1\tp0, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp0, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp0, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp0, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp0, s[0-9]+, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp0, s[0-9]+, s[0-9]+, s[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1\tp0, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx1a\tp0, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2\tp0, d[0-9]+, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx2a\tp0, d[0-9]+, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3\tp0, d[0-9]+, d[0-9]+, d[0-9]+, #0} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcx3a\tp0, d[0-9]+, d[0-9]+, d[0-9]+, #0} 1 } } */


* RE: [PATCH][Arm][2/4]  Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers
  2020-04-07 12:31       ` Dennis Zhang
@ 2020-04-07 14:07         ` Kyrylo Tkachov
  2020-04-08 15:25           ` Dennis Zhang
                             ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Kyrylo Tkachov @ 2020-04-07 14:07 UTC (permalink / raw)
  To: Dennis Zhang, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Dennis,

> -----Original Message-----
> From: Dennis Zhang <Dennis.Zhang@arm.com>
> Sent: 07 April 2020 13:31
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo
> Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH][Arm][2/4] Custom Datapath Extension intrinsics:
> instructions using FPU/MVE S/D registers
> 
> Hi all,
> 
> This patch is updated to support DImode for the VFP target, as required by CDE.
> The ChangeLog is updated as follows.
> 
> Is this ready for commit please?

This is ok.
Has the first patch been updated and committed yet?
Thanks,
Kyrill

> 
> Cheers
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-04-07  Dennis Zhang  <dennis.zhang@arm.com>
>     Matthew Malcomson <matthew.malcomson@arm.com>
> 
> * config/arm/arm-builtins.c (CX_IMM_QUALIFIERS): New macro.
> (CX_UNARY_QUALIFIERS, CX_BINARY_QUALIFIERS): Likewise.
> (CX_TERNARY_QUALIFIERS): Likewise.
> (ARM_BUILTIN_CDE_PATTERN_START): Likewise.
> (ARM_BUILTIN_CDE_PATTERN_END): Likewise.
> (arm_init_acle_builtins): Initialize CDE builtins.
> (arm_expand_acle_builtin): Check CDE constant operands.
> * config/arm/arm.h (ARM_CDE_CONST_COPROC): New macro to set the range
> of CDE constant operands.
> (ARM_VCDE_CONST_1, ARM_VCDE_CONST_2, ARM_VCDE_CONST_3): Likewise.
> * config/arm/arm.c (arm_hard_regno_mode_ok): Support DImode for
> TARGET_VFP_BASE.
> * config/arm/arm_cde.h (__arm_vcx1_u32): New macro of ACLE interface.
> (__arm_vcx1a_u32, __arm_vcx2_u32, __arm_vcx2a_u32): Likewise.
> (__arm_vcx3_u32, __arm_vcx3a_u32, __arm_vcx1d_u64): Likewise.
> (__arm_vcx1da_u64, __arm_vcx2d_u64, __arm_vcx2da_u64): Likewise.
> (__arm_vcx3d_u64, __arm_vcx3da_u64): Likewise.
> * config/arm/arm_cde_builtins.def: New file.
> * config/arm/iterators.md (V_reg): New attribute of SI.
> * config/arm/predicates.md (const_int_coproc_operand): New.
> (const_int_vcde1_operand, const_int_vcde2_operand): New.
> (const_int_vcde3_operand): New.
> * config/arm/unspecs.md (UNSPEC_VCDE, UNSPEC_VCDEA): New.
> * config/arm/vfp.md (arm_vcx1<mode>): New entry.
> (arm_vcx1a<mode>, arm_vcx2<mode>, arm_vcx2a<mode>): Likewise.
> (arm_vcx3<mode>, arm_vcx3a<mode>): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-04-07  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * gcc.target/arm/acle/cde_v_1.c: New test.
> * gcc.target/arm/acle/cde_v_1_err.c: New test.
> * gcc.target/arm/acle/cde_v_1_mve.c: New test.
> 
> > Hi all,
> >
> > The updated patch is attached.
> > It has been rebased onto the top of trunk. Is it ready for commit please?
> >
> > Cheers
> > Dennis
> >
> > > Hi all,
> > >
> > > This patch is part of a series that adds support for the Armv8-M Custom
> > > Datapath Extension (CDE).
> > > It enables the ACLE intrinsics that generate the VCX1<A>, VCX2<A> and
> > > VCX3<A> instructions, which operate on FPU/MVE 32-bit and 64-bit registers.
> > >
> > > This patch depends on the CDE feature patch:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541921.html
> > > It also depends on the MVE framework patch:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html
> > > The ISA has been announced at
> > > https://developer.arm.com/architectures/instruction-sets/custom-instructions
> > >
> > > Regtested and bootstrapped for arm-none-linux-gnueabi-armv8-m.main.
> > >
> > > Is it OK for commit please?
> > >
> > > Cheers
> > > Dennis
> > >


* Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature
  2020-03-19 17:48       ` Kyrylo Tkachov
@ 2020-04-08 11:33         ` Dennis Zhang
  2020-04-08 12:34           ` Kyrylo Tkachov
  0 siblings, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2020-04-08 11:33 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 4308 bytes --]

Hi Kyrylo

> Hi Dennis,
>
> > -----Original Message-----
> > From: Dennis Zhang <Dennis.Zhang@arm.com>
> > Sent: 19 March 2020 14:03
> > To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> > Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
> > Subject: Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> > (CDE): enable the feature
> >
> > Hi Kyrylo,
> >
> > >________________________________________
> > >From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > >Sent: Wednesday, March 18, 2020 9:04 AM
> > >To: Dennis Zhang; gcc-patches@gcc.gnu.org
> > >Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> > >Subject: RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath
> > >Extension (CDE): enable the feature
> > >
> > >Hi Dennis,
> > >
> > >> -----Original Message-----
> > >> From: Dennis Zhang <Dennis.Zhang@arm.com>
> > >> Sent: 12 March 2020 12:06
> > >> To: gcc-patches@gcc.gnu.org
> > >> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> > >> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo
> > >> Tkachov <Kyrylo.Tkachov@arm.com>
> > >> Subject: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> > >> (CDE): enable the feature
> > >>
> > >> Hi all,
> > >>
> > >> This patch is part of a series that adds support for the Armv8-M
> > >> Custom Datapath Extension.
> > >> This patch defines the command-line options +cdecp0 to +cdecp7, which
> > >> enable the CDE on the corresponding coprocessors 0-7.
> > >> It also adds new effective-target checks for the CDE feature.
> > >>
> > >> The ISA has been announced at
> > >> https://developer.arm.com/architectures/instruction-sets/custom-instructions
> > >>
> > >> Regtested and bootstrapped.
> > >>
> > >> Is it OK to commit please?
> > >
> > >Can you please rebase this patch on top of the recent MVE commits?
> > >It currently doesn't apply cleanly to trunk.
> > >Thanks,
> > >Kyrill
> >
> > The rebased patch is attached.
> > Is it OK to commit?
>
> Ok, with a few fixes...
>
> diff --git a/gcc/testsuite/gcc.target/arm/pragma_cde.c b/gcc/testsuite/gcc.target/arm/pragma_cde.c
> new file mode 100644
> index 00000000000..97643a08405
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pragma_cde.c
> @@ -0,0 +1,98 @@
> +/* Test for CDE #prama target macros.  */
> +/* { dg-do compile } */
>
> Typo in "pragma" in the comment.
>
>
> +# A series of routines are created to 1) check if a given architecture is
> +# effective (check_effective_target_*_ok) and then 2) give the corresponding
> +# flags that enable the architecture (add_options_for_*).
> +# The series includes:
> +#   arm_v8m_main_cde: Armv8-m CDE (Custom Datapath Extension).
> +#   arm_v8m_main_cde_fp: Armv8-m CDE with FP registers.
> +#   arm_v8_1m_main_cde_mve: Armv8.1-m CDE with MVE.
> +# Usage:
> +#   /* { dg-require-effective-target arm_v8m_main_cde_ok } */
> +#   /* { dg-add-options arm_v8m_main_cde } */
> +# The tests are valid for Arm.
> +
> +foreach { armfunc armflag armdef } {
>
>   New effective target checks need to be documented in doc/invoke.texi
>

Thanks a lot for the review.
The documentation and the ChangeLog have been updated.
Is it ready to commit please?

Cheers
Dennis

gcc/ChangeLog:

2020-04-08  Dennis Zhang  <dennis.zhang@arm.com>

	* config.gcc: Add arm_cde.h.
	* config/arm/arm-c.c (arm_cpu_builtins): Define or undefine
	__ARM_FEATURE_CDE and __ARM_FEATURE_CDE_COPROC.
	* config/arm/arm-cpus.in (cdecp0, cdecp1, ..., cdecp7): New options.
	* config/arm/arm.c (arm_option_reconfigure_globals): Configure
	arm_arch_cde and arm_arch_cde_coproc to store the feature bits.
	* config/arm/arm.h (TARGET_CDE): New macro.
	* config/arm/arm_cde.h: New file.
	* doc/invoke.texi: Document CDE options +cdecp[0-7].
	* doc/sourcebuild.texi (arm_v8m_main_cde_ok): Document new target
	supports option.
	(arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.

gcc/testsuite/ChangeLog:

2020-04-08  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/pragma_cde.c: New test.
	* lib/target-supports.exp (arm_v8m_main_cde_ok): New target support
	option.
	(arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.

[-- Attachment #2: arm-m-cde-cli-20200408.patch --]
[-- Type: application/octet-stream, Size: 13922 bytes --]

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 13e3cb753e2..7624c654c51 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -346,7 +346,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h arm_mve.h arm_cde.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 73bdb9cfae0..7e92e8a83ae 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -237,6 +237,12 @@ arm_cpu_builtins (struct cpp_reader* pfile)
       builtin_define_with_int_value ("__ARM_FEATURE_COPROC", coproc_level);
     }
 
+  def_or_undef_macro (pfile, "__ARM_FEATURE_CDE", TARGET_CDE);
+  cpp_undef (pfile, "__ARM_FEATURE_CDE_COPROC");
+  if (TARGET_CDE)
+    builtin_define_with_int_value ("__ARM_FEATURE_CDE_COPROC",
+				   arm_arch_cde_coproc);
+
   def_or_undef_macro (pfile, "__ARM_FEATURE_MATMUL_INT8", TARGET_I8MM);
   def_or_undef_macro (pfile, "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC",
 		      TARGET_BF16_FP);
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 77b43090d69..fba34e556fb 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -211,6 +211,16 @@ define feature i8mm
 # Brain half-precision floating-point extension. Optional from v8.2-A.
 define feature bf16
 
+# Arm Custom Datapath Extension (CDE).
+define feature cdecp0
+define feature cdecp1
+define feature cdecp2
+define feature cdecp3
+define feature cdecp4
+define feature cdecp5
+define feature cdecp6
+define feature cdecp7
+
 # Feature groups.  Conventionally all (or mostly) upper case.
 # ALL_FPU lists all the feature bits associated with the floating-point
 # unit; these will all be removed if the floating-point unit is disabled
@@ -676,6 +686,14 @@ begin arch armv8-m.main
  option fp.dp add FPv5 FP_DBL
  option nofp remove ALL_FP
  option nodsp remove armv7em
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
 end arch armv8-m.main
 
 begin arch armv8-r
@@ -707,6 +725,14 @@ begin arch armv8.1-m.main
  option nofp remove ALL_FP
  option mve add MVE
  option mve.fp add MVE_FP
+ option cdecp0 add cdecp0
+ option cdecp1 add cdecp1
+ option cdecp2 add cdecp2
+ option cdecp3 add cdecp3
+ option cdecp4 add cdecp4
+ option cdecp5 add cdecp5
+ option cdecp6 add cdecp6
+ option cdecp7 add cdecp7
 end arch armv8.1-m.main
 
 begin arch iwmmxt
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d5207e0d8f0..1d3974bf832 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1021,6 +1021,13 @@ int arm_arch_i8mm = 0;
 /* Nonzero if chip supports the BFloat16 instructions.  */
 int arm_arch_bf16 = 0;
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+int arm_arch_cde = 0;
+int arm_arch_cde_coproc = 0;
+const int arm_arch_cde_coproc_bits[] = {
+  0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80
+};
+
 /* The condition codes of the ARM, and the inverse function.  */
 static const char * const arm_condition_codes[] =
 {
@@ -3740,6 +3747,21 @@ arm_option_reconfigure_globals (void)
       arm_fp16_format = ARM_FP16_FORMAT_IEEE;
     }
 
+  arm_arch_cde = 0;
+  arm_arch_cde_coproc = 0;
+  int cde_bits[] = {isa_bit_cdecp0, isa_bit_cdecp1, isa_bit_cdecp2,
+		    isa_bit_cdecp3, isa_bit_cdecp4, isa_bit_cdecp5,
+		    isa_bit_cdecp6, isa_bit_cdecp7};
+  for (int i = 0, e = ARRAY_SIZE (cde_bits); i < e; i++)
+    {
+      int cde_bit = bitmap_bit_p (arm_active_target.isa, cde_bits[i]);
+      if (cde_bit)
+	{
+	  arm_arch_cde |= cde_bit;
+	  arm_arch_cde_coproc |= arm_arch_cde_coproc_bits[i];
+	}
+    }
+
   /* And finally, set up some quirks.  */
   arm_arch_no_volatile_ce
     = bitmap_bit_p (arm_active_target.isa, isa_bit_quirk_no_volatile_ce);
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index fb55f73c62b..343235d0cbc 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -354,6 +354,9 @@ emission of floating point pcs attributes.  */
 /* Nonzero if disallow volatile memory access in IT block.  */
 #define TARGET_NO_VOLATILE_CE		(arm_arch_no_volatile_ce)
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+#define TARGET_CDE	(arm_arch_cde && arm_arch8 && !arm_arch_notm)
+
 /* Should constant I be slplit for OP.  */
 #define DONT_EARLY_SPLIT_CONSTANT(i, op) \
 				((optimize >= 2) \
@@ -568,6 +571,11 @@ extern int arm_arch_i8mm;
 /* Nonzero if chip supports the BFloat16 instructions.  */
 extern int arm_arch_bf16;
 
+/* Nonzero if chip supports the Custom Datapath Extension.  */
+extern int arm_arch_cde;
+extern int arm_arch_cde_coproc;
+extern const int arm_arch_cde_coproc_bits[];
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
diff --git a/gcc/config/arm/arm_cde.h b/gcc/config/arm/arm_cde.h
new file mode 100644
index 00000000000..f975754632f
--- /dev/null
+++ b/gcc/config/arm/arm_cde.h
@@ -0,0 +1,40 @@
+/* Arm Custom Datapath Extension (CDE) intrinsics include file.
+
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   Contributed by Arm Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_ARM_CDE_H
+#define _GCC_ARM_CDE_H 1
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e9e1683e9a8..7efce1f0527 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -18678,6 +18678,10 @@ The single- and double-precision floating-point instructions.
 
 @item +nofp
 Disable the floating-point extension.
+
+@item +cdecp0, +cdecp1, ... , +cdecp7
+Enable the Custom Datapath Extension (CDE) on selected coprocessors according
+to the numbers given in the options in the range 0 to 7.
 @end table
 
 @item  armv8-m.main
@@ -18696,6 +18700,10 @@ The single- and double-precision floating-point instructions.
 
 @item +nofp
 Disable the floating-point extension.
+
+@item +cdecp0, +cdecp1, ... , +cdecp7
+Enable the Custom Datapath Extension (CDE) on selected coprocessors according
+to the numbers given in the options in the range 0 to 7.
 @end table
 
 @item armv8-r
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 91b46cc654b..26a57e3199b 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1904,6 +1904,21 @@ ARM target supports options to generate instructions from ARMv8.1-M with
 the M-Profile Vector Extension (MVE). Some multilibs may be incompatible
 with these options.
 
+@item arm_v8m_main_cde
+ARM target supports options to generate instructions from ARMv8-M with
+the Custom Datapath Extension (CDE). Some multilibs may be incompatible
+with these options.
+
+@item arm_v8m_main_cde_fp
+ARM target supports options to generate instructions from ARMv8-M with
+the Custom Datapath Extension (CDE) and floating-point (VFP).
+Some multilibs may be incompatible with these options.
+
+@item arm_v8_1m_main_cde_mve
+ARM target supports options to generate instructions from ARMv8.1-M with
+the Custom Datapath Extension (CDE) and M-Profile Vector Extension (MVE).
+Some multilibs may be incompatible with these options.
+
 @item arm_prefer_ldrd_strd
 ARM target prefers @code{LDRD} and @code{STRD} instructions over
 @code{LDM} and @code{STM} instructions.
diff --git a/gcc/testsuite/gcc.target/arm/pragma_cde.c b/gcc/testsuite/gcc.target/arm/pragma_cde.c
new file mode 100644
index 00000000000..b66e22d08cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pragma_cde.c
@@ -0,0 +1,98 @@
+/* Test for CDE #pragma target macros.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8m_main_cde_ok } */
+/* { dg-add-options arm_v8m_main_cde } */
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main")
+#ifdef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is defined but should not be"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x1
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp1")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x2
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp2")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x4
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp3")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x8
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp4")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x10
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp5")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x20
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp6")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x40
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp7")
+#ifndef __ARM_FEATURE_CDE
+#error "__ARM_FEATURE_CDE is not defined but should be"
+#endif
+#if __ARM_FEATURE_CDE_COPROC != 0x80
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8-m.main+cdecp0+cdecp1")
+#if __ARM_FEATURE_CDE_COPROC != 0x3
+#error "__ARM_FEATURE_CDE_COPROC is not defined as configured"
+#endif
+#pragma GCC pop_options
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 3654e7bc232..623dd4fa44a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5103,6 +5103,65 @@ proc add_options_for_arm_v8_2a_bf16_neon { flags } {
     return "$flags $et_arm_v8_2a_bf16_neon_flags"
 }
 
+# A series of routines are created to 1) check if a given architecture is
+# effective (check_effective_target_*_ok) and then 2) give the corresponding
+# flags that enable the architecture (add_options_for_*).
+# The series includes:
+#   arm_v8m_main_cde: Armv8-m CDE (Custom Datapath Extension).
+#   arm_v8m_main_cde_fp: Armv8-m CDE with FP registers.
+#   arm_v8_1m_main_cde_mve: Armv8.1-m CDE with MVE.
+# Usage:
+#   /* { dg-require-effective-target arm_v8m_main_cde_ok } */
+#   /* { dg-add-options arm_v8m_main_cde } */
+# The tests are valid for Arm.
+
+foreach { armfunc armflag armdef } {
+	arm_v8m_main_cde
+		"-march=armv8-m.main+cdecp0 -mthumb"
+		"defined (__ARM_FEATURE_CDE)"
+	arm_v8m_main_cde_fp
+		"-march=armv8-m.main+fp+cdecp0 -mthumb"
+		"defined (__ARM_FEATURE_CDE) && defined (__ARM_FP)"
+	arm_v8_1m_main_cde_mve
+		"-march=armv8.1-m.main+mve+cdecp0 -mthumb"
+		"defined (__ARM_FEATURE_CDE) && defined (__ARM_FEATURE_MVE)"
+	} {
+    eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
+	proc check_effective_target_FUNC_ok_nocache { } {
+	    global et_FUNC_flags
+	    set et_FUNC_flags ""
+
+	    if { ![istarget arm*-*-*] } {
+		return 0;
+	    }
+
+	    if { [check_no_compiler_messages_nocache FUNC_ok assembly {
+		#if !(DEF)
+		#error "DEF failed"
+		#endif
+	    } "FLAG"] } {
+		    set et_FUNC_flags "FLAG"
+		    return 1
+	    }
+
+	    return 0;
+	}
+
+	proc check_effective_target_FUNC_ok { } {
+	    return [check_cached_effective_target FUNC_ok \
+		    check_effective_target_FUNC_ok_nocache]
+	}
+
+	proc add_options_for_FUNC { flags } {
+	    if { ! [check_effective_target_FUNC_ok] } {
+		return "$flags"
+	    }
+	    global et_FUNC_flags
+	    return "$flags $et_FUNC_flags"
+	}
+    }]
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 

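The pragma_cde.c test in this patch checks that __ARM_FEATURE_CDE_COPROC is a bitmask in which bit N is set when +cdecpN is enabled: 0x1 for cdecp0 through 0x80 for cdecp7, and 0x3 for cdecp0+cdecp1. A minimal C sketch of that encoding follows. The helper names are hypothetical and are not part of arm_cde.h or ACLE; only the two feature macros come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the bitmask __ARM_FEATURE_CDE_COPROC is expected to
   hold when the +cdecpN extensions listed in COPROCS are enabled.  Bit N
   stands for coprocessor N, e.g. {0, 1} -> 0x3.  */
unsigned
cde_coproc_mask (const int *coprocs, int count)
{
  unsigned mask = 0;
  for (int i = 0; i < count; i++)
    {
      assert (coprocs[i] >= 0 && coprocs[i] <= 7);
      mask |= 1u << coprocs[i];
    }
  return mask;
}

/* Hypothetical helper: true if coprocessor N is enabled for CDE in MASK.  */
bool
cde_coproc_enabled (unsigned mask, int n)
{
  assert (n >= 0 && n <= 7);
  return (mask >> n) & 1u;
}

/* The mask this translation unit was compiled with, or 0 when no +cdecpN
   extension is enabled (for example when building for a non-Arm host).  */
unsigned
cde_configured_mask (void)
{
#if defined (__ARM_FEATURE_CDE) && defined (__ARM_FEATURE_CDE_COPROC)
  return __ARM_FEATURE_CDE_COPROC;
#else
  return 0;
#endif
}
```

Because the macro is a plain integer constant, code that only needs a compile-time decision can equally test it directly in #if, as pragma_cde.c does.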

* RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature
  2020-04-08 11:33         ` Dennis Zhang
@ 2020-04-08 12:34           ` Kyrylo Tkachov
  2020-04-08 15:19             ` Dennis Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Kyrylo Tkachov @ 2020-04-08 12:34 UTC (permalink / raw)
  To: Dennis Zhang, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan



> -----Original Message-----
> From: Dennis Zhang <Dennis.Zhang@arm.com>
> Sent: 08 April 2020 12:34
> To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
> Subject: Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> (CDE): enable the feature
> 
> Hi Kyrylo
> 
> > Hi Dennis,
> >
> > > -----Original Message-----
> > > From: Dennis Zhang <Dennis.Zhang@arm.com>
> > > Sent: 19 March 2020 14:03
> > > To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; gcc-patches@gcc.gnu.org
> > > Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> > > Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
> > > Subject: Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> > > (CDE): enable the feature
> > >
> > > Hi Kyrylo,
> > >
> > > >________________________________________
> > > >From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > > >Sent: Wednesday, March 18, 2020 9:04 AM
> > > >To: Dennis Zhang; gcc-patches@gcc.gnu.org
> > > >Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> > > >Subject: RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath
> > > >Extension (CDE): enable the feature
> > > >
> > > >Hi Dennis,
> > > >
> > > >> -----Original Message-----
> > > >> From: Dennis Zhang <Dennis.Zhang@arm.com>
> > > >> Sent: 12 March 2020 12:06
> > > >> To: gcc-patches@gcc.gnu.org
> > > >> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> > > >> Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo Tkachov
> > > >> <Kyrylo.Tkachov@arm.com>
> > > >> Subject: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension
> > > >> (CDE): enable the feature
> > > >>
> > > >> Hi all,
> > > >>
> > > >> This patch is part of a series that adds support for the Armv8-M
> > > >> Custom Datapath Extension (CDE).
> > > >> This patch defines the command-line options cdecp0-cdecp7, which
> > > >> enable CDE on the corresponding coprocessors 0-7.
> > > >> It also adds new effective-target checks for the CDE feature.
> > > >>
> > > >> The ISA has been announced at
> > > >> https://developer.arm.com/architectures/instruction-sets/custom-instructions
> > > >>
> > > >> Regtested and bootstrapped.
> > > >>
> > > >> Is it OK to commit please?
> > > >
> > > >Can you please rebase this patch on top of the recent MVE commits?
> > > >It currently doesn't apply cleanly to trunk.
> > > >Thanks,
> > > >Kyrill
> > >
> > > The rebased patch is attached.
> > > Is it OK to commit?
> >
> > Ok, with a few fixes...
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/pragma_cde.c b/gcc/testsuite/gcc.target/arm/pragma_cde.c
> > new file mode 100644
> > index 00000000000..97643a08405
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/pragma_cde.c
> > @@ -0,0 +1,98 @@
> > +/* Test for CDE #prama target macros.  */
> > +/* { dg-do compile } */
> >
> > Typo in "pragma" in the comment.
> >
> >
> > +# A series of routines are created to 1) check if a given architecture is
> > +# effective (check_effective_target_*_ok) and then 2) give the corresponding
> > +# flags that enable the architecture (add_options_for_*).
> > +# The series includes:
> > +#   arm_v8m_main_cde: Armv8-m CDE (Custom Datapath Extension).
> > +#   arm_v8m_main_cde_fp: Armv8-m CDE with FP registers.
> > +#   arm_v8_1m_main_cde_mve: Armv8.1-m CDE with MVE.
> > +# Usage:
> > +#   /* { dg-require-effective-target arm_v8m_main_cde_ok } */
> > +#   /* { dg-add-options arm_v8m_main_cde } */
> > +# The tests are valid for Arm.
> > +
> > +foreach { armfunc armflag armdef } {
> >
> >   New effective target checks need to be documented in doc/invoke.texi
> >
> 
> Thanks a lot for the review.
> The document has been updated and the changelog, too.
> Is it ready to commit please?

Ok.
Thanks,
Kyrill

> 
> Cheers
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-04-08  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * config.gcc: Add arm_cde.h.
> * config/arm/arm-c.c (arm_cpu_builtins): Define or undefine
> __ARM_FEATURE_CDE and __ARM_FEATURE_CDE_COPROC.
> * config/arm/arm-cpus.in (cdecp0, cdecp1, ..., cdecp7): New options.
> * config/arm/arm.c (arm_option_reconfigure_globals): Configure
> arm_arch_cde and arm_arch_cde_coproc to store the feature bits.
> * config/arm/arm.h (TARGET_CDE): New macro.
> * config/arm/arm_cde.h: New file.
> * doc/invoke.texi: Document CDE options +cdecp[0-7].
> * doc/sourcebuild.texi (arm_v8m_main_cde_ok): Document new target
> supports option.
> (arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-04-08  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * gcc.target/arm/pragma_cde.c: New test.
> * lib/target-supports.exp (arm_v8m_main_cde_ok): New target support
> option.
> (arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.


* Re: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature
  2020-04-08 12:34           ` Kyrylo Tkachov
@ 2020-04-08 15:19             ` Dennis Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-04-08 15:19 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Kyrylo,

> ________________________________________
> From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Sent: Wednesday, April 8, 2020 1:34 PM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature
>
>
> Ok.
> Thanks,
> Kyrill

This patch has been committed as 975e6670c428b032aa6ec600f57082d3cfb57393.

Many thanks!
Dennis



* Re: [PATCH][Arm][2/4]  Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers
  2020-04-07 14:07         ` Kyrylo Tkachov
@ 2020-04-08 15:25           ` Dennis Zhang
  2020-08-17 18:41           ` [PATCH][Arm] Auto-vectorization for MVE: vsub Dennis Zhang
  2020-09-16 16:00           ` [PATCH][Arm] Enable MVE SIMD modes for vectorization Dennis Zhang
  2 siblings, 0 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-04-08 15:25 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Kyrylo,

> ________________________________________
> From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Sent: Tuesday, April 7, 2020 3:07 PM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm][2/4]  Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers
>
> Hi Dennis,
>
> > -----Original Message-----
> > From: Dennis Zhang <Dennis.Zhang@arm.com>
> > Sent: 07 April 2020 13:31
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> > Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo
> > Tkachov <Kyrylo.Tkachov@arm.com>
> > Subject: Re: [PATCH][Arm][2/4] Custom Datapath Extension intrinsics:
> > instructions using FPU/MVE S/D registers
> >
> > Hi all,
> >
> > This patch is updated to support DImode for VFP targets, as required by CDE.
> > The ChangeLog is updated as follows.
> >
> > Is this ready for commit please?
>
> This is ok.
> Has the first patch been updated and committed yet?
> Thanks,
> Kyrill
>

This patch has been committed as 07b9bfd02b88cad2f6b3f50ad610dd75cb989ed3.

Many thanks
Dennis

> >
> > Cheers
> > Dennis
> >
> > gcc/ChangeLog:
> >
> > 2020-04-07  Dennis Zhang  <dennis.zhang@arm.com>
> >     Matthew Malcomson <matthew.malcomson@arm.com>
> >
> > * config/arm/arm-builtins.c (CX_IMM_QUALIFIERS): New macro.
> > (CX_UNARY_QUALIFIERS, CX_BINARY_QUALIFIERS): Likewise.
> > (CX_TERNARY_QUALIFIERS): Likewise.
> > (ARM_BUILTIN_CDE_PATTERN_START): Likewise.
> > (ARM_BUILTIN_CDE_PATTERN_END): Likewise.
> > (arm_init_acle_builtins): Initialize CDE builtins.
> > (arm_expand_acle_builtin): Check CDE constant operands.
> > * config/arm/arm.h (ARM_CDE_CONST_COPROC): New macro to set the range
> > of CDE constant operand.
> > * config/arm/arm.c (arm_hard_regno_mode_ok): Support DImode for
> > TARGET_VFP_BASE.
> > (ARM_VCDE_CONST_1, ARM_VCDE_CONST_2, ARM_VCDE_CONST_3): Likewise.
> > * config/arm/arm_cde.h (__arm_vcx1_u32): New macro of ACLE interface.
> > (__arm_vcx1a_u32, __arm_vcx2_u32, __arm_vcx2a_u32): Likewise.
> > (__arm_vcx3_u32, __arm_vcx3a_u32, __arm_vcx1d_u64): Likewise.
> > (__arm_vcx1da_u64, __arm_vcx2d_u64, __arm_vcx2da_u64): Likewise.
> > (__arm_vcx3d_u64, __arm_vcx3da_u64): Likewise.
> > * config/arm/arm_cde_builtins.def: New file.
> > * config/arm/iterators.md (V_reg): New attribute of SI.
> > * config/arm/predicates.md (const_int_coproc_operand): New.
> > (const_int_vcde1_operand, const_int_vcde2_operand): New.
> > (const_int_vcde3_operand): New.
> > * config/arm/unspecs.md (UNSPEC_VCDE, UNSPEC_VCDEA): New.
> > * config/arm/vfp.md (arm_vcx1<mode>): New entry.
> > (arm_vcx1a<mode>, arm_vcx2<mode>, arm_vcx2a<mode>): Likewise.
> > (arm_vcx3<mode>, arm_vcx3a<mode>): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-04-07  Dennis Zhang  <dennis.zhang@arm.com>
> >
> > * gcc.target/arm/acle/cde_v_1.c: New test.
> > * gcc.target/arm/acle/cde_v_1_err.c: New test.
> > * gcc.target/arm/acle/cde_v_1_mve.c: New test.
> >
> > > Hi all,
> > >
> > > This patch is updated as attached.
> > > It's rebased to the top. Is it ready for commit please?
> > >
> > > Cheers
> > > Dennis
> > >
> > > > Hi all,
> > > >
> > > > This patch is part of a series that adds support for the Armv8-M Custom Datapath Extension (CDE).
> > > > It enables the ACLE intrinsics for the VCX1<A>, VCX2<A>, and VCX3<A> instructions, which work with FPU/MVE 32-bit/64-bit registers.
> > > >
> > > > This patch depends on the CDE feature patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-March/541921.html
> > > > It also depends on the MVE framework patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540415.html
> > > > The ISA has been announced at https://developer.arm.com/architectures/instruction-sets/custom-instructions
> > > >
> > > > Regtested and bootstrapped for arm-none-linux-gnueabi-armv8-m.main.
> > > >
> > > > Is it OK for commit please?
> > > >
> > > > Cheers
> > > > Dennis


* [PATCH][Arm] Auto-vectorization for MVE: vsub
  2020-04-07 14:07         ` Kyrylo Tkachov
  2020-04-08 15:25           ` Dennis Zhang
@ 2020-08-17 18:41           ` Dennis Zhang
  2020-08-21 22:33             ` Ramana Radhakrishnan
                               ` (3 more replies)
  2020-09-16 16:00           ` [PATCH][Arm] Enable MVE SIMD modes for vectorization Dennis Zhang
  2 siblings, 4 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-08-17 18:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kyrylo Tkachov, nd, Richard Earnshaw, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 2909 bytes --]


Hi all,

This patch enables MVE vsub instructions for auto-vectorization.
It adds RTL templates for the MVE vsub instructions that use a 'minus'
expression instead of an unspec, so that the vectorizer can recognize them.
The MVE target is added to the sub<mode>3 optab, which is modified to use
a mode iterator (VSEL) that selects the modes available on each target.
MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
support vectorization.
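A loop of the kind this patch makes vectorizable can be sketched in plain C. The function names are illustrative and are not taken from the new vect_sub_*.c tests. Assuming a compiler with this patch and options along the lines of -O3 -march=armv8.1-m.main+mve.fp -mfloat-abi=hard, each loop is a candidate for vsub.i32 or vsub.f32 on 128-bit Q registers (V4SImode and V4SFmode from arm_preferred_simd_mode); whether it is actually vectorized depends on the cost model, so check the assembly produced with -S.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise 32-bit integer subtraction.  With MVE, SImode maps to
   V4SImode, so the vectorizer can use sub<mode>3 (vsub.i32) and process
   four elements per vector instruction.  */
void
vsub_s32 (int32_t *restrict dst, const int32_t *restrict a,
          const int32_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] - b[i];
}

/* Element-wise single-precision subtraction: a candidate for vsub.f32 on
   V4SFmode, which additionally requires the MVE floating-point extension.  */
void
vsub_f32 (float *restrict dst, const float *restrict a,
          const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] - b[i];
}
```

The restrict qualifiers tell the compiler the arrays do not overlap, which spares the vectorizer a runtime alias check.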

This patch also fixes the 'vreinterpretq_*.c' MVE intrinsic tests. The
tests produce unexpected instruction counts because of IPA identical code
folding (ICF). The problem is exposed by the MVE vector modes enabled in
this patch, so the tests are corrected here to avoid failures.
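A minimal sketch of why the expected counts double, assuming (as the change from 8 to 16 suggests) that each test contains two functions whose bodies ICF can merge. This is illustrative C, not the actual test source, and the names are hypothetical.

```c
#include <stdint.h>

/* Illustrative globals; the real tests use MVE vector types instead.  */
int32_t g_lhs = 3;
int32_t g_rhs = 4;

/* Two functions with semantically identical bodies.  With IPA ICF
   (enabled by default at -O2) GCC may keep a single body and redirect the
   other function to it, so the add instruction appears once in the
   assembly instead of twice.  Compiling with -fno-ipa-icf keeps both.  */
int32_t
add_first (void)
{
  return g_lhs + g_rhs;
}

int32_t
add_second (void)
{
  return g_lhs + g_rhs;
}
```

A scan-assembler-times directive counts textual occurrences, so it is sensitive to whether both bodies are emitted; disabling ICF makes the count independent of that optimization.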

MVE instructions are documented here: 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics

The patch is regtested for arm-none-eabi and bootstrapped for 
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>

	* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
	* config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
	(TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
	(TARGET_NEON_MVE_HFP): Likewise.
	* config/arm/iterators.md (VSEL): New mode iterator to select modes
	for corresponding targets.
	* config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
	using expression 'minus'.
	(mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
	* config/arm/neon.md (sub<mode>3): Remove.  Integrated into the
	sub<mode>3 expander in vec-common.md.
	* config/arm/vec-common.md (sub<mode>3): Enable MVE target.  Use VSEL
	to select available modes.  Exclude TARGET_NEON_FP16INST from the
	TARGET_NEON condition.  Integrate the TARGET_NEON_FP16INST case that
	was previously in neon.md.

gcc/testsuite/ChangeLog:

2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
	option -fno-ipa-icf and change the instruction count from 8 to 16.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
	* gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
	* gcc.target/arm/mve/vect/vect_sub_0.c: New test.
	* gcc.target/arm/mve/vect/vect_sub_1.c: New test.

[-- Attachment #2: mve-vect-sub-20200810.patch --]
[-- Type: text/x-patch; name="mve-vect-sub-20200810.patch", Size: 16965 bytes --]

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 30e1d6dc994..eb8c9599357 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -334,6 +334,14 @@ emission of floating point pcs attributes.  */
 						isa_bit_mve_float) \
 			       && !TARGET_GENERAL_REGS_ONLY)
 
+#define TARGET_NEON_IWMMXT	(TARGET_NEON || TARGET_REALLY_IWMMXT)
+#define TARGET_NEON_IWMMXT_MVE	(TARGET_NEON || TARGET_REALLY_IWMMXT \
+				 || TARGET_HAVE_MVE)
+#define TARGET_NEON_IWMMXT_MVE_FP ((TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT) \
+				   || TARGET_NEON || TARGET_REALLY_IWMMXT)
+#define TARGET_NEON_MVE_HFP	((TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT) \
+				 || TARGET_NEON_FP16INST)
+
 /* MVE have few common instructions as VFP, like VLDM alias VPOP, VLDR, VSTM
    alia VPUSH, VSTR and VMOV, VMSR and VMRS.  In the same manner it updates few
    registers such as FPCAR, FPCCR, FPDSCR, FPSCR, MVFR0, MVFR1 and MVFR2.  All
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6b7ca829f1c..dcbcbbeced0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28913,6 +28913,30 @@ arm_preferred_simd_mode (scalar_mode mode)
       default:;
       }
 
+  if (TARGET_HAVE_MVE)
+    switch (mode)
+      {
+      case QImode:
+	return V16QImode;
+      case HImode:
+	return V8HImode;
+      case SImode:
+	return V4SImode;
+
+      default:;
+      }
+
+  if (TARGET_HAVE_MVE_FLOAT)
+    switch (mode)
+      {
+      case HFmode:
+	return V8HFmode;
+      case SFmode:
+	return V4SFmode;
+
+      default:;
+      }
+
   return word_mode;
 }
 
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0bc9eba0722..52c3a8a4355 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -80,6 +80,19 @@
 ;; Integer and float modes supported by Neon and IWMMXT but not MVE.
 (define_mode_iterator VNINOTM1 [V2SI V4HI V8QI V2SF])
 
+;; Select modes for NEON, IWMMXT and MVE.
+(define_mode_iterator VSEL [(V16QI "TARGET_NEON_IWMMXT_MVE")
+			    (V8HI  "TARGET_NEON_IWMMXT_MVE")
+			    (V4SI  "TARGET_NEON_IWMMXT_MVE")
+			    (V4SF  "TARGET_NEON_IWMMXT_MVE_FP")
+			    (V8HF  "TARGET_NEON_MVE_HFP")
+			    (V4HF  "TARGET_NEON_FP16INST")
+			    (V2SI  "TARGET_NEON_IWMMXT")
+			    (V4HI  "TARGET_NEON_IWMMXT")
+			    (V8QI  "TARGET_NEON_IWMMXT")
+			    (V2SF  "TARGET_NEON_IWMMXT")
+			    (V2DI  "TARGET_NEON_IWMMXT")])
+
 ;; Integer and float modes supported by Neon and IWMMXT, except V2DI.
 (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..7853b642262 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2574,6 +2574,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vsubq<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(minus:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		     (match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vsub.i%#<V_sz_elem>\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vabdq_f])
 ;;
@@ -3480,9 +3491,8 @@
 (define_insn "mve_vsubq_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VSUBQ_F))
+	(minus:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		     (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vsub.f%#<V_sz_elem>\t%q0, %q1, %q2"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 3e7b51d8ab6..ec933b5711e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -552,6 +552,10 @@
     (const_string "neon_add<q>")))]
 )
 
+;; These insns implement the patterns defined by the sub<mode>3 expander
+;; in vec-common.md.  For the Neon FP16 extension, the pattern is only
+;; valid when -funsafe-math-optimizations is enabled.
+
 (define_insn "*sub<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
@@ -564,17 +568,6 @@
                     (const_string "neon_sub<q>")))]
 )
 
-(define_insn "sub<mode>3"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (minus:VH
-    (match_operand:VH 1 "s_register_operand" "w")
-    (match_operand:VH 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_sub<q>")]
-)
-
 (define_insn "sub<mode>3_fp16"
  [(set
    (match_operand:VH 0 "s_register_operand" "=w")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index b7e3619caf4..98664b17585 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -120,15 +120,20 @@
 })
 
 ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
-;; patterns separately for IWMMXT and Neon.
+;; patterns separately for MVE, IWMMXT and Neon.
 
 (define_expand "sub<mode>3"
-  [(set (match_operand:VALL 0 "s_register_operand")
-        (minus:VALL (match_operand:VALL 1 "s_register_operand")
-                    (match_operand:VALL 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
+  [(set (match_operand:VSEL 0 "s_register_operand")
+	(minus:VSEL (match_operand:VSEL 1 "s_register_operand")
+		    (match_operand:VSEL 2 "s_register_operand")))]
+  "((TARGET_NEON && !TARGET_NEON_FP16INST)
+    && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+	|| flag_unsafe_math_optimizations))
+   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT
+       && VALID_MVE_SF_MODE (<MODE>mode))
+   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
 {
 })
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
index f59f69734ed..4b8e252d998 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_f16 (r7, vreinterpretq_f16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.f16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.f16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
index dac47c7e924..9c4cec7f648 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_f32 (r7, vreinterpretq_f32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.f32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.f32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
index edc2f2f3bc6..67c64ff5ca7 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_s16 (r7, vreinterpretq_s16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
index 880de06a781..e213383e799 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_s32 (r7, vreinterpretq_s32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
index b0e81542956..7f5b3991602 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -42,4 +42,4 @@ foo1 (mve_pred16_t __p)
   return vpselq_s64 (r7, vreinterpretq_s64 (value9), __p);
 }
 
-/* { dg-final { scan-assembler-times "vpsel" 8 } } */
+/* { dg-final { scan-assembler-times "vpsel" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
index a5ceebb10b9..59b2dc4a598 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_s8 (r7, vreinterpretq_s8 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i8" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i8" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
index cd31c23500a..5ae5b6941f9 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_u16 (r7, vreinterpretq_u16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
index faa66c9e1cc..85f10ad6a04 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_u32 (r7, vreinterpretq_u32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
index 853b28a2aac..2786771e85e 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -42,4 +42,4 @@ foo1 (mve_pred16_t __p)
   return vpselq_u64 (r7, vreinterpretq_u64 (value9), __p);
 }
 
-/* { dg-final { scan-assembler-times "vpsel" 8 } } */
+/* { dg-final { scan-assembler-times "vpsel" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
index bdf8cd588e1..2e87fe7da6a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_u8 (r7, vreinterpretq_u8 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i8" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i8" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/mve.exp b/gcc/testsuite/gcc.target/arm/mve/mve.exp
index e84cb068940..4a651438eaa 100644
--- a/gcc/testsuite/gcc.target/arm/mve/mve.exp
+++ b/gcc/testsuite/gcc.target/arm/mve/mve.exp
@@ -43,6 +43,8 @@ dg-init
 # Main loop.
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/intrinsics/*.\[cCS\]]] \
 	"" $DEFAULT_CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vect/*.\[cCS\]]] \
+	"" $DEFAULT_CFLAGS
 
 # All done.
 set dg_runtest_extra_prunes ""
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_0.c b/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_0.c
new file mode 100644
index 00000000000..68af9f0c316
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_0.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vsub_i16 (int16_t * dest, int16_t * a, int16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vsub_i8 (int8_t * dest, int8_t * a, int8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_1.c b/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_1.c
new file mode 100644
index 00000000000..3106f4aa6f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3" } */
+
+void test_vsub_f32 (float * dest, float * a, float * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+

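The expected counts in vect_sub_0.c can be checked with a short host-side sketch. This is plain C, not part of the patch, and the helper names are hypothetical. Each test loop covers exactly 128 bits, which is one MVE Q register, so each function should need a single vsub. Signed and unsigned subtraction produce the same bit pattern under two's-complement wraparound, so both variants of a given width can use the same vsub.i<N>. That would explain the count of 2 per width. Only the 8-bit case is shown; the same argument applies to the 16- and 32-bit loops. The signed version relies on GCC's modular conversion to narrower signed types, which ISO C leaves implementation-defined.

```c
#include <stdint.h>
#include <string.h>

/* Width of one MVE Q register, in bits.  */
enum { Q_REG_BITS = 128 };

/* How many lanes of ELEM_BITS-wide elements fit in one Q register.  */
static int lanes_per_q (int elem_bits)
{
  return Q_REG_BITS / elem_bits;
}

/* Scalar reference copies of test_vsub_i8 and test_vsub_i8_u.  */
static void ref_sub_s8 (int8_t *dest, const int8_t *a, const int8_t *b)
{
  for (int i = 0; i < 16; i++)
    dest[i] = (int8_t) (a[i] - b[i]);   /* GCC converts modulo 2^8.  */
}

static void ref_sub_u8 (uint8_t *dest, const uint8_t *a, const uint8_t *b)
{
  for (int i = 0; i < 16; i++)
    dest[i] = (uint8_t) (a[i] - b[i]);
}

/* Feed the same bytes to both versions; return 1 if the result bytes
   are identical, i.e. one vsub.i8 could implement either loop.  */
static int signed_unsigned_sub_match (void)
{
  uint8_t ua[16], ub[16], ud[16];
  int8_t sa[16], sb[16], sd[16];

  for (int i = 0; i < 16; i++)
    {
      ua[i] = (uint8_t) (i * 37 + 200);
      ub[i] = (uint8_t) (255 - i * 11);
    }
  memcpy (sa, ua, sizeof sa);   /* Same bit patterns, signed view.  */
  memcpy (sb, ub, sizeof sb);

  ref_sub_s8 (sd, sa, sb);
  ref_sub_u8 (ud, ua, ub);
  return memcmp (sd, ud, sizeof sd) == 0;
}
```

The loop bounds in the test (4, 8 and 16 iterations) match `lanes_per_q (32)`, `lanes_per_q (16)` and `lanes_per_q (8)`, which is why a single Q-register operation is expected per function.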

* Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
  2020-08-17 18:41           ` [PATCH][Arm] Auto-vectorization for MVE: vsub Dennis Zhang
@ 2020-08-21 22:33             ` Ramana Radhakrishnan
  2020-09-07  7:20               ` Dennis Zhang
  2020-10-06 16:46             ` Dennis Zhang
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Ramana Radhakrishnan @ 2020-08-21 22:33 UTC (permalink / raw)
  To: Dennis Zhang; +Cc: gcc-patches, nd, Ramana Radhakrishnan, Richard Earnshaw

On Mon, Aug 17, 2020 at 7:42 PM Dennis Zhang <Dennis.Zhang@arm.com> wrote:
>
>
> Hi all,
>
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
>
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
>
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>

Hi Dennis,

Thanks for this patch. However, a quick read suggests at first glance
that it could do with some refactoring or indeed further breaking
down.

1. The refactor for TARGET_NEON_IWMMXT and friends: I don't get the
motivation for it on a quick read. I'll try and read that
again. Please document why these complex TARGET_ macros exist, how
they are expected to be used in the machine description and what they
are intended to do.
2. It seems odd that we would have
 "&& ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+    || flag_unsafe_math_optimizations))" apply to TARGET_NEON but not
apply this to TARGET_MVE_FLOAT in the sub<mode>3 expander. The point
is that if it isn't safe to vectorize a subtract for Neon, why is it
safe to do the same for MVE? This was done in 2010 by Julian to fix
PR target/43703 - isn't this applicable to MVE as well?
3. I'm also going to quibble a bit about the use of VSEL as the name
of an iterator as that conflates it with the instruction vsel and it's
not obvious what's going on here.


> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
>

I'm a bit confused as to why this got exposed by the new MVE
vector modes enabled by this patch.

> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
>
Bootstrapped and regression tested for arm-none-linux-gnueabihf with
--with-fpu=neon in the configuration?


> Is it OK for trunk please?



Ramana

>
> Thanks
> Dennis
>
> gcc/ChangeLog:
>
> 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
>
>         * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
>         * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
>         (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
>         (TARGET_NEON_MVE_HFP): Likewise.
>         * config/arm/iterators.md (VSEL): New mode iterator to select modes
>         for corresponding targets.
>         * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
>         using expression 'minus'.
>         (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
>         * config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
>         sub<mode>3 in vec-common.md
>         * config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
>         to select available modes. Exclude TARGET_NEON_FP16INST from
>         TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
>         originally in neon.md.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
>
>         * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>         option -fno-ipa-icf and change the instruction count from 8 to 16.
>         * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>         * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
>         * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
>         * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
>         * gcc.target/arm/mve/vect/vect_sub_1.c: New test.


* Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
  2020-08-21 22:33             ` Ramana Radhakrishnan
@ 2020-09-07  7:20               ` Dennis Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-09-07  7:20 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: gcc-patches, nd, Ramana Radhakrishnan, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 6519 bytes --]

Hi Ramana,

On 8/21/20 10:33 PM, Ramana Radhakrishnan wrote:
> On Mon, Aug 17, 2020 at 7:42 PM Dennis Zhang <Dennis.Zhang@arm.com> wrote:
>>
>>
>> Hi all,
>>
>> This patch enables MVE vsub instructions for auto-vectorization.
>> It adds RTL templates for MVE vsub instructions using 'minus' instead of
>> unspec expression to make the instructions recognizable for vectorization.
>> MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
>> modified to use a mode iterator that selects available modes for various
>> targets correspondingly.
>> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
>> support vectorization.
>>
>> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
>> generate wrong instruction numbers because of unexpected icf optimization.
>> This bug is exposed by the MVE vector modes enabled in this patch,
>> therefore it is corrected in this patch to avoid test failures.
>>
>> MVE instructions are documented here:
>> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>>
> 
> Hi Dennis,
> 
> Thanks for this patch. However, a quick read suggests at first glance
> that it could do with some refactoring or indeed further breaking
> down.
> 
> 1. The refactor for TARGET_NEON_IWMMXT and friends: I don't get the
> motivation for it on a quick read. I'll try and read that
> again. Please document why these complex TARGET_ macros exist, how
> they are expected to be used in the machine description and what they
> are intended to do.

Thanks for the questions.
The macros are used as conditions in the iterators, so that each mode
is enabled only for the targets that support it. They are defined only
to keep the iterator definitions short.
As for why the iterators use conditions at all: the aim is to handle
the modes of all these targets in a single expander. Otherwise the
expander would have to be repeated several times, once for each set of
modes supported by a different target.
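To illustrate the idea, here is a plain-C sketch (not GCC code) of how each mode in the VNIM_COND iterator is gated by one of the combined TARGET_ macros. The struct `target_flags` and the helper functions are hypothetical stand-ins for TARGET_NEON, TARGET_REALLY_IWMMXT, TARGET_HAVE_MVE, TARGET_HAVE_MVE_FLOAT and TARGET_NEON_FP16INST; the conditions themselves are transcribed from the arm.h and iterators.md hunks of the attached patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the arm.h feature macros.  */
struct target_flags
{
  bool neon;            /* TARGET_NEON */
  bool really_iwmmxt;   /* TARGET_REALLY_IWMMXT */
  bool have_mve;        /* TARGET_HAVE_MVE */
  bool have_mve_float;  /* TARGET_HAVE_MVE_FLOAT */
  bool neon_fp16inst;   /* TARGET_NEON_FP16INST */
};

/* The four combined macros, transcribed from the patch.  */
static bool neon_iwmmxt (const struct target_flags *t)
{
  return t->neon || t->really_iwmmxt;
}

static bool neon_iwmmxt_mve (const struct target_flags *t)
{
  return neon_iwmmxt (t) || t->have_mve;
}

static bool neon_iwmmxt_mve_fp (const struct target_flags *t)
{
  return (t->have_mve && t->have_mve_float) || neon_iwmmxt (t);
}

static bool neon_mve_hfp (const struct target_flags *t)
{
  return (t->have_mve && t->have_mve_float) || t->neon_fp16inst;
}

static bool is_mode (const char *mode, const char *name)
{
  return strcmp (mode, name) == 0;
}

/* Whether VNIM_COND would provide MODE for target T.  This models only
   the iterator; the sub<mode>3 expander adds its own condition.  */
static bool vnim_cond_enabled (const char *mode, const struct target_flags *t)
{
  if (is_mode (mode, "V16QI") || is_mode (mode, "V8HI")
      || is_mode (mode, "V4SI"))
    return neon_iwmmxt_mve (t);
  if (is_mode (mode, "V4SF"))
    return neon_iwmmxt_mve_fp (t);
  if (is_mode (mode, "V8HF"))
    return neon_mve_hfp (t);
  if (is_mode (mode, "V4HF"))
    return t->neon_fp16inst;
  if (is_mode (mode, "V2SI") || is_mode (mode, "V4HI")
      || is_mode (mode, "V8QI") || is_mode (mode, "V2SF")
      || is_mode (mode, "V2DI"))
    return neon_iwmmxt (t);
  return false;  /* Mode is not in the iterator.  */
}
```

For example, an integer-only MVE target gets V16QI, V8HI and V4SI but none of the 64-bit modes, while MVE with floating point additionally gets V4SF and V8HF. The single expander then only has to add the per-target restrictions (such as the flag_unsafe_math_optimizations check) that the iterator cannot express.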

> 2. It seems odd that we would have
>   "&& ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
> +    || flag_unsafe_math_optimizations))" apply to TARGET_NEON but not
> apply this to TARGET_MVE_FLOAT in the sub<mode>3 expander. The point
> is that if it isn't safe to vectorize a subtract for Neon, why is it
> safe to do the same for MVE? This was done in 2010 by Julian to fix
> PR target/43703 - isn't this applicable to MVE as well?

I agree with this after investigation. I've added
flag_unsafe_math_optimizations for the MVE_FLOAT target.
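As background for this condition: our understanding (an assumption, not stated in this thread) is that the Neon restriction exists because Neon single-precision vector arithmetic flushes subnormal results to zero instead of following IEEE 754. The plain-C sketch below computes a subtraction whose exact result is subnormal and compares it with a hypothetical `ftz_sub` helper that emulates a flush-to-zero unit. It does not show how MVE itself behaves.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper emulating a flush-to-zero unit: a subnormal
   result is replaced by a zero of the same sign.  */
static float ftz_sub (float a, float b)
{
  float r = a - b;
  if (fpclassify (r) == FP_SUBNORMAL)
    return r < 0.0f ? -0.0f : 0.0f;
  return r;
}

/* 2*FLT_MIN and 1.5*FLT_MIN are normal, but their exact difference,
   0.5*FLT_MIN, is subnormal.  volatile keeps the subtraction at run time.  */
static float ieee_sub_result (void)
{
  volatile float a = 2.0f * FLT_MIN;
  volatile float b = 1.5f * FLT_MIN;
  return a - b;
}

static float ftz_sub_result (void)
{
  volatile float a = 2.0f * FLT_MIN;
  volatile float b = 1.5f * FLT_MIN;
  return ftz_sub (a, b);
}
```

If a vector unit behaves like `ftz_sub`, vectorizing a scalar IEEE subtraction silently changes results, which is the kind of difference -funsafe-math-optimizations permits.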

> 3. I'm also going to quibble a bit about the use of VSEL as the name
> of an iterator as that conflates it with the instruction vsel and it's
> not obvious what's going on here.

I have changed the name to VNIM_COND, which stands for Neon, IWMMXT and
MVE modes enabled according to conditions.
I've added comments to document the purpose of the iterator.
Please let me know if you think it needs further fixes.

> 
> 
>> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
>> generate wrong instruction numbers because of unexpected icf optimization.
>> This bug is exposed by the MVE vector modes enabled in this patch,
>> therefore it is corrected in this patch to avoid test failures.
>>
> 
> I'm a bit confused as to why this got exposed by the new MVE
> vector modes enabled by this patch.

The tests are only meant to check that the reinterpret intrinsics work.
However, the two functions in each test have identical bodies, so IPA
ICF (identical code folding) folds the second function into the first.
This folding is not what the tests intend to check, but to make them
pass, the author only counted the instructions of the first function.
With my patch, which enables MVE vector modes in arm_preferred_simd_mode,
the estimated code size is smaller, so after ICF the inliner inlines the
first function's body back into the second one. The instruction count
then changes.
Because ICF is not what these tests are meant to exercise but makes
their instruction counts unstable, -fno-ipa-icf is used so that both
functions keep their own bodies, and the expected count is changed from
8 to 16 accordingly.
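A minimal plain-C reduction of the situation described above (hypothetical code, not the actual test files): two functions with identical bodies, mirroring the foo/foo1 pair in the vreinterpretq_*.c tests. -fipa-icf is enabled at -O2, so GCC may merge the two, typically leaving one real body and turning the other into a call or alias; -fno-ipa-icf keeps both bodies, so an instruction that appears once per body is counted twice.

```c
#include <stdint.h>

/* Global inputs, similar in spirit to the value1..value9 globals.  */
int32_t g_a = 7, g_b = 3;

/* Two functions with identical bodies: candidates for IPA ICF.  */
int32_t foo (void)
{
  return g_a - g_b;
}

int32_t foo1 (void)
{
  return g_a - g_b;
}
```

Compiling this with `gcc -O2 -S` and with `gcc -O2 -fno-ipa-icf -S` and comparing the two assembly files is one way to see the effect; the exact output depends on the GCC version and target, so no particular listing is claimed here.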

> 
>> The patch is regtested for arm-none-eabi and bootstrapped for
>> arm-none-linux-gnueabihf.
>>
> Bootstrapped and regression tested for arm-none-linux-gnueabihf with
> --with-fpu=neon in the configuration?

Yes, the arm-none-linux-gnueabihf bootstrap was configured with
--with-fpu=neon. Should I also test it without this configuration?

The new patch is attached.
I updated the comments for the iterator and the macros.

Many thanks!
Dennis

gcc/ChangeLog:

2020-08-27  Dennis Zhang  <dennis.zhang@arm.com>

	* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
	* config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
	(TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
	(TARGET_NEON_MVE_HFP): Likewise.
	* config/arm/iterators.md (VNIM_COND): New mode iterator to enable
	modes according to corresponding targets.
	* config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
	using expression 'minus'.
	(mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
	* config/arm/neon.md (sub<mode>3): Remove.  Integrated into the
	sub<mode>3 expander in vec-common.md.
	* config/arm/vec-common.md (sub<mode>3): Enable MVE target.  Use
	VNIM_COND to select available modes.  Exclude TARGET_NEON_FP16INST
	from the TARGET_NEON condition.  Integrate the TARGET_NEON_FP16INST
	case originally in neon.md.

gcc/testsuite/ChangeLog:

2020-08-27  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
	option -fno-ipa-icf and change the instruction count from 8 to 16.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
	* gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
	* gcc.target/arm/mve/vect/vect_sub_0.c: New test.
	* gcc.target/arm/mve/vect/vect_sub_1.c: New test.

[-- Attachment #2: mve-vect-sub-20200902.patch --]
[-- Type: text/x-patch; name="mve-vect-sub-20200902.patch", Size: 17481 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd78141519e..c50d5aca6a9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28964,6 +28964,30 @@ arm_preferred_simd_mode (scalar_mode mode)
       default:;
       }
 
+  if (TARGET_HAVE_MVE)
+    switch (mode)
+      {
+      case QImode:
+	return V16QImode;
+      case HImode:
+	return V8HImode;
+      case SImode:
+	return V4SImode;
+
+      default:;
+      }
+
+  if (TARGET_HAVE_MVE_FLOAT)
+    switch (mode)
+      {
+      case HFmode:
+	return V8HFmode;
+      case SFmode:
+	return V4SFmode;
+
+      default:;
+      }
+
   return word_mode;
 }
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3887c51eebe..4edc31b7c55 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -334,6 +334,17 @@ emission of floating point pcs attributes.  */
 						isa_bit_mve_float) \
 			       && !TARGET_GENERAL_REGS_ONLY)
 
+/* Combinations of NEON, NEON_FP16, IWMMXT, MVE and MVE_FLOAT targets.
+   They are used in iterators as conditions to enable modes separately
+   for different targets. The aim is to make the iterators short.  */
+#define TARGET_NEON_IWMMXT	(TARGET_NEON || TARGET_REALLY_IWMMXT)
+#define TARGET_NEON_IWMMXT_MVE	(TARGET_NEON || TARGET_REALLY_IWMMXT \
+				 || TARGET_HAVE_MVE)
+#define TARGET_NEON_IWMMXT_MVE_FP ((TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT) \
+				   || TARGET_NEON || TARGET_REALLY_IWMMXT)
+#define TARGET_NEON_MVE_HFP	((TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT) \
+				 || TARGET_NEON_FP16INST)
+
 /* MVE have few common instructions as VFP, like VLDM alias VPOP, VLDR, VSTM
    alia VPUSH, VSTR and VMOV, VMSR and VMRS.  In the same manner it updates few
    registers such as FPCAR, FPCCR, FPDSCR, FPSCR, MVFR0, MVFR1 and MVFR2.  All
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0bc9eba0722..2d523908331 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -80,6 +80,22 @@
 ;; Integer and float modes supported by Neon and IWMMXT but not MVE.
 (define_mode_iterator VNINOTM1 [V2SI V4HI V8QI V2SF])
 
+;; Integer and float modes supported in different conditions.
+;; The conditions are combinations of NEON, NEON_FP16, IWMMXT, MVE and
+;; MVE_FLOAT targets. The aim of the iterator is to support various targets
+;; and modes in a single expander in vec-common.md.
+(define_mode_iterator VNIM_COND [(V16QI "TARGET_NEON_IWMMXT_MVE")
+				 (V8HI  "TARGET_NEON_IWMMXT_MVE")
+				 (V4SI  "TARGET_NEON_IWMMXT_MVE")
+				 (V4SF  "TARGET_NEON_IWMMXT_MVE_FP")
+				 (V8HF  "TARGET_NEON_MVE_HFP")
+				 (V4HF  "TARGET_NEON_FP16INST")
+				 (V2SI  "TARGET_NEON_IWMMXT")
+				 (V4HI  "TARGET_NEON_IWMMXT")
+				 (V8QI  "TARGET_NEON_IWMMXT")
+				 (V2SF  "TARGET_NEON_IWMMXT")
+				 (V2DI  "TARGET_NEON_IWMMXT")])
+
 ;; Integer and float modes supported by Neon and IWMMXT, except V2DI.
 (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 465b39a51b3..21de5b98a52 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2574,6 +2574,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vsubq<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(minus:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		     (match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vsub.i%#<V_sz_elem>\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vabdq_f])
 ;;
@@ -3480,9 +3491,8 @@
 (define_insn "mve_vsubq_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VSUBQ_F))
+	(minus:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		     (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vsub.f%#<V_sz_elem>\t%q0, %q1, %q2"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 3e7b51d8ab6..ec933b5711e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -552,6 +552,10 @@
     (const_string "neon_add<q>")))]
 )
 
+;; These insns implement the patterns defined by the expander sub<mode>3
+;; in vec-common.md.  For the NEON fp16 extension, the pattern is only
+;; valid when -funsafe-math-optimizations is enabled.
+
 (define_insn "*sub<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
@@ -564,17 +568,6 @@
                     (const_string "neon_sub<q>")))]
 )
 
-(define_insn "sub<mode>3"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (minus:VH
-    (match_operand:VH 1 "s_register_operand" "w")
-    (match_operand:VH 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_sub<q>")]
-)
-
 (define_insn "sub<mode>3_fp16"
  [(set
    (match_operand:VH 0 "s_register_operand" "=w")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index b7e3619caf4..b72b98989d3 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -120,15 +120,21 @@
 })
 
 ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
-;; patterns separately for IWMMXT and Neon.
+;; patterns separately for MVE, IWMMXT and Neon.
 
 (define_expand "sub<mode>3"
-  [(set (match_operand:VALL 0 "s_register_operand")
-        (minus:VALL (match_operand:VALL 1 "s_register_operand")
-                    (match_operand:VALL 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
+  [(set (match_operand:VNIM_COND 0 "s_register_operand")
+	(minus:VNIM_COND (match_operand:VNIM_COND 1 "s_register_operand")
+			 (match_operand:VNIM_COND 2 "s_register_operand")))]
+  "((TARGET_NEON && !TARGET_NEON_FP16INST)
+    && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
+	|| flag_unsafe_math_optimizations))
+   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE(<MODE>mode))
+   || (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT
+       && VALID_MVE_SF_MODE(<MODE>mode)
+       && flag_unsafe_math_optimizations)
+   || (TARGET_NEON_FP16INST && flag_unsafe_math_optimizations)"
 {
 })
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
index f59f69734ed..2398d894861 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_f16 (r7, vreinterpretq_f16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.f16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.f16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
index dac47c7e924..5a58dc6eb4c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_f32 (r7, vreinterpretq_f32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.f32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.f32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
index edc2f2f3bc6..9ab05e95420 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_s16 (r7, vreinterpretq_s16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
index 880de06a781..fbfff1fc1bb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_s32 (r7, vreinterpretq_s32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
index b0e81542956..beb6b927deb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -42,4 +42,4 @@ foo1 (mve_pred16_t __p)
   return vpselq_s64 (r7, vreinterpretq_s64 (value9), __p);
 }
 
-/* { dg-final { scan-assembler-times "vpsel" 8 } } */
+/* { dg-final { scan-assembler-times "vpsel" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
index a5ceebb10b9..727d89b63ee 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_s8 (r7, vreinterpretq_s8 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i8" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i8" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
index cd31c23500a..600f6d72a96 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_u16 (r7, vreinterpretq_u16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
index faa66c9e1cc..d536ae825de 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_u32 (r7, vreinterpretq_u32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
index 853b28a2aac..abc43612b91 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -42,4 +42,4 @@ foo1 (mve_pred16_t __p)
   return vpselq_u64 (r7, vreinterpretq_u64 (value9), __p);
 }
 
-/* { dg-final { scan-assembler-times "vpsel" 8 } } */
+/* { dg-final { scan-assembler-times "vpsel" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
index bdf8cd588e1..c138e5b3668 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_u8 (r7, vreinterpretq_u8 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i8" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i8" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/mve.exp b/gcc/testsuite/gcc.target/arm/mve/mve.exp
index e84cb068940..4a651438eaa 100644
--- a/gcc/testsuite/gcc.target/arm/mve/mve.exp
+++ b/gcc/testsuite/gcc.target/arm/mve/mve.exp
@@ -43,6 +43,8 @@ dg-init
 # Main loop.
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/intrinsics/*.\[cCS\]]] \
 	"" $DEFAULT_CFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vect/*.\[cCS\]]] \
+	"" $DEFAULT_CFLAGS
 
 # All done.
 set dg_runtest_extra_prunes ""
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_0.c b/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_0.c
new file mode 100644
index 00000000000..68af9f0c316
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_0.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vsub_i16 (int16_t * dest, int16_t * a, int16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vsub_i8 (int8_t * dest, int8_t * a, int8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_1.c b/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_1.c
new file mode 100644
index 00000000000..3dbdf243b30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect/vect_sub_1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+void test_vsub_f32 (float * dest, float * a, float * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH][Arm] Enable MVE SIMD modes for vectorization
  2020-04-07 14:07         ` Kyrylo Tkachov
  2020-04-08 15:25           ` Dennis Zhang
  2020-08-17 18:41           ` [PATCH][Arm] Auto-vectorization for MVE: vsub Dennis Zhang
@ 2020-09-16 16:00           ` Dennis Zhang
  2020-10-06 13:37             ` Ping: " Dennis Zhang
  2 siblings, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2020-09-16 16:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kyrylo Tkachov, nd, Richard Earnshaw, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 2312 bytes --]

Hi all,

This patch enables SIMD modes for MVE auto-vectorization.
With this patch, arm_preferred_simd_mode (the
TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) returns the integer MVE SIMD
modes when MVE is enabled and the floating-point MVE SIMD modes when
MVE_FLOAT is enabled.
The auto-vectorization expanders can then be used to generate MVE SIMD
code.
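
For illustration only, the MVE part of that mapping can be modelled in
standalone C. This is a sketch, not GCC code: the enum, struct
target_flags and preferred_simd_mode_model are hypothetical stand-ins
for GCC's machine modes and the TARGET_HAVE_MVE / TARGET_HAVE_MVE_FLOAT
checks. Only the scalar-to-vector pairs come from the arm.c hunk of this
patch, and the Neon cases handled earlier in the real hook are left out.

```c
#include <assert.h> /* used by the checks that exercise this model */
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's scalar and vector machine modes.  */
typedef enum
{
  QImode, HImode, SImode, DImode, HFmode, SFmode,
  V16QImode, V8HImode, V4SImode, V8HFmode, V4SFmode,
  word_mode
} machine_mode_model;

/* Hypothetical stand-ins for TARGET_HAVE_MVE and TARGET_HAVE_MVE_FLOAT.  */
struct target_flags
{
  bool have_mve;
  bool have_mve_float;
};

/* Models only the MVE part of arm_preferred_simd_mode from this patch:
   8/16/32-bit integers map to 128-bit vector modes under MVE, half and
   single floats under MVE_FLOAT, and everything else falls back to
   word_mode.  */
static machine_mode_model
preferred_simd_mode_model (machine_mode_model mode, struct target_flags t)
{
  if (t.have_mve)
    switch (mode)
      {
      case QImode:
        return V16QImode;
      case HImode:
        return V8HImode;
      case SImode:
        return V4SImode;
      default:;
      }

  if (t.have_mve_float)
    switch (mode)
      {
      case HFmode:
        return V8HFmode;
      case SFmode:
        return V4SFmode;
      default:;
      }

  return word_mode;
}
```

DImode has no MVE entry in the patch, so in this model a 64-bit integer
loop falls back to word_mode, the same value the hook returns when it
has no vector mode to suggest.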

This patch also fixes the MVE vreinterpretq_*.c tests, whose failures
are exposed by enabling the MVE SIMD modes.
These tests check the MVE reinterpret intrinsics.
Each test contains two functions with identical code, so the IPA
identical code folding (ICF) pass folds them together.
Because of ICF, the scanned instruction count covers only one function,
which gives 8.
However, when the SIMD modes are enabled, the estimated code size
becomes smaller, so inlining is applied after ICF and the instruction
count becomes 16, which makes the tests fail.
Since ICF is not what these tests are meant to check, but it makes the
counts depend on inlining decisions, -fno-ipa-icf is added to the tests
to keep the instruction count stable.
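
To make the ICF effect concrete, here is a minimal, hypothetical C
sketch of the shape of these tests, written in plain C rather than with
the MVE intrinsics so that it runs anywhere: foo and foo1 have identical
bodies. GCC enables -fipa-icf at -O2, so such functions may be merged and
only one body emitted; with -fno-ipa-icf both bodies are kept. In the
real tests each body contains 8 matching instructions, so the expected
count is 16 once both bodies are emitted.

```c
#include <assert.h> /* used by the checks that call these functions */
#include <stdint.h>

/* Hypothetical, simplified shape of a vreinterpretq_*.c test: two
   functions with identical bodies.  At -O2, -fipa-icf may merge them so
   that only one body is emitted; -fno-ipa-icf keeps both.  */
int32_t
foo (const int32_t *v)
{
  int32_t r = 0;
  for (int i = 0; i < 8; i++)
    r += v[i];
  return r;
}

int32_t
foo1 (const int32_t *v)
{
  int32_t r = 0;
  for (int i = 0; i < 8; i++)
    r += v[i];
  return r;
}
```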

This patch is split out from
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
because this part is only loosely related to the aim of that patch and
would otherwise cause confusion.

Regtested and bootstrapped.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>

	* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.

gcc/testsuite/ChangeLog:

2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
	option -fno-ipa-icf and change the instruction count from 8 to 16.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: mve-mode-20200915.patch --]
[-- Type: text/x-patch; name="mve-mode-20200915.patch", Size: 8618 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dd78141519e..c50d5aca6a9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28964,6 +28964,30 @@ arm_preferred_simd_mode (scalar_mode mode)
       default:;
       }
 
+  if (TARGET_HAVE_MVE)
+    switch (mode)
+      {
+      case QImode:
+	return V16QImode;
+      case HImode:
+	return V8HImode;
+      case SImode:
+	return V4SImode;
+
+      default:;
+      }
+
+  if (TARGET_HAVE_MVE_FLOAT)
+    switch (mode)
+      {
+      case HFmode:
+	return V8HFmode;
+      case SFmode:
+	return V4SFmode;
+
+      default:;
+      }
+
   return word_mode;
 }
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
index f59f69734ed..2398d894861 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_f16 (r7, vreinterpretq_f16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.f16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.f16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
index dac47c7e924..5a58dc6eb4c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_f32 (r7, vreinterpretq_f32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.f32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.f32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
index edc2f2f3bc6..9ab05e95420 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_s16 (r7, vreinterpretq_s16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
index 880de06a781..fbfff1fc1bb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_s32 (r7, vreinterpretq_s32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
index b0e81542956..beb6b927deb 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -42,4 +42,4 @@ foo1 (mve_pred16_t __p)
   return vpselq_s64 (r7, vreinterpretq_s64 (value9), __p);
 }
 
-/* { dg-final { scan-assembler-times "vpsel" 8 } } */
+/* { dg-final { scan-assembler-times "vpsel" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
index a5ceebb10b9..727d89b63ee 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_s8 (r7, vreinterpretq_s8 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i8" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i8" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
index cd31c23500a..600f6d72a96 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int8x16_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_u16 (r7, vreinterpretq_u16 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i16" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i16" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
index faa66c9e1cc..d536ae825de 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_u32 (r7, vreinterpretq_u32 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i32" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i32" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
index 853b28a2aac..abc43612b91 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -42,4 +42,4 @@ foo1 (mve_pred16_t __p)
   return vpselq_u64 (r7, vreinterpretq_u64 (value9), __p);
 }
 
-/* { dg-final { scan-assembler-times "vpsel" 8 } } */
+/* { dg-final { scan-assembler-times "vpsel" 16 } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
index bdf8cd588e1..c138e5b3668 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-O2 -fno-ipa-icf" } */
 
 #include "arm_mve.h"
 int16x8_t value1;
@@ -41,4 +41,4 @@ foo1 ()
   return vaddq_u8 (r7, vreinterpretq_u8 (value9));
 }
 
-/* { dg-final { scan-assembler-times "vadd.i8" 8 } } */
+/* { dg-final { scan-assembler-times "vadd.i8" 16 } } */

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
  2020-09-16 16:00           ` [PATCH][Arm] Enable MVE SIMD modes for vectorization Dennis Zhang
@ 2020-10-06 13:37             ` Dennis Zhang
  2020-10-06 13:43               ` Kyrylo Tkachov
  2020-10-08 13:14               ` Christophe Lyon
  0 siblings, 2 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-10-06 13:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kyrylo Tkachov, nd, Richard Earnshaw, Ramana Radhakrishnan

On 9/16/20 4:00 PM, Dennis Zhang wrote:
> Hi all,
> 
> This patch enables SIMD modes for MVE auto-vectorization.
> In this patch, the integer and float MVE SIMD modes are returned by
> arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> MVE or MVE_FLOAT is enabled.
> Then the expanders for auto-vectorization can be used for generating MVE
> SIMD code.
> 
> This patch also fixes bugs in MVE vreinterpretq_*.c tests which are
> revealed by the enabled MVE SIMD modes.
> The tests are for checking the MVE reinterpret intrinsics.
> There are two functions in each of the tests. The two functions contain
> the pattern of identical code so that they are folded in icf pass.
> Because of icf, the instruction count only checks one function which is 8.
> However when the SIMD modes are enabled, the estimation of the code size
> becomes smaller so that inlining is applied after icf, then the
> instruction count becomes 16 which causes failure of the tests.
> Because the icf is not the expected pattern to be tested but causes
> above issues, -fno-ipa-icf is applied to the tests to avoid unstable
> instruction count.
> 
> This patch is separated from
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> because this part is not strongly connected to the aim of that one so
> that causing confusion.
> 
> Regtested and bootstrapped.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
> 
> 	* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
> 
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> 	option -fno-ipa-icf and change the instruction count from 8 to 16.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> 

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
  2020-10-06 13:37             ` Ping: " Dennis Zhang
@ 2020-10-06 13:43               ` Kyrylo Tkachov
  2020-10-08 13:14               ` Christophe Lyon
  1 sibling, 0 replies; 41+ messages in thread
From: Kyrylo Tkachov @ 2020-10-06 13:43 UTC (permalink / raw)
  To: Dennis Zhang, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Dennis,

> -----Original Message-----
> From: Dennis Zhang <Dennis.Zhang@arm.com>
> Sent: 06 October 2020 14:37
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>
> Subject: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
> 
> On 9/16/20 4:00 PM, Dennis Zhang wrote:
> > Hi all,
> >
> > This patch enables SIMD modes for MVE auto-vectorization.
> > In this patch, the integer and float MVE SIMD modes are returned by
> > arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> > MVE or MVE_FLOAT is enabled.
> > Then the expanders for auto-vectorization can be used for generating MVE
> > SIMD code.
> >
> > This patch also fixes bugs in MVE vreinterpretq_*.c tests which are
> > revealed by the enabled MVE SIMD modes.
> > The tests are for checking the MVE reinterpret intrinsics.
> > There are two functions in each of the tests. The two functions contain
> > the pattern of identical code so that they are folded in icf pass.
> > Because of icf, the instruction count only checks one function which is 8.
> > However when the SIMD modes are enabled, the estimation of the code size
> > becomes smaller so that inlining is applied after icf, then the
> > instruction count becomes 16 which causes failure of the tests.
> > Because the icf is not the expected pattern to be tested but causes
> > above issues, -fno-ipa-icf is applied to the tests to avoid unstable
> > instruction count.
> >
> > This patch is separated from
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> > because this part is not strongly connected to the aim of that one so
> > that causing confusion.
> >
> > Regtested and bootstrapped.
> >
> > Is it OK for trunk please?

Ok.
Sorry for the delay.
Kyrill

> >
> > Thanks
> > Dennis
> >
> > gcc/ChangeLog:
> >
> > 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
> >
> > * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
> >
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> > option -fno-ipa-icf and change the instruction count from 8 to 16.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> >
> 
> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
  2020-08-17 18:41           ` [PATCH][Arm] Auto-vectorization for MVE: vsub Dennis Zhang
  2020-08-21 22:33             ` Ramana Radhakrishnan
@ 2020-10-06 16:46             ` Dennis Zhang
  2020-10-22  0:42               ` Ping: " Dennis Zhang
  2020-10-22  8:40               ` Kyrylo Tkachov
  2020-10-06 16:54             ` [PATCH][Arm] Auto-vectorization for MVE: vmul Dennis Zhang
  2020-10-06 16:59             ` [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax Dennis Zhang
  3 siblings, 2 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-10-06 16:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kyrylo Tkachov, nd, Richard Earnshaw, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 4776 bytes --]

Hi all,

On 8/17/20 6:41 PM, Dennis Zhang wrote:
> 
> Hi all,
> 
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
> 
> This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> generate wrong instruction numbers because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> therefore it is corrected in this patch to avoid test failures.
> 
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
> 
> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
> 
> Is it OK for trunk please?
> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
> 
> 	* config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
> 	* config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
> 	(TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
> 	(TARGET_NEON_MVE_HFP): Likewise.
> 	* config/arm/iterators.md (VSEL): New mode iterator to select modes
> 	for corresponding targets.
> 	* config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
> 	using expression 'minus'.
> 	(mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
> 	* config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
> 	sub<mode>3 in vec-common.md
> 	* config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
> 	to select available modes. Exclude TARGET_NEON_FP16INST from
> 	TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
> 	originally in neon.md.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
> 
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> 	option -fno-ipa-icf and change the instruction count from 8 to 16.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> 	* gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> 	* gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
> 	* gcc.target/arm/mve/vect/vect_sub_0.c: New test.
> 	* gcc.target/arm/mve/vect/vect_sub_1.c: New test.
> 

This patch has been updated to build on Richard Sandiford's patch that
adds new vector mode macros:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
The previous version of this patch is at
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
A less closely related part of the previous version has been split out
into a separate patch:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

This patch enables MVE vsub instructions for auto-vectorization.
It adds an insn for the integer MVE vsub instruction and changes the
floating-point one so that both use a 'minus' expression instead of an
unspec, which lets them match the RTL generated for auto-vectorization.
The sub<mode>3 expander in vec-common.md is changed to use the new
ARM_HAVE_<MODE>_ARITH mode macros, which make the expander available
only when the corresponding mode is supported, so that the different
targets can share this expander for vectorization. The now-redundant
sub<mode>3 and sub<mode>3_fp16 insns in neon.md are removed.
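
As a sketch of why the mve-vsub_1.c test below expects two matches per
element size: vsub.i<N> performs wrapping (modulo 2^N) subtraction in
every lane, which produces the same bits for signed and unsigned data,
so both the int32_t and the uint32_t loop can use vsub.i32. The
q_reg_i32 type, load_q_i32 and vsub_i32_model are hypothetical
illustrations, not GCC or ACLE interfaces; test_vsub_i32 is a copy of
the scalar loop from the test, with const added.

```c
#include <assert.h> /* used by the checks that exercise this model */
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit Q register holding four 32-bit lanes.  */
typedef struct
{
  uint32_t lane[4];
} q_reg_i32;

/* Copy four int32_t values into the lanes bit-for-bit.  */
static q_reg_i32
load_q_i32 (const int32_t *p)
{
  q_reg_i32 q;
  memcpy (q.lane, p, sizeof q.lane);
  return q;
}

/* Models "vsub.i32 qd, qn, qm": each lane is subtracted independently
   modulo 2^32, so the result bits do not depend on signedness.  */
static q_reg_i32
vsub_i32_model (q_reg_i32 n, q_reg_i32 m)
{
  q_reg_i32 d;
  for (int i = 0; i < 4; i++)
    d.lane[i] = n.lane[i] - m.lane[i];
  return d;
}

/* Scalar loop from mve-vsub_1.c; signed overflow is undefined in C, so
   callers must keep the differences in range.  */
static void
test_vsub_i32 (int32_t *dest, const int32_t *a, const int32_t *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] - b[i];
}
```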

Regression tested on arm-none-eabi and bootstrapped on
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>

	* config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
	using expression 'minus'.
	(mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
	* config/arm/neon.md (*sub<mode>3_neon): Use the new mode macros
	ARM_HAVE_<MODE>_ARITH.
	(sub<mode>3, sub<mode>3_fp16): Removed.
	(neon_vsub<mode>): Use gen_sub<mode>3 instead of gen_sub<mode>3_fp16.
	* config/arm/vec-common.md (sub<mode>3): Use the new mode macros
	ARM_HAVE_<MODE>_ARITH.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/simd/mve-vsub_1.c: New test.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: mve-vect-sub-20201002.patch --]
[-- Type: text/x-patch; name="mve-vect-sub-20201002.patch", Size: 5891 bytes --]

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..7853b642262 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2574,6 +2574,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vsubq<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(minus:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		     (match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vsub.i%#<V_sz_elem>\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vabdq_f])
 ;;
@@ -3480,9 +3491,8 @@
 (define_insn "mve_vsubq_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VSUBQ_F))
+	(minus:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		     (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vsub.f%#<V_sz_elem>\t%q0, %q1, %q2"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 96bf277f501..9799c130875 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -513,7 +513,7 @@
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (minus:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
                    (match_operand:VDQ 2 "s_register_operand" "w")))]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "ARM_HAVE_NEON_<MODE>_ARITH"
   "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
   [(set (attr "type")
       (if_then_else (match_test "<Is_float_mode>")
@@ -521,28 +521,6 @@
                     (const_string "neon_sub<q>")))]
 )
 
-(define_insn "sub<mode>3"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (minus:VH
-    (match_operand:VH 1 "s_register_operand" "w")
-    (match_operand:VH 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
- "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_sub<q>")]
-)
-
-(define_insn "sub<mode>3_fp16"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (minus:VH
-    (match_operand:VH 1 "s_register_operand" "w")
-    (match_operand:VH 2 "s_register_operand" "w")))]
- "TARGET_NEON_FP16INST"
- "vsub.<V_if_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_sub<q>")]
-)
-
 (define_insn "*mul<mode>3_neon"
   [(set (match_operand:VDQW 0 "s_register_operand" "=w")
         (mult:VDQW (match_operand:VDQW 1 "s_register_operand" "w")
@@ -1804,7 +1782,7 @@
    (match_operand:VH 2 "s_register_operand")]
   "TARGET_NEON_FP16INST"
 {
-  emit_insn (gen_sub<mode>3_fp16 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_sub<mode>3 (operands[0], operands[1], operands[2]));
   DONE;
 })
 
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..5f5668bcf9b 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -87,18 +87,12 @@
   "ARM_HAVE_<MODE>_ARITH"
 )
 
-;; Vector arithmetic. Expanders are blank, then unnamed insns implement
-;; patterns separately for IWMMXT and Neon.
-
 (define_expand "sub<mode>3"
-  [(set (match_operand:VALL 0 "s_register_operand")
-        (minus:VALL (match_operand:VALL 1 "s_register_operand")
-                    (match_operand:VALL 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
+  [(set (match_operand:VDQ 0 "s_register_operand")
+        (minus:VDQ (match_operand:VDQ 1 "s_register_operand")
+                   (match_operand:VDQ 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 (define_expand "mul<mode>3"
   [(set (match_operand:VALLW 0 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
new file mode 100644
index 00000000000..cb3ef3a14e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+void test_vsub_i32 (int32_t * dest, int32_t * a, int32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vsub_i16 (int16_t * dest, int16_t * a, int16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vsub_i8 (int8_t * dest, int8_t * a, int8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+void test_vsub_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vsub_f32 (float * dest, float * a, float * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] - b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vsub\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+


* [PATCH][Arm] Auto-vectorization for MVE: vmul
  2020-08-17 18:41           ` [PATCH][Arm] Auto-vectorization for MVE: vsub Dennis Zhang
  2020-08-21 22:33             ` Ramana Radhakrishnan
  2020-10-06 16:46             ` Dennis Zhang
@ 2020-10-06 16:54             ` Dennis Zhang
  2020-10-14  9:14               ` Kyrylo Tkachov
  2020-10-06 16:59             ` [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax Dennis Zhang
  3 siblings, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2020-10-06 16:54 UTC
  To: gcc-patches; +Cc: Kyrylo Tkachov, nd, Richard Earnshaw, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1177 bytes --]

Hi all,

This patch enables MVE vmul instructions for auto-vectorization.
It adds MVE to the mul<mode>3 expander so that multiplication can be
vectorized for MVE, and changes the related vmul insns to use the 'mult'
RTL code instead of an unspec so that they match the expander.
The mul<mode>3 expander in vec-common.md now uses the mode iterator
VDQWH instead of VALLW to cover all supported modes, and the
ARM_HAVE_<MODE>_ARITH macros select the supported modes for each
target. The redundant mul<mode>3 insn in neon.md is removed.

Regression tested on arm-none-eabi and bootstrapped on
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>

	* config/arm/mve.md (mve_vmulq<mode>): New entry for vmul instruction
	using expression 'mult'.
	(mve_vmulq_f<mode>): Use mult instead of VMULQ_F.
	* config/arm/neon.md (mul<mode>3): Removed.
	* config/arm/vec-common.md (mul<mode>3): Use the new mode macros
	ARM_HAVE_<MODE>_ARITH. Use mode iterator VDQWH instead of VALLW.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/simd/mve-vmul_1.c: New test.

[-- Attachment #2: mve-vect-mul-20201002.patch --]
[-- Type: text/x-patch; name="mve-vect-mul-20201002.patch", Size: 4528 bytes --]

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..5b2b609174c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2199,6 +2199,17 @@
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "mve_vmulq<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(mult:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmul.i%#<V_sz_elem>\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vornq_u, vornq_s])
 ;;
@@ -3210,9 +3221,8 @@
 (define_insn "mve_vmulq_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMULQ_F))
+	(mult:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		    (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmul.f%#<V_sz_elem>	%q0, %q1, %q2"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 96bf277f501..f6632f1a25a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1899,17 +1899,6 @@
                     (const_string "neon_mul_<V_elem_ch><q>")))]
 )
 
-(define_insn "mul<mode>3"
- [(set
-   (match_operand:VH 0 "s_register_operand" "=w")
-   (mult:VH
-    (match_operand:VH 1 "s_register_operand" "w")
-    (match_operand:VH 2 "s_register_operand" "w")))]
-  "TARGET_NEON_FP16INST && flag_unsafe_math_optimizations"
-  "vmul.f16\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
- [(set_attr "type" "neon_mul_<VH_elem_ch><q>")]
-)
-
 (define_insn "neon_vmulf<mode>"
  [(set
    (match_operand:VH 0 "s_register_operand" "=w")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..45db60e7411 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -101,14 +101,11 @@
 })
 
 (define_expand "mul<mode>3"
-  [(set (match_operand:VALLW 0 "s_register_operand")
-        (mult:VALLW (match_operand:VALLW 1 "s_register_operand")
-		    (match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (<MODE>mode == V4HImode && TARGET_REALLY_IWMMXT)"
-{
-})
+  [(set (match_operand:VDQWH 0 "s_register_operand")
+	(mult:VDQWH (match_operand:VDQWH 1 "s_register_operand")
+		    (match_operand:VDQWH 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 (define_expand "smin<mode>3"
   [(set (match_operand:VALLW 0 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
new file mode 100644
index 00000000000..514f292c15e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vmul_1.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+void test_vmul_i32 (int32_t * dest, int32_t * a, int32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i32_u (uint32_t * dest, uint32_t * a, uint32_t * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i32\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i16 (int16_t * dest, int16_t * a, int16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i16_u (uint16_t * dest, uint16_t * a, uint16_t * b) {
+  int i;
+  for (i=0; i<8; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i16\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_i8 (int8_t * dest, int8_t * a, int8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+void test_vmul_i8_u (uint8_t * dest, uint8_t * a, uint8_t * b) {
+  int i;
+  for (i=0; i<16; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.i8\tq[0-9]+, q[0-9]+, q[0-9]+} 2 } } */
+
+void test_vmul_f32 (float * dest, float * a, float * b) {
+  int i;
+  for (i=0; i<4; i++) {
+    dest[i] = a[i] * b[i];
+  }
+}
+
+/* { dg-final { scan-assembler-times {vmul\.f32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } } */
+


* [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
  2020-08-17 18:41           ` [PATCH][Arm] Auto-vectorization for MVE: vsub Dennis Zhang
                               ` (2 preceding siblings ...)
  2020-10-06 16:54             ` [PATCH][Arm] Auto-vectorization for MVE: vmul Dennis Zhang
@ 2020-10-06 16:59             ` Dennis Zhang
  2020-10-14  9:15               ` Kyrylo Tkachov
  3 siblings, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2020-10-06 16:59 UTC
  To: gcc-patches; +Cc: Kyrylo Tkachov, nd, Richard Earnshaw, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1240 bytes --]

Hi all,

This patch enables MVE vmin/vmax instructions for auto-vectorization.
The MVE target is now included in the smin<mode>3, umin<mode>3,
smax<mode>3 and umax<mode>3 expanders for vectorization.
The related vmin/vmax insns in mve.md are changed to use the smin, umin,
smax and umax RTL codes instead of unspecs so that they match the expanders.

Regression tested on arm-none-eabi and bootstrapped on
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>

	* config/arm/mve.md (mve_vmaxq_<supf><mode>): Replace with ...
	(mve_vmaxq_s<mode>, mve_vmaxq_u<mode>): ... these new insns to
	use smax/umax instead of VMAXQ.
	(mve_vminq_<supf><mode>): Replace with ...
	(mve_vminq_s<mode>, mve_vminq_u<mode>): ... these new insns to
	use smin/umin instead of VMINQ.
	(mve_vmaxnmq_f<mode>): Use smax instead of VMAXNMQ_F.
	(mve_vminnmq_f<mode>): Use smin instead of VMINNMQ_F.
	* config/arm/vec-common.md (smin<mode>3): Use the new mode macros
	ARM_HAVE_<MODE>_ARITH.
	(umin<mode>3, smax<mode>3, umax<mode>3): Likewise.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>

	* gcc.target/arm/simd/mve-vminmax_1.c: New test.

[-- Attachment #2: mve-vect-vminmax-20201001.patch --]
[-- Type: text/x-patch; name="mve-vect-vminmax-20201001.patch", Size: 7117 bytes --]

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3a57901bd5b..0d9f932e983 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1977,15 +1977,25 @@
 ;;
 ;; [vmaxq_u, vmaxq_s])
 ;;
-(define_insn "mve_vmaxq_<supf><mode>"
+(define_insn "mve_vmaxq_s<mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMAXQ))
+	(smax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmax.%#<V_s_elem>\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+(define_insn "mve_vmaxq_u<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(umax:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmax.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
+  "vmax.%#<V_u_elem>\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2037,15 +2047,25 @@
 ;;
 ;; [vminq_s, vminq_u])
 ;;
-(define_insn "mve_vminq_<supf><mode>"
+(define_insn "mve_vminq_s<mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VMINQ))
+	(smin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmin.<supf>%#<V_sz_elem>\t%q0, %q1, %q2"
+  "vmin.%#<V_s_elem>\t%q0, %q1, %q2"
+  [(set_attr "type" "mve_move")
+])
+
+(define_insn "mve_vminq_u<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(umin:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		    (match_operand:MVE_2 2 "s_register_operand" "w")))
+  ]
+  "TARGET_HAVE_MVE"
+  "vmin.%#<V_u_elem>\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -3030,9 +3050,8 @@
 (define_insn "mve_vmaxnmq_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMAXNMQ_F))
+	(smax:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		    (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vmaxnm.f%#<V_sz_elem>	%q0, %q1, %q2"
@@ -3090,9 +3109,8 @@
 (define_insn "mve_vminnmq_f<mode>"
   [
    (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-	(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")
-		       (match_operand:MVE_0 2 "s_register_operand" "w")]
-	 VMINNMQ_F))
+	(smin:MVE_0 (match_operand:MVE_0 1 "s_register_operand" "w")
+		    (match_operand:MVE_0 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
   "vminnm.f%#<V_sz_elem>	%q0, %q1, %q2"
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index c3c86c46355..6a330cc82f6 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -114,39 +114,29 @@
   [(set (match_operand:VALLW 0 "s_register_operand")
 	(smin:VALLW (match_operand:VALLW 1 "s_register_operand")
 		    (match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 (define_expand "umin<mode>3"
   [(set (match_operand:VINTW 0 "s_register_operand")
 	(umin:VINTW (match_operand:VINTW 1 "s_register_operand")
 		    (match_operand:VINTW 2 "s_register_operand")))]
-  "TARGET_NEON
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 (define_expand "smax<mode>3"
   [(set (match_operand:VALLW 0 "s_register_operand")
 	(smax:VALLW (match_operand:VALLW 1 "s_register_operand")
 		    (match_operand:VALLW 2 "s_register_operand")))]
-  "(TARGET_NEON && ((<MODE>mode != V2SFmode && <MODE>mode != V4SFmode)
-		    || flag_unsafe_math_optimizations))
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 (define_expand "umax<mode>3"
   [(set (match_operand:VINTW 0 "s_register_operand")
 	(umax:VINTW (match_operand:VINTW 1 "s_register_operand")
 		    (match_operand:VINTW 2 "s_register_operand")))]
-  "TARGET_NEON
-   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
-{
-})
+  "ARM_HAVE_<MODE>_ARITH"
+)
 
 (define_expand "vec_perm<mode>"
   [(match_operand:VE 0 "s_register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vminmax_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vminmax_1.c
new file mode 100644
index 00000000000..6c8e7d42906
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vminmax_1.c
@@ -0,0 +1,61 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_1m_mve_ok }  */
+/* { dg-options "-O3" }  */
+/* { dg-add-options arm_v8_1m_mve }  */
+
+#include <stdint.h>
+
+#define MAX(a, b) (((a) > (b)) ? (a) : (b))
+#define MIN(a, b) (((a) < (b)) ? (a) : (b))
+
+
+#define TEST_BINOP(OP, TY, N)		\
+  void test_##OP##_##TY (TY * dest, TY * a, TY * b)	\
+  {							\
+    int i;						\
+    for (i=0; i<N; i++)					\
+    {							\
+      dest[i] = OP (a[i], b[i]);			\
+    }							\
+  }
+
+/* Test vmax.  */
+
+TEST_BINOP (MAX, int32_t, 4)
+/* { dg-final { scan-assembler-times {vmax\.s32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TEST_BINOP (MAX, uint32_t, 4)
+/* { dg-final { scan-assembler-times {vmax\.u32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TEST_BINOP (MAX, int16_t, 8)
+/* { dg-final { scan-assembler-times {vmax\.s16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TEST_BINOP (MAX, uint16_t, 8)
+/* { dg-final { scan-assembler-times {vmax\.u16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TEST_BINOP (MAX, int8_t, 16)
+/* { dg-final { scan-assembler-times {vmax\.s8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TEST_BINOP (MAX, uint8_t, 16)
+/* { dg-final { scan-assembler-times {vmax\.u8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+/* Test vmin.  */
+
+TEST_BINOP (MIN, int32_t, 4)
+/* { dg-final { scan-assembler-times {vmin\.s32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TEST_BINOP (MIN, uint32_t, 4)
+/* { dg-final { scan-assembler-times {vmin\.u32\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TEST_BINOP (MIN, int16_t, 8)
+/* { dg-final { scan-assembler-times {vmin\.s16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TEST_BINOP (MIN, uint16_t, 8)
+/* { dg-final { scan-assembler-times {vmin\.u16\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TEST_BINOP (MIN, int8_t, 16)
+/* { dg-final { scan-assembler-times {vmin\.s8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+
+TEST_BINOP (MIN, uint8_t, 16)
+/* { dg-final { scan-assembler-times {vmin\.u8\tq[0-9]+, q[0-9]+, q[0-9]+} 1 } }  */
+


* Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
  2020-10-06 13:37             ` Ping: " Dennis Zhang
  2020-10-06 13:43               ` Kyrylo Tkachov
@ 2020-10-08 13:14               ` Christophe Lyon
  2020-10-08 14:06                 ` Dennis Zhang
  1 sibling, 1 reply; 41+ messages in thread
From: Christophe Lyon @ 2020-10-08 13:14 UTC
  To: Dennis Zhang; +Cc: gcc-patches, nd, Ramana Radhakrishnan, Richard Earnshaw

Hi,


On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On 9/16/20 4:00 PM, Dennis Zhang wrote:
> > Hi all,
> >
> > This patch enables SIMD modes for MVE auto-vectorization.
> > In this patch, the integer and float MVE SIMD modes are returned by
> > arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> > MVE or MVE_FLOAT is enabled.
> > Then the expanders for auto-vectorization can be used for generating MVE
> > SIMD code.
> >
> > This patch also fixes bugs in the MVE vreinterpretq_*.c tests that are
> > revealed by enabling the MVE SIMD modes.
> > These tests check the MVE reinterpret intrinsics.
> > Each test contains two functions with identical code, so they are
> > folded by the ICF pass.
> > Because of ICF, the instruction count only covers one function, which is 8.
> > However, when the SIMD modes are enabled, the estimated code size
> > becomes smaller, so inlining is applied after ICF and the
> > instruction count becomes 16, which makes the tests fail.
> > Since ICF is not what these tests are meant to check but causes the
> > above issue, -fno-ipa-icf is added to the tests to keep the
> > instruction count stable.
> >
> > This patch is split out from
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> > because this part is not closely related to the aim of that patch,
> > and keeping them together caused confusion.
> >
> > Regtested and bootstrapped.
> >
> > Is it OK for trunk please?
> >
> > Thanks
> > Dennis
> >
> > gcc/ChangeLog:
> >
> > 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
> >
> >       * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
> >

Since toolchain builds work again after Jakub's divmod fix, I'm now
facing another build error likely caused by this patch:
In file included from
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0,
                 from
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28:
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
In function 'machine_mode arm_preferred_simd_mode(scalar_mode)':
./insn-modes.h:196:71: error: temporary of non-literal type
'scalar_int_mode' in a constant expression
 #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
                                                                       ^
/tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12:
note: in expansion of macro 'QImode'
       case QImode:

and similarly for the other cases.

Does the build work for you?

Thanks,

Christophe

> > gcc/testsuite/ChangeLog:
> >
> > 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
> >
> >       * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> >       option -fno-ipa-icf and change the instruction count from 8 to 16.
> >       * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> >       * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> >
>
> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html


* Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
  2020-10-08 13:14               ` Christophe Lyon
@ 2020-10-08 14:06                 ` Dennis Zhang
  2020-10-08 14:22                   ` Christophe Lyon
  0 siblings, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2020-10-08 14:06 UTC
  To: Christophe Lyon; +Cc: gcc-patches, nd, Ramana Radhakrishnan, Richard Earnshaw

Hi Christophe,

On 08/10/2020 14:14, Christophe Lyon wrote:
> Hi,
> 
> 
> On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> On 9/16/20 4:00 PM, Dennis Zhang wrote:
>>> Hi all,
>>>
>>> This patch enables SIMD modes for MVE auto-vectorization.
>>> In this patch, the integer and float MVE SIMD modes are returned by
>>> arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
>>> MVE or MVE_FLOAT is enabled.
>>> Then the expanders for auto-vectorization can be used for generating MVE
>>> SIMD code.
>>>
>>> This patch also fixes bugs in the MVE vreinterpretq_*.c tests that are
>>> revealed by enabling the MVE SIMD modes.
>>> These tests check the MVE reinterpret intrinsics.
>>> Each test contains two functions with identical code, so they are
>>> folded by the ICF pass.
>>> Because of ICF, the instruction count only covers one function, which is 8.
>>> However, when the SIMD modes are enabled, the estimated code size
>>> becomes smaller, so inlining is applied after ICF and the
>>> instruction count becomes 16, which makes the tests fail.
>>> Since ICF is not what these tests are meant to check but causes the
>>> above issue, -fno-ipa-icf is added to the tests to keep the
>>> instruction count stable.
>>>
>>> This patch is split out from
>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
>>> because this part is not closely related to the aim of that patch,
>>> and keeping them together caused confusion.
>>>
>>> Regtested and bootstrapped.
>>>
>>> Is it OK for trunk please?
>>>
>>> Thanks
>>> Dennis
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
>>>
>>>        * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
>>>
> 
> Since toolchain builds work again after Jakub's divmod fix, I'm now
> facing another build error likely caused by this patch:
> In file included from
> /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0,
>                   from
> /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28:
> /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
> In function 'machine_mode arm_preferred_simd_mode(scalar_mode)':
> ./insn-modes.h:196:71: error: temporary of non-literal type
> 'scalar_int_mode' in a constant expression
>   #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
>                                                                         ^
> /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12:
> note: in expansion of macro 'QImode'
>         case QImode:
> 
> and similarly for the other cases.
> 
> Does the build work for you?
> 
> Thanks,
> 
> Christophe
> 

Thanks for the report, and sorry about the error.
I tested it for the arm-none-eabi and arm-none-linux-gnueabi targets and
didn't get this error.
Could you please share the configuration you used for your build?
I will test it and fix it right away.

Thanks
Dennis


* Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
  2020-10-08 14:06                 ` Dennis Zhang
@ 2020-10-08 14:22                   ` Christophe Lyon
  2020-10-12 11:40                     ` Christophe Lyon
  0 siblings, 1 reply; 41+ messages in thread
From: Christophe Lyon @ 2020-10-08 14:22 UTC
  To: Dennis Zhang; +Cc: gcc-patches, nd, Ramana Radhakrishnan, Richard Earnshaw

On Thu, 8 Oct 2020 at 16:08, Dennis Zhang <dennis.zhang@arm.com> wrote:
>
> Hi Christophe,
>
> On 08/10/2020 14:14, Christophe Lyon wrote:
> > Hi,
> >
> >
> > On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> On 9/16/20 4:00 PM, Dennis Zhang wrote:
> >>> Hi all,
> >>>
> >>> This patch enables SIMD modes for MVE auto-vectorization.
> >>> In this patch, the integer and float MVE SIMD modes are returned by
> >>> arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> >>> MVE or MVE_FLOAT is enabled.
> >>> Then the expanders for auto-vectorization can be used for generating MVE
> >>> SIMD code.
> >>>
> >>> This patch also fixes bugs in the MVE vreinterpretq_*.c tests that are
> >>> revealed by enabling the MVE SIMD modes.
> >>> These tests check the MVE reinterpret intrinsics.
> >>> Each test contains two functions with identical code, so they are
> >>> folded by the ICF pass.
> >>> Because of ICF, the instruction count only covers one function, which is 8.
> >>> However, when the SIMD modes are enabled, the estimated code size
> >>> becomes smaller, so inlining is applied after ICF and the
> >>> instruction count becomes 16, which makes the tests fail.
> >>> Since ICF is not what these tests are meant to check but causes the
> >>> above issue, -fno-ipa-icf is added to the tests to keep the
> >>> instruction count stable.
> >>>
> >>> This patch is split out from
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> >>> because this part is not closely related to the aim of that patch,
> >>> and keeping them together caused confusion.
> >>>
> >>> Regtested and bootstrapped.
> >>>
> >>> Is it OK for trunk please?
> >>>
> >>> Thanks
> >>> Dennis
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
> >>>
> >>>        * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
> >>>
> >
> > Since toolchain builds work again after Jakub's divmod fix, I'm now
> > facing another build error likely caused by this patch:
> > In file included from
> > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0,
> >                   from
> > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28:
> > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
> > In function 'machine_mode arm_preferred_simd_mode(scalar_mode)':
> > ./insn-modes.h:196:71: error: temporary of non-literal type
> > 'scalar_int_mode' in a constant expression
> >   #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
> >                                                                         ^
> > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12:
> > note: in expansion of macro 'QImode'
> >         case QImode:
> >
> > and similarly for the other cases.
> >
> > Does the build work for you?
> >
> > Thanks,
> >
> > Christophe
> >
>
> Thanks for the report, and sorry about the error.
> I tested it for the arm-none-eabi and arm-none-linux-gnueabi targets and
> didn't get this error.
> Could you please share the configuration you used for your build?
> I will test it and fix it right away.
>

It fails on all of them for me. Does it work for you with current
master? (r11-3720-gf18eeb6b958acd5e1590ca4a73231486b749be9b)


> Thanks
> Dennis


* Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
  2020-10-08 14:22                   ` Christophe Lyon
@ 2020-10-12 11:40                     ` Christophe Lyon
  2020-10-12 13:22                       ` Kyrylo Tkachov
  2020-10-12 15:39                       ` Dennis Zhang
  0 siblings, 2 replies; 41+ messages in thread
From: Christophe Lyon @ 2020-10-12 11:40 UTC
  To: Dennis Zhang; +Cc: gcc-patches, nd, Ramana Radhakrishnan, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 4213 bytes --]

Hi,


On Thu, 8 Oct 2020 at 16:22, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>
> On Thu, 8 Oct 2020 at 16:08, Dennis Zhang <dennis.zhang@arm.com> wrote:
> >
> > Hi Christophe,
> >
> > On 08/10/2020 14:14, Christophe Lyon wrote:
> > > Hi,
> > >
> > >
> > > On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > >>
> > >> On 9/16/20 4:00 PM, Dennis Zhang wrote:
> > >>> Hi all,
> > >>>
> > >>> This patch enables SIMD modes for MVE auto-vectorization.
> > >>> In this patch, the integer and float MVE SIMD modes are returned by
> > >>> arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> > >>> MVE or MVE_FLOAT is enabled.
> > >>> Then the expanders for auto-vectorization can be used for generating MVE
> > >>> SIMD code.
> > >>>
> > >>> This patch also fixes bugs in the MVE vreinterpretq_*.c tests that are
> > >>> revealed by enabling the MVE SIMD modes.
> > >>> These tests check the MVE reinterpret intrinsics.
> > >>> Each test contains two functions with identical code, so they are
> > >>> folded by the ICF pass.
> > >>> Because of ICF, the instruction count only covers one function, which is 8.
> > >>> However, when the SIMD modes are enabled, the estimated code size
> > >>> becomes smaller, so inlining is applied after ICF and the
> > >>> instruction count becomes 16, which makes the tests fail.
> > >>> Since ICF is not what these tests are meant to check but causes the
> > >>> above issue, -fno-ipa-icf is added to the tests to keep the
> > >>> instruction count stable.
> > >>>
> > >>> This patch is split out from
> > >>> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> > >>> because this part is not closely related to the aim of that patch,
> > >>> and keeping them together caused confusion.
> > >>>
> > >>> Regtested and bootstrapped.
> > >>>
> > >>> Is it OK for trunk please?
> > >>>
> > >>> Thanks
> > >>> Dennis
> > >>>
> > >>> gcc/ChangeLog:
> > >>>
> > >>> 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
> > >>>
> > >>>        * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
> > >>>
> > >
> > > Since toolchain builds work again after Jakub's divmod fix, I'm now
> > > facing another build error likely caused by this patch:
> > > In file included from
> > > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0,
> > >                   from
> > > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28:
> > > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
> > > In function 'machine_mode arm_preferred_simd_mode(scalar_mode)':
> > > ./insn-modes.h:196:71: error: temporary of non-literal type
> > > 'scalar_int_mode' in a constant expression
> > >   #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
> > >                                                                         ^
> > > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12:
> > > note: in expansion of macro 'QImode'
> > >         case QImode:
> > >
> > > and similarly for the other cases.
> > >
> > > Does the build work for you?
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> >
> > Thanks for the report. Sorry to see the error.
> > I tested it for arm-none-eabi and arm-none-linux-gnueabi targets. I
> > didn't get this error.
> > Could you please share the configuration you use for your build?
> > I will test and fix at once.
> >
>
> It fails on all of them for me. Does it work for you with current
> master? (r11-3720-gf18eeb6b958acd5e1590ca4a73231486b749be9b)
>

So... I guess you are using a host with GCC more recent than 4.8.5? :-)
When I build manually on ubuntu-16.04 with gcc-5.4, the build succeeds,
but when building manually in the environment of the compute farm I use
for validation (RHEL 7, gcc-4.8.5), I managed to reproduce the build
failure.
It's a matter of replacing
case QImode:
with
case E_QImode:

Is the attached patch OK? Or do we instead want to revisit the minimum
gcc version required to build gcc?

Thanks,

Christophe


> > Thanks
> > Dennis

[-- Attachment #2: preferred-simd-mode.patch.txt --]
[-- Type: text/plain, Size: 1189 bytes --]

gcc-4.8.5 does not accept case labels whose value is a temporary of a
non-literal type, which is what "QImode" is, as it expands to
(scalar_int_mode ((scalar_int_mode::from_int) E_QImode)).

Use E_QImode instead in arm_preferred_simd_mode, to fix the
build. Same for HImode, SImode, HFmode and SFmode as introduced by a
recent patch.


2020-10-12  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/arm.c (arm_preferred_simd_mode): Use E_FOOmode
	instead of FOOmode.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5d9c995..0b8c5fa 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28967,11 +28967,11 @@ arm_preferred_simd_mode (scalar_mode mode)
   if (TARGET_HAVE_MVE)
     switch (mode)
       {
-      case QImode:
+      case E_QImode:
 	return V16QImode;
-      case HImode:
+      case E_HImode:
 	return V8HImode;
-      case SImode:
+      case E_SImode:
 	return V4SImode;
 
       default:;
@@ -28980,9 +28980,9 @@ arm_preferred_simd_mode (scalar_mode mode)
   if (TARGET_HAVE_MVE_FLOAT)
     switch (mode)
       {
-      case HFmode:
+      case E_HFmode:
 	return V8HFmode;
-      case SFmode:
+      case E_SFmode:
 	return V4SFmode;
 
       default:;

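For illustration, a minimal self-contained C++ sketch of the pattern behind this fix. The types are hypothetical, simplified stand-ins, not GCC's real insn-modes.h/machmode.h definitions: a plain enum plays the role of the E_*mode enumerators, and a wrapper class with a non-constexpr constructor plays the role of scalar_mode/scalar_int_mode. The switch works on the wrapper through its conversion operator, and each case is labelled with the plain enumerator, so no temporary of the wrapper type has to appear in a constant expression.

```cpp
#include <cassert>

// Hypothetical stand-in for the generated E_*mode enumerators.
enum machine_mode
{
  E_VOIDmode, E_QImode, E_HImode, E_SImode, E_HFmode, E_SFmode,
  E_V16QImode, E_V8HImode, E_V4SImode, E_V8HFmode, E_V4SFmode
};

// Hypothetical stand-in for a mode wrapper class.  Its constructor is not
// constexpr, so the class is not a literal type.
class scalar_mode
{
public:
  explicit scalar_mode (machine_mode m) : m_mode (m) {}
  // A single non-explicit conversion lets the object be used in a switch.
  operator machine_mode () const { return m_mode; }
private:
  machine_mode m_mode;
};

// Hypothetical mirror of the MVE part of arm_preferred_simd_mode after the
// fix.  A label such as "case scalar_mode (E_QImode):" would be rejected,
// because that temporary is not a constant expression; the plain
// enumerators below always are.
machine_mode
preferred_simd_mode_sketch (scalar_mode mode, bool have_mve,
                            bool have_mve_float)
{
  if (have_mve)
    switch (mode)
      {
      case E_QImode: return E_V16QImode;
      case E_HImode: return E_V8HImode;
      case E_SImode: return E_V4SImode;
      default:;
      }

  if (have_mve_float)
    switch (mode)
      {
      case E_HFmode: return E_V8HFmode;
      case E_SFmode: return E_V4SFmode;
      default:;
      }

  // Stand-in for the rest of the real hook (Neon handling etc.).
  return E_VOIDmode;
}
```

In this sketch, preferred_simd_mode_sketch (scalar_mode (E_SFmode), true, true) falls through the integer switch and returns E_V4SFmode; the real hook has further paths that are omitted here.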
^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
  2020-10-12 11:40                     ` Christophe Lyon
@ 2020-10-12 13:22                       ` Kyrylo Tkachov
  2020-10-12 15:39                       ` Dennis Zhang
  1 sibling, 0 replies; 41+ messages in thread
From: Kyrylo Tkachov @ 2020-10-12 13:22 UTC
  To: Christophe Lyon, Dennis Zhang
  Cc: Richard Earnshaw, nd, Ramana Radhakrishnan, gcc-patches

Hi Christophe,

> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 12 October 2020 12:41
> To: Dennis Zhang <Dennis.Zhang@arm.com>
> Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>; nd <nd@arm.com>;
> gcc-patches@gcc.gnu.org; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>
> Subject: Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
> 
> Hi,
> 
> 
> On Thu, 8 Oct 2020 at 16:22, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> >
> > On Thu, 8 Oct 2020 at 16:08, Dennis Zhang <dennis.zhang@arm.com> wrote:
> > >
> > > Hi Christophe,
> > >
> > > On 08/10/2020 14:14, Christophe Lyon wrote:
> > > > Hi,
> > > >
> > > >
> > > > On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > >>
> > > >> On 9/16/20 4:00 PM, Dennis Zhang wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> This patch enables SIMD modes for MVE auto-vectorization.
> > > >>> In this patch, the integer and float MVE SIMD modes are returned by
> > > >>> arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
> > > >>> MVE or MVE_FLOAT is enabled.
> > > >>> Then the expanders for auto-vectorization can be used for generating MVE
> > > >>> SIMD code.
> > > >>>
> > > >>> This patch also fixes bugs in the MVE vreinterpretq_*.c tests, which are
> > > >>> revealed by the enabled MVE SIMD modes.
> > > >>> The tests check the MVE reinterpret intrinsics.
> > > >>> Each test contains two functions with identical code, so they are
> > > >>> folded by the icf pass.
> > > >>> Because of icf, the expected instruction count (8) covers only one function.
> > > >>> However, when the SIMD modes are enabled, the estimated code size
> > > >>> becomes smaller, so inlining is applied after icf and the
> > > >>> instruction count becomes 16, which makes the tests fail.
> > > >>> Since icf is not what the tests are meant to check but it causes the
> > > >>> above issue, -fno-ipa-icf is added to the tests to avoid an unstable
> > > >>> instruction count.
> > > >>>
> > > >>> This patch is separated from
> > > >>> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> > > >>> because this part is not closely related to the aim of that patch and
> > > >>> would cause confusion there.
> > > >>>
> > > >>> Regtested and bootstrapped.
> > > >>>
> > > >>> Is it OK for trunk please?
> > > >>>
> > > >>> Thanks
> > > >>> Dennis
> > > >>>
> > > >>> gcc/ChangeLog:
> > > >>>
> > > >>> 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
> > > >>>
> > > >>>        * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
> > > >>>
> > > >
> > > > Since toolchain builds work again after Jakub's divmod fix, I'm now
> > > > facing another build error likely caused by this patch:
> > > > In file included from
> > > > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0,
> > > >                   from
> > > > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28:
> > > > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
> > > > In function 'machine_mode arm_preferred_simd_mode(scalar_mode)':
> > > > ./insn-modes.h:196:71: error: temporary of non-literal type
> > > > 'scalar_int_mode' in a constant expression
> > > >   #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
> > > >                                                                         ^
> > > > /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12:
> > > > note: in expansion of macro 'QImode'
> > > >         case QImode:
> > > >
> > > > and similarly for the other cases.
> > > >
> > > > Does the build work for you?
> > > >
> > > > Thanks,
> > > >
> > > > Christophe
> > > >
> > >
> > > Thanks for the report. Sorry to see the error.
> > > I tested it for arm-none-eabi and arm-none-linux-gnueabi targets. I
> > > didn't get this error.
> > > Could you please share the configuration you use for your build?
> > > I will test and fix at once.
> > >
> >
> > It fails on all of them for me. Does it work for you with current
> > master? (r11-3720-gf18eeb6b958acd5e1590ca4a73231486b749be9b)
> >
> 
> So... I guess you are using a host with GCC more recent than 4.8.5? :-)
> When I build manually on ubuntu-16.04 with gcc-5.4, the build succeeds,
> but when building manually in the environment of the compute farm I use
> for validation (RHEL 7, gcc-4.8.5), I managed to reproduce the build
> failure.
> It's a matter of replacing
> case QImode:
> with
> case E_QImode:
> 
> Is the attached patch OK? Or do we instead want to revisit the minimum
> gcc version required to build gcc?

I'd rather go with this patch as long as it passes the usual testing.
Thanks,
Kyrill

> 
> Thanks,
> 
> Christophe
> 
> 
> > > Thanks
> > > Dennis

* Re: Ping: [PATCH][Arm] Enable MVE SIMD modes for vectorization
  2020-10-12 11:40                     ` Christophe Lyon
  2020-10-12 13:22                       ` Kyrylo Tkachov
@ 2020-10-12 15:39                       ` Dennis Zhang
  1 sibling, 0 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-10-12 15:39 UTC
  To: Christophe Lyon; +Cc: gcc-patches, nd, Ramana Radhakrishnan, Richard Earnshaw

Hi Christophe,

On 12/10/2020 12:40, Christophe Lyon wrote:
> Hi,
> 
> 
> On Thu, 8 Oct 2020 at 16:22, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>>
>> On Thu, 8 Oct 2020 at 16:08, Dennis Zhang <dennis.zhang@arm.com> wrote:
>>>
>>> Hi Christophe,
>>>
>>> On 08/10/2020 14:14, Christophe Lyon wrote:
>>>> Hi,
>>>>
>>>>
>>>> On Tue, 6 Oct 2020 at 15:37, Dennis Zhang via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>
>>>>> On 9/16/20 4:00 PM, Dennis Zhang wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> This patch enables SIMD modes for MVE auto-vectorization.
>>>>>> In this patch, the integer and float MVE SIMD modes are returned by
>>>>>> arm_preferred_simd_mode (TARGET_VECTORIZE_PREFERRED_SIMD_MODE hook) when
>>>>>> MVE or MVE_FLOAT is enabled.
>>>>>> Then the expanders for auto-vectorization can be used for generating MVE
>>>>>> SIMD code.
>>>>>>
>>>>>> This patch also fixes bugs in the MVE vreinterpretq_*.c tests, which are
>>>>>> revealed by the enabled MVE SIMD modes.
>>>>>> The tests check the MVE reinterpret intrinsics.
>>>>>> Each test contains two functions with identical code, so they are
>>>>>> folded by the icf pass.
>>>>>> Because of icf, the expected instruction count (8) covers only one function.
>>>>>> However, when the SIMD modes are enabled, the estimated code size
>>>>>> becomes smaller, so inlining is applied after icf and the
>>>>>> instruction count becomes 16, which makes the tests fail.
>>>>>> Since icf is not what the tests are meant to check but it causes the
>>>>>> above issue, -fno-ipa-icf is added to the tests to avoid an unstable
>>>>>> instruction count.
>>>>>>
>>>>>> This patch is separated from
>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
>>>>>> because this part is not closely related to the aim of that patch and
>>>>>> would cause confusion there.
>>>>>>
>>>>>> Regtested and bootstrapped.
>>>>>>
>>>>>> Is it OK for trunk please?
>>>>>>
>>>>>> Thanks
>>>>>> Dennis
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 2020-09-15  Dennis Zhang  <dennis.zhang@arm.com>
>>>>>>
>>>>>>         * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE SIMD modes.
>>>>>>
>>>>
>>>> Since toolchain builds work again after Jakub's divmod fix, I'm now
>>>> facing another build error likely caused by this patch:
>>>> In file included from
>>>> /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/coretypes.h:449:0,
>>>>                    from
>>>> /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28:
>>>> /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
>>>> In function 'machine_mode arm_preferred_simd_mode(scalar_mode)':
>>>> ./insn-modes.h:196:71: error: temporary of non-literal type
>>>> 'scalar_int_mode' in a constant expression
>>>>    #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode))
>>>>                                                                          ^
>>>> /tmp/2601185_2.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:28970:12:
>>>> note: in expansion of macro 'QImode'
>>>>          case QImode:
>>>>
>>>> and similarly for the other cases.
>>>>
>>>> Does the build work for you?
>>>>
>>>> Thanks,
>>>>
>>>> Christophe
>>>>
>>>
>>> Thanks for the report. Sorry to see the error.
>>> I tested it for arm-none-eabi and arm-none-linux-gnueabi targets. I
>>> didn't get this error.
>>> Could you please share the configuration you use for your build?
>>> I will test and fix at once.
>>>
>>
>> It fails on all of them for me. Does it work for you with current
>> master? (r11-3720-gf18eeb6b958acd5e1590ca4a73231486b749be9b)
>>
> 
> So... I guess you are using a host with GCC more recent than 4.8.5? :-)
> When I build manually on ubuntu-16.04 with gcc-5.4, the build succeeds,
> but when building manually in the environment of the compute farm I use
> for validation (RHEL 7, gcc-4.8.5), I managed to reproduce the build
> failure.
> It's a matter of replacing
> case QImode:
> with
> case E_QImode:
> 
> Is the attached patch OK? Or do we instead want to revisit the minimum
> gcc version required to build gcc?
> 
> Thanks,
> 
> Christophe
> 

I've tested your patch and it works with my other patches depending on 
this one. So I agree this patch is OK. Thanks for the fix.

Bests
Dennis

* RE: [PATCH][Arm] Auto-vectorization for MVE: vmul
  2020-10-06 16:54             ` [PATCH][Arm] Auto-vectorization for MVE: vmul Dennis Zhang
@ 2020-10-14  9:14               ` Kyrylo Tkachov
  2020-10-22  0:16                 ` Dennis Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Kyrylo Tkachov @ 2020-10-14  9:14 UTC
  To: Dennis Zhang, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Dennis,

> -----Original Message-----
> From: Dennis Zhang <Dennis.Zhang@arm.com>
> Sent: 06 October 2020 17:55
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>
> Subject: [PATCH][Arm] Auto-vectorization for MVE: vmul
> 
> Hi all,
> 
> This patch enables MVE vmul instructions for auto-vectorization.
> It includes MVE in expander mul<mode>3 to enable vectorization for MVE
> and modifies related vmul insns to support the expander by using 'mult'
> instead of unspec.
> The mul<mode>3 for vectorization in vec-common.md uses mode iterator
> VDQWH instead of VALLW to cover all supported modes.
> The macros ARM_HAVE_<MODE>_ARITH are used to select supported modes for
> different targets. The redundant mul<mode>3 in neon.md is removed.
> 
> Regression tested on arm-none-eabi and bootstrapped on
> arm-none-linux-gnueabihf.
> 
> Is it OK for trunk please?

Ok, thank you for your patience.
Kyrill

> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * config/arm/mve.md (mve_vmulq<mode>): New entry for vmul instruction
> using expression 'mult'.
> (mve_vmulq_f<mode>): Use mult instead of VMULQ_F.
> * config/arm/neon.md (mul<mode>3): Removed.
> * config/arm/vec-common.md (mul<mode>3): Use the new mode macros
> ARM_HAVE_<MODE>_ARITH. Use mode iterator VDQWH instead of VALLW.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * gcc.target/arm/simd/mve-vmul_1.c: New test.
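For illustration, a hedged sketch of the kind of source loop this patch targets. It is not the contents of mve-vmul_1.c (that test isn't shown in this thread), and the function names are hypothetical. Built for an MVE target with auto-vectorization enabled (assumed flags along the lines of -O3 -march=armv8.1-m.main+mve -mfloat-abi=hard; check the manual for your toolchain), loops like these are what the vectorizer may map onto the mul<mode>3 expander and so onto MVE vmul. On any other host they simply run as scalar code.

```cpp
#include <cstdint>

// __restrict promises the compiler that the arrays do not overlap, which
// lets the vectorizer treat each loop as one full-vector operation.

// Four 32-bit lanes: a candidate for mul<mode>3 in V4SImode.
void
vmul_s32 (std::int32_t *__restrict dest, const std::int32_t *__restrict a,
          const std::int32_t *__restrict b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] * b[i];
}

// Eight 16-bit lanes: a candidate for mul<mode>3 in V8HImode.
void
vmul_s16 (std::int16_t *__restrict dest, const std::int16_t *__restrict a,
          const std::int16_t *__restrict b)
{
  for (int i = 0; i < 8; i++)
    dest[i] = a[i] * b[i];
}
```

The sketch sticks to integer element types; whether floating-point loops are vectorized can depend on additional FP options.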


* RE: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
  2020-10-06 16:59             ` [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax Dennis Zhang
@ 2020-10-14  9:15               ` Kyrylo Tkachov
  2020-10-22  0:32                 ` Dennis Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Kyrylo Tkachov @ 2020-10-14  9:15 UTC
  To: Dennis Zhang, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Dennis,

> -----Original Message-----
> From: Dennis Zhang <Dennis.Zhang@arm.com>
> Sent: 06 October 2020 17:59
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>
> Subject: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
> 
> Hi all,
> 
> This patch enables MVE vmin/vmax instructions for auto-vectorization.
> The MVE target is included in the expanders smin<mode>3, umin<mode>3,
> smax<mode>3 and umax<mode>3 for vectorization.
> Related insns for vmin/vmax in mve.md are modified to use smin, umin,
> smax and umax expressions instead of unspec to support the expanders.
> 
> Regression tested on arm-none-eabi and bootstrapped on
> arm-none-linux-gnueabihf.
> 
> Is it OK for trunk please?

Ok.
Thanks,
Kyrill

> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * config/arm/mve.md (mve_vmaxq_<supf><mode>): Replace with ...
> (mve_vmaxq_s<mode>, mve_vmaxq_u<mode>): ... these new insns to
> use smax/umax instead of VMAXQ.
> (mve_vminq_<supf><mode>): Replace with ...
> (mve_vminq_s<mode>, mve_vminq_u<mode>): ... these new insns to
> use smin/umin instead of VMINQ.
> (mve_vmaxnmq_f<mode>): Use smax instead of VMAXNMQ_F.
> (mve_vminnmq_f<mode>): Use smin instead of VMINNMQ_F.
> * config/arm/vec-common.md (smin<mode>3): Use the new mode macros
> ARM_HAVE_<MODE>_ARITH.
> (umin<mode>3, smax<mode>3, umax<mode>3): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * gcc.target/arm/simd/mve-vminmax_1.c: New test.
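For illustration, a hedged sketch of integer loops that exercise these expanders. It is not the contents of mve-vminmax_1.c, and the function names are hypothetical. GCC commonly canonicalizes the "x < y ? x : y" idiom on integers to a MIN_EXPR (and the ">" form to a MAX_EXPR), which the vectorizer can then map onto smin/umin/smax/umax; with MVE enabled (assumed flags as for the vmul sketch), these may become vmin/vmax. The sketch is integer-only because floating-point min/max carries extra NaN and signed-zero rules.

```cpp
#include <cstdint>

// Sixteen signed 8-bit lanes: a candidate for smin<mode>3 in V16QImode.
void
vmin_s8 (std::int8_t *__restrict dest, const std::int8_t *__restrict a,
         const std::int8_t *__restrict b)
{
  for (int i = 0; i < 16; i++)
    dest[i] = a[i] < b[i] ? a[i] : b[i];
}

// Eight unsigned 16-bit lanes: a candidate for umax<mode>3 in V8HImode.
void
vmax_u16 (std::uint16_t *__restrict dest, const std::uint16_t *__restrict a,
          const std::uint16_t *__restrict b)
{
  for (int i = 0; i < 8; i++)
    dest[i] = a[i] > b[i] ? a[i] : b[i];
}
```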


* Re: [PATCH][Arm] Auto-vectorization for MVE: vmul
  2020-10-14  9:14               ` Kyrylo Tkachov
@ 2020-10-22  0:16                 ` Dennis Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-10-22  0:16 UTC
  To: Kyrylo Tkachov, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Kyrylo,

> ________________________________________
> From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Sent: Wednesday, October 14, 2020 10:14 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vmul
> 
> Hi Dennis,
> 
> > -----Original Message-----
> > From: Dennis Zhang <Dennis.Zhang@arm.com>
> > Sent: 06 October 2020 17:55
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>;
> > Richard Earnshaw <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> > <Ramana.Radhakrishnan@arm.com>
> > Subject: [PATCH][Arm] Auto-vectorization for MVE: vmul
> >
> > Hi all,
> >
> > This patch enables MVE vmul instructions for auto-vectorization.
> > It includes MVE in expander mul<mode>3 to enable vectorization for MVE 
> > and modifies related vmul insns to support the expander by using 'mult'
> > instead of unspec.
> > The mul<mode>3 for vectorization in vec-common.md uses mode iterator
> > VDQWH instead of VALLW to cover all supported modes.
> > The macros ARM_HAVE_<MODE>_ARITH are used to select supported modes for
> > different targets. The redundant mul<mode>3 in neon.md is removed.
> >
> > Regression tested on arm-none-eabi and bootstrapped on
> > arm-none-linux-gnueabihf.
> >
> > Is it OK for trunk please?
> 
> Ok, thank you for your patience.
> Kyrill
> 

Thanks for your approval.
It's committed to trunk at 0f41b5e02fa47db2080b77e4e1f7cd3305457c05

Cheers
Dennis

* Re: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
  2020-10-14  9:15               ` Kyrylo Tkachov
@ 2020-10-22  0:32                 ` Dennis Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-10-22  0:32 UTC
  To: Kyrylo Tkachov, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Kyrylo,

> ________________________________________
> From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Sent: Wednesday, October 14, 2020 10:15 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
>
> Hi Dennis,
>
> > -----Original Message-----
> > From: Dennis Zhang <Dennis.Zhang@arm.com>
> > Sent: 06 October 2020 17:59
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>;
> > Richard Earnshaw <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> > <Ramana.Radhakrishnan@arm.com>
> > Subject: [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax
> >
> > Hi all,
> >
> > This patch enables MVE vmin/vmax instructions for auto-vectorization.
> > The MVE target is included in the expanders smin<mode>3, umin<mode>3,
> > smax<mode>3 and umax<mode>3 for vectorization.
> > Related insns for vmin/vmax in mve.md are modified to use smin, umin,
> > smax and umax expressions instead of unspec to support the expanders.
> >
> > Regression tested on arm-none-eabi and bootstrapped on
> > arm-none-linux-gnueabihf.
> >
> > Is it OK for trunk please?
>
> Ok.
> Thanks,
> Kyrill
>

Thanks for your approval.
This patch has been committed to trunk at 76835dca95ab9f3f106a0db1e6152ad0740b38b3

Cheers
Dennis

* Ping: [PATCH][Arm] Auto-vectorization for MVE: vsub
  2020-10-06 16:46             ` Dennis Zhang
@ 2020-10-22  0:42               ` Dennis Zhang
  2020-10-22  8:40               ` Kyrylo Tkachov
  1 sibling, 0 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-10-22  0:42 UTC
  To: gcc-patches; +Cc: Kyrylo Tkachov, nd, Richard Earnshaw, Ramana Radhakrishnan

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555646.html
Thanks

________________________________________
From: Dennis Zhang <Dennis.Zhang@arm.com>
Sent: Tuesday, October 6, 2020 5:46 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov; nd; Richard Earnshaw; Ramana Radhakrishnan
Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub

Hi all,

On 8/17/20 6:41 PM, Dennis Zhang wrote:
>
> Hi all,
>
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds RTL templates for MVE vsub instructions using 'minus' instead of
> unspec expression to make the instructions recognizable for vectorization.
> MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
> modified to use a mode iterator that selects available modes for various
> targets correspondingly.
> MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> support vectorization.
>
> This patch also fixes the 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> report wrong instruction counts because of unexpected icf optimization.
> This bug is exposed by the MVE vector modes enabled in this patch,
> so it is corrected here to avoid test failures.
>
> MVE instructions are documented here:
> https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
>
> The patch is regtested for arm-none-eabi and bootstrapped for
> arm-none-linux-gnueabihf.
>
> Is it OK for trunk please?
>
> Thanks
> Dennis
>
> gcc/ChangeLog:
>
> 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
>
>       * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
>       * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
>       (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
>       (TARGET_NEON_MVE_HFP): Likewise.
>       * config/arm/iterators.md (VSEL): New mode iterator to select modes
>       for corresponding targets.
>       * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
>       using expression 'minus'.
>       (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
>       * config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
>       sub<mode>3 in vec-common.md
>       * config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
>       to select available modes. Exclude TARGET_NEON_FP16INST from
>       TARGET_NEON statement. Integrate TARGET_NEON_FP16INST, which was
>       originally in neon.md.
>
> gcc/testsuite/ChangeLog:
>
> 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
>
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
>       option -fno-ipa-icf and change the instruction count from 8 to 16.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
>       * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
>       * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
>       * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
>       * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
>

This patch is updated based on Richard Sandiford's patch adding new
vector mode macros:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
The old version of this patch is at
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
And a less related part in the old version is separated into another
patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html

This patch enables MVE vsub instructions for auto-vectorization.
It adds insns for MVE vsub instructions using 'minus' instead of unspec
expression to make the instructions recognizable for auto-vectorization.
The sub<mode>3 in vec-common.md is modified to use new mode macros which make
the expander available when certain modes are supported. Then various
targets can share this expander for vectorization. The redundant
sub<mode>3 insns in neon.md are then removed.
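For illustration, a hedged sketch of loops that the shared sub<mode>3 expander is meant to serve. It is not the contents of mve-vsub_1.c (that test isn't shown here), and the function names are hypothetical. With MVE enabled (assumed flags along the lines of -O3 -march=armv8.1-m.main+mve -mfloat-abi=hard), the vectorizer may turn these into MVE vsub; elsewhere they run as scalar code.

```cpp
#include <cstdint>

// Sixteen signed 8-bit lanes: a candidate for sub<mode>3 in V16QImode.
void
vsub_s8 (std::int8_t *__restrict dest, const std::int8_t *__restrict a,
         const std::int8_t *__restrict b)
{
  for (int i = 0; i < 16; i++)
    dest[i] = a[i] - b[i];
}

// Four unsigned 32-bit lanes: a candidate for sub<mode>3 in V4SImode.
// Unsigned subtraction wraps modulo 2^32 in C++, like an integer vsub.
void
vsub_u32 (std::uint32_t *__restrict dest, const std::uint32_t *__restrict a,
          const std::uint32_t *__restrict b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] - b[i];
}
```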

Regression tested on arm-none-eabi and bootstrapped on
arm-none-linux-gnueabihf.

Is it OK for trunk please?

Thanks
Dennis

gcc/ChangeLog:

2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>

        * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
        using expression 'minus'.
        (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
        * config/arm/neon.md (*sub<mode>3_neon): Use the new mode macros
        ARM_HAVE_<MODE>_ARITH.
        (sub<mode>3, sub<mode>3_fp16): Removed.
        (neon_vsub<mode>): Use gen_sub<mode>3 instead of gen_sub<mode>3_fp16.
        * config/arm/vec-common.md (sub<mode>3): Use the new mode macros
        ARM_HAVE_<MODE>_ARITH.

gcc/testsuite/ChangeLog:

2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>

        * gcc.target/arm/simd/mve-vsub_1.c: New test.


* RE: [PATCH][Arm] Auto-vectorization for MVE: vsub
  2020-10-06 16:46             ` Dennis Zhang
  2020-10-22  0:42               ` Ping: " Dennis Zhang
@ 2020-10-22  8:40               ` Kyrylo Tkachov
  2020-10-23  8:01                 ` Dennis Zhang
  1 sibling, 1 reply; 41+ messages in thread
From: Kyrylo Tkachov @ 2020-10-22  8:40 UTC
  To: Dennis Zhang, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Dennis,

> -----Original Message-----
> From: Dennis Zhang <Dennis.Zhang@arm.com>
> Sent: 06 October 2020 17:47
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> <Ramana.Radhakrishnan@arm.com>
> Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
> 
> Hi all,
> 
> On 8/17/20 6:41 PM, Dennis Zhang wrote:
> >
> > Hi all,
> >
> > This patch enables MVE vsub instructions for auto-vectorization.
> > It adds RTL templates for MVE vsub instructions using 'minus' instead of
> > unspec expression to make the instructions recognizable for vectorization.
> > MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
> > modified to use a mode iterator that selects available modes for various
> > targets correspondingly.
> > MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> > support vectorization.
> >
> > This patch also fixes the 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> > report wrong instruction counts because of unexpected icf optimization.
> > This bug is exposed by the MVE vector modes enabled in this patch,
> > so it is corrected here to avoid test failures.
> >
> > MVE instructions are documented here:
> > https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
> >
> > The patch is regtested for arm-none-eabi and bootstrapped for
> > arm-none-linux-gnueabihf.
> >
> > Is it OK for trunk please?
> >
> > Thanks
> > Dennis
> >
> > gcc/ChangeLog:
> >
> > 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
> >
> > * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
> > * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
> > (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
> > (TARGET_NEON_MVE_HFP): Likewise.
> > * config/arm/iterators.md (VSEL): New mode iterator to select modes
> > for corresponding targets.
> > * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
> > using expression 'minus'.
> > (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
> > * config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
> > sub<mode>3 in vec-common.md
> > * config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
> > to select available modes. Exclude TARGET_NEON_FP16INST from
> > TARGET_NEON statement. Integrate TARGET_NEON_FP16INST, which was
> > originally in neon.md.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
> >
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> > option -fno-ipa-icf and change the instruction count from 8 to 16.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> > * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> > * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
> > * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
> > * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
> >
> 
> This patch is updated based on Richard Sandiford's patch adding new
> vector mode macros:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
> The old version of this patch is at
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> And a less related part in the old version is separated into another
> patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html
> 
> This patch enables MVE vsub instructions for auto-vectorization.
> It adds insns for MVE vsub instructions using 'minus' instead of unspec
> expression to make the instructions recognizable for auto-vectorization.
> The sub<mode>3 expander in vec-common.md is modified to use the new mode
> macros, which make it available when certain modes are supported. Then various
> targets can share this expander for vectorization. The redundant
> sub<mode>3 insns in neon.md are then removed.
> 
> Regression tested on arm-none-eabi and bootstrapped on
> arm-none-linux-gnueabihf.
> 
> Is it OK for trunk please?

Ok.
Thanks,
Kyrill

> 
> Thanks
> Dennis
> 
> gcc/ChangeLog:
> 
> 2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
> using expression 'minus'.
> (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
> * config/arm/neon.md (*sub<mode>3_neon): Use the new mode macros
> ARM_HAVE_<MODE>_ARITH.
> (sub<mode>3, sub<mode>3_fp16): Removed.
> (neon_vsub<mode>): Use gen_sub<mode>3 instead of
> gen_sub<mode>3_fp16.
> * config/arm/vec-common.md (sub<mode>3): Use the new mode macros
> ARM_HAVE_<MODE>_ARITH.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>
> 
> * gcc.target/arm/simd/mve-vsub_1.c: New test.
> 
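The motivation for describing vsub with the RTL `minus` code rather than an unspec is that the vectorizer can only use patterns it recognizes, such as the standard `sub<mode>3` expander. A minimal sketch of the kind of loop this is meant to cover, assuming an MVE target built with the options from `dg-add-options arm_v8_1m_mve_fp` plus `-O3`; the function names are hypothetical and this is not the content of `mve-vsub_1.c`. On such a target GCC may emit `vsub.i32` and `vsub.f32` for the two loops; elsewhere the same C simply compiles to scalar or other SIMD code.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise integer subtraction.  On an MVE target at -O3, GCC
   may vectorize this through the sub<mode>3 expander (vsub.i32).  */
void
vsub_s32 (int32_t *restrict dest, const int32_t *restrict a,
          const int32_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dest[i] = a[i] - b[i];
}

/* Single-precision variant; vsub.f32 additionally needs the MVE
   floating-point extension (e.g. arm_v8_1m_mve_fp).  */
void
vsub_f32 (float *restrict dest, const float *restrict a,
          const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dest[i] = a[i] - b[i];
}
```

If a loop like this is not vectorized, compiling with `-fopt-info-vec-missed` should report the reason.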



* Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
  2020-10-22  8:40               ` Kyrylo Tkachov
@ 2020-10-23  8:01                 ` Dennis Zhang
  2020-11-09 13:38                   ` Christophe Lyon
  0 siblings, 1 reply; 41+ messages in thread
From: Dennis Zhang @ 2020-10-23  8:01 UTC (permalink / raw)
  To: Kyrylo Tkachov, gcc-patches; +Cc: nd, Richard Earnshaw, Ramana Radhakrishnan

Hi Kyrylo,

> ________________________________________
> From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Sent: Thursday, October 22, 2020 9:40 AM
> To: Dennis Zhang; gcc-patches@gcc.gnu.org
> Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vsub
>
> Hi Dennis,
>
> > -----Original Message-----
> > From: Dennis Zhang <Dennis.Zhang@arm.com>
> > Sent: 06 October 2020 17:47
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>;
> > Richard Earnshaw <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> > <Ramana.Radhakrishnan@arm.com>
> > Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
> >
> > Hi all,
> >
> > On 8/17/20 6:41 PM, Dennis Zhang wrote:
> > >
> > > Hi all,
> > >
> > > This patch enables MVE vsub instructions for auto-vectorization.
> > > It adds RTL templates for MVE vsub instructions using 'minus' instead of
> > > unspec expression to make the instructions recognizable for vectorization.
> > > MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
> > > modified to use a mode iterator that selects available modes for various
> > > targets correspondingly.
> > > MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> > > support vectorization.
> > >
> > > This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> > > produce the wrong instruction counts because of an unexpected ICF
> > > (identical code folding) optimization. The problem is exposed by the
> > > MVE vector modes enabled in this patch, so the tests are corrected
> > > here to avoid failures.
> > >
> > > MVE instructions are documented here:
> > > https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
> > >
> > > The patch is regtested for arm-none-eabi and bootstrapped for
> > > arm-none-linux-gnueabihf.
> > >
> > > Is it OK for trunk please?
> > >
> > > Thanks
> > > Dennis
> > >
> > > gcc/ChangeLog:
> > >
> > > 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
> > >
> > > * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
> > > * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
> > > (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
> > > (TARGET_NEON_MVE_HFP): Likewise.
> > > * config/arm/iterators.md (VSEL): New mode iterator to select modes
> > > for corresponding targets.
> > > * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
> > > using expression 'minus'.
> > > (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
> > > * config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
> > > sub<mode>3 in vec-common.md
> > > * config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
> > > to select available modes. Exclude TARGET_NEON_FP16INST from
> > > TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
> > > originally in neon.md.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
> > >
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> > > option -fno-ipa-icf and change the instruction count from 8 to 16.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> > > * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
> > > * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
> > > * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
> > >
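On the "MVE vector modes are enabled in arm_preferred_simd_mode" point: MVE vectors live in 128-bit Q registers, so each vector mode offered to the vectorizer holds 128 / (element width in bits) lanes. Presumably that means V16QI, V8HI and V4SI for 8-, 16- and 32-bit integers, plus V8HF and V4SF for half and single precision when the MVE FP extension is present. A minimal arithmetic sketch of that relationship; the helper and macro names are hypothetical and not part of GCC.

```c
#include <limits.h>
#include <stdint.h>

/* Width of one MVE vector (a Q register), in bits.  */
#define MVE_VECTOR_BITS 128u

/* Hypothetical helper: number of lanes of the given element width
   that fit in one MVE vector, or 0 for widths that do not divide it.  */
unsigned
mve_lanes (unsigned element_bits)
{
  if (element_bits == 0 || MVE_VECTOR_BITS % element_bits != 0)
    return 0;
  return MVE_VECTOR_BITS / element_bits;
}

/* Lanes for a scalar type, e.g. MVE_LANES_OF (int32_t) is 4 (V4SI).  */
#define MVE_LANES_OF(type) mve_lanes ((unsigned) (sizeof (type) * CHAR_BIT))
```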
> >
> > This patch is updated based on Richard Sandiford's patch adding new
> > vector mode macros:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
> > The old version of this patch is at
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> > And a less related part in the old version is separated into another
> > patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html
> >
> > This patch enables MVE vsub instructions for auto-vectorization.
> > It adds insns for MVE vsub instructions using 'minus' instead of unspec
> > expression to make the instructions recognizable for auto-vectorization.
> > The sub<mode>3 expander in vec-common.md is modified to use the new mode
> > macros, which make it available when certain modes are supported. Then various
> > targets can share this expander for vectorization. The redundant
> > sub<mode>3 insns in neon.md are then removed.
> >
> > Regression tested on arm-none-eabi and bootstrapped on
> > arm-none-linux-gnueabihf.
> >
> > Is it OK for trunk please?
>
> Ok.
> Thanks,
> Kyrill
>

Thanks for your approval. The patch has been committed as 98161c248c88f873bbffba23664c540f551d89d5

Bests
Dennis

> >
> > gcc/ChangeLog:
> >
> > 2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>
> >
> > * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
> > using expression 'minus'.
> > (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
> > * config/arm/neon.md (*sub<mode>3_neon): Use the new mode macros
> > ARM_HAVE_<MODE>_ARITH.
> > (sub<mode>3, sub<mode>3_fp16): Removed.
> > (neon_vsub<mode>): Use gen_sub<mode>3 instead of
> > gen_sub<mode>3_fp16.
> > * config/arm/vec-common.md (sub<mode>3): Use the new mode macros
> > ARM_HAVE_<MODE>_ARITH.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>
> >
> > * gcc.target/arm/simd/mve-vsub_1.c: New test.
> >


* Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
  2020-10-23  8:01                 ` Dennis Zhang
@ 2020-11-09 13:38                   ` Christophe Lyon
  2020-12-10 15:37                     ` [committed][Patch]arm: Fix typo in testcase mve-vsub_1.c Dennis Zhang
  2020-12-10 15:43                     ` [PATCH][Arm] Auto-vectorization for MVE: vsub Dennis Zhang
  0 siblings, 2 replies; 41+ messages in thread
From: Christophe Lyon @ 2020-11-09 13:38 UTC (permalink / raw)
  To: Dennis Zhang
  Cc: Kyrylo Tkachov, gcc-patches, Richard Earnshaw, nd, Ramana Radhakrishnan

Hi,


On Fri, 23 Oct 2020 at 10:02, Dennis Zhang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi Kyrylo,
>
> > ________________________________________
> > From: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> > Sent: Thursday, October 22, 2020 9:40 AM
> > To: Dennis Zhang; gcc-patches@gcc.gnu.org
> > Cc: nd; Richard Earnshaw; Ramana Radhakrishnan
> > Subject: RE: [PATCH][Arm] Auto-vectorization for MVE: vsub
> >
> > Hi Dennis,
> >
> > > -----Original Message-----
> > > From: Dennis Zhang <Dennis.Zhang@arm.com>
> > > Sent: 06 October 2020 17:47
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; nd <nd@arm.com>;
> > > Richard Earnshaw <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan
> > > <Ramana.Radhakrishnan@arm.com>
> > > Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
> > >
> > > Hi all,
> > >
> > > On 8/17/20 6:41 PM, Dennis Zhang wrote:
> > > >
> > > > Hi all,
> > > >
> > > > This patch enables MVE vsub instructions for auto-vectorization.
> > > > It adds RTL templates for MVE vsub instructions using 'minus' instead of
> > > > unspec expression to make the instructions recognizable for vectorization.
> > > > MVE target is added in sub<mode>3 optab. The sub<mode>3 optab is
> > > > modified to use a mode iterator that selects available modes for various
> > > > targets correspondingly.
> > > > MVE vector modes are enabled in arm_preferred_simd_mode in arm.c to
> > > > support vectorization.
> > > >
> > > > This patch also fixes 'vreinterpretq_*.c' MVE intrinsic tests. The tests
> > > > produce the wrong instruction counts because of an unexpected ICF
> > > > (identical code folding) optimization. The problem is exposed by the
> > > > MVE vector modes enabled in this patch, so the tests are corrected
> > > > here to avoid failures.
> > > >
> > > > MVE instructions are documented here:
> > > > https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/helium-intrinsics
> > > >
> > > > The patch is regtested for arm-none-eabi and bootstrapped for
> > > > arm-none-linux-gnueabihf.
> > > >
> > > > Is it OK for trunk please?
> > > >
> > > > Thanks
> > > > Dennis
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
> > > >
> > > > * config/arm/arm.c (arm_preferred_simd_mode): Enable MVE vector modes.
> > > > * config/arm/arm.h (TARGET_NEON_IWMMXT): New macro.
> > > > (TARGET_NEON_IWMMXT_MVE, TARGET_NEON_IWMMXT_MVE_FP): Likewise.
> > > > (TARGET_NEON_MVE_HFP): Likewise.
> > > > * config/arm/iterators.md (VSEL): New mode iterator to select modes
> > > > for corresponding targets.
> > > > * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
> > > > using expression 'minus'.
> > > > (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
> > > > * config/arm/neon.md (sub<mode>3): Removed here. Integrated in the
> > > > sub<mode>3 in vec-common.md
> > > > * config/arm/vec-common.md (sub<mode>3): Enable MVE target. Use VSEL
> > > > to select available modes. Exclude TARGET_NEON_FP16INST from
> > > > TARGET_NEON statement. Integrate TARGET_NEON_FP16INST which is
> > > > originally in neon.md.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > 2020-08-10  Dennis Zhang  <dennis.zhang@arm.com>
> > > >
> > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f16.c: Use additional
> > > > option -fno-ipa-icf and change the instruction count from 8 to 16.
> > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_f32.c: Likewise.
> > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s16.c: Likewise.
> > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s32.c: Likewise.
> > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s64.c: Likewise.
> > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_s8.c: Likewise.
> > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u16.c: Likewise.
> > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u32.c: Likewise.
> > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u64.c: Likewise.
> > > > * gcc.target/arm/mve/intrinsics/vreinterpretq_u8.c: Likewise.
> > > > * gcc.target/arm/mve/mve.exp: Include tests in subdir 'vect'.
> > > > * gcc.target/arm/mve/vect/vect_sub_0.c: New test.
> > > > * gcc.target/arm/mve/vect/vect_sub_1.c: New test.
> > > >
> > >
> > > This patch is updated based on Richard Sandiford's patch adding new
> > > vector mode macros:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553425.html
> > > The old version of this patch is at
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552104.html
> > > And a less related part in the old version is separated into another
> > > patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554100.html
> > >
> > > This patch enables MVE vsub instructions for auto-vectorization.
> > > It adds insns for MVE vsub instructions using 'minus' instead of unspec
> > > expression to make the instructions recognizable for auto-vectorization.
> > > The sub<mode>3 expander in vec-common.md is modified to use the new mode
> > > macros, which make it available when certain modes are supported. Then various
> > > targets can share this expander for vectorization. The redundant
> > > sub<mode>3 insns in neon.md are then removed.
> > >
> > > Regression tested on arm-none-eabi and bootstrapped on
> > > arm-none-linux-gnueabihf.
> > >
> > > Is it OK for trunk please?
> >
> > Ok.
> > Thanks,
> > Kyrill
> >
>
> Thanks for your approval. The patch has been committed as 98161c248c88f873bbffba23664c540f551d89d5
>

I have just noticed that the new test has:
/* { dg -additional-options "-O3 -funsafe-math-optimizations" } */
/* { dg-additional-options "-O3" } */
That is, the first line has a typo (space between dg and -additional-options),
so the test is effectively compiled with -O3, and without
-funsafe-math-optimizations.
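Why the typo is silent: DejaGnu only acts on comments of the form `{ dg-NAME ... }`, so `{ dg -additional-options ... }` does not match and is treated as an ordinary comment, with no error reported. The C function below is a simplified approximation of that matching rule for illustration only, not DejaGnu's actual Tcl implementation; the function name is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Simplified approximation (not DejaGnu's real matcher): a directive
   is a '{', one or more whitespace characters, then "dg-" followed
   directly by a lower-case letter or '-' that starts the name.  */
bool
looks_like_dg_directive (const char *line)
{
  for (const char *p = strchr (line, '{'); p != NULL; p = strchr (p + 1, '{'))
    {
      const char *q = p + 1;
      if (!isspace ((unsigned char) *q))
        continue;
      while (isspace ((unsigned char) *q))
        q++;
      /* "dg -additional-options" fails here: "dg " is not "dg-".  */
      if (strncmp (q, "dg-", 3) == 0
          && (islower ((unsigned char) q[3]) || q[3] == '-'))
        return true;
    }
  return false;
}
```

Run on the two header lines quoted above, it accepts the `dg-additional-options` line and rejects the one with the stray space.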

Since I can see it passing, it looks like -funsafe-math-optimizations
is not needed, can you clarify?

Thanks


> Bests
> Dennis
>
> > >
> > > gcc/ChangeLog:
> > >
> > > 2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>
> > >
> > > * config/arm/mve.md (mve_vsubq<mode>): New entry for vsub instruction
> > > using expression 'minus'.
> > > (mve_vsubq_f<mode>): Use minus instead of VSUBQ_F unspec.
> > > * config/arm/neon.md (*sub<mode>3_neon): Use the new mode macros
> > > ARM_HAVE_<MODE>_ARITH.
> > > (sub<mode>3, sub<mode>3_fp16): Removed.
> > > (neon_vsub<mode>): Use gen_sub<mode>3 instead of
> > > gen_sub<mode>3_fp16.
> > > * config/arm/vec-common.md (sub<mode>3): Use the new mode macros
> > > ARM_HAVE_<MODE>_ARITH.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2020-10-02  Dennis Zhang  <dennis.zhang@arm.com>
> > >
> > > * gcc.target/arm/simd/mve-vsub_1.c: New test.
> > >


* [committed][Patch]arm: Fix typo in testcase mve-vsub_1.c
  2020-11-09 13:38                   ` Christophe Lyon
@ 2020-12-10 15:37                     ` Dennis Zhang
  2020-12-10 15:43                     ` [PATCH][Arm] Auto-vectorization for MVE: vsub Dennis Zhang
  1 sibling, 0 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-12-10 15:37 UTC (permalink / raw)
  To: gcc-patches
  Cc: Kyrylo Tkachov, Richard Earnshaw, nd, Ramana Radhakrishnan,
	Christophe Lyon

[-- Attachment #1: Type: text/plain, Size: 229 bytes --]

This patch fixes a typo reported at https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558478.html

gcc/testsuite/
	* gcc.target/arm/simd/mve-vsub_1.c: Fix typo.
	Remove needless dg-additional-options.

Cheers,
Dennis

[-- Attachment #2: fix.patch --]
[-- Type: text/x-patch; name="fix.patch", Size: 526 bytes --]

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
index cb3ef3a14e0..842e5c6a30b 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vsub_1.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg -additional-options "-O3 -funsafe-math-optimizations" } */
 /* { dg-additional-options "-O3" } */
 
 #include <stdint.h>


* Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
  2020-11-09 13:38                   ` Christophe Lyon
  2020-12-10 15:37                     ` [committed][Patch]arm: Fix typo in testcase mve-vsub_1.c Dennis Zhang
@ 2020-12-10 15:43                     ` Dennis Zhang
  1 sibling, 0 replies; 41+ messages in thread
From: Dennis Zhang @ 2020-12-10 15:43 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: Kyrylo Tkachov, gcc-patches, Richard Earnshaw, nd, Ramana Radhakrishnan

Hi Christophe,

> From: Christophe Lyon <christophe.lyon@linaro.org>
> Sent: Monday, November 9, 2020 1:38 PM
> To: Dennis Zhang
> Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; nd; Ramana Radhakrishnan
> Subject: Re: [PATCH][Arm] Auto-vectorization for MVE: vsub
>
> Hi,
>
> I have just noticed that the new test has:
> /* { dg -additional-options "-O3 -funsafe-math-optimizations" } */
> /* { dg-additional-options "-O3" } */
> That is, the first line has a typo (space between dg and -additional-options),
> so the test is effectively compiled with -O3, and without
> -funsafe-math-optimizations.
>
> Since I can see it passing, it looks like -funsafe-math-optimizations
> is not needed, can you clarify?
>
> Thanks

Thank you for the report. The '-funsafe-math-optimizations' option is not needed.
The typo is fixed by commit b46dd03fe94e2428cbcdbfc4d081d89ed604803a.

Bests
Dennis


Thread overview: 41+ messages
2019-11-22 14:33 [PATCH][Arm] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16 Dennis Zhang
2019-12-12 17:30 ` Dennis Zhang
2019-12-20 15:35   ` Kyrill Tkachov
2020-01-02 17:28     ` Dennis Zhang
2020-03-12 12:05 ` [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature Dennis Zhang
2020-03-13 19:31   ` [PATCH][Arm][2/4] Custom Datapath Extension intrinsics: instructions using FPU/MVE S/D registers Dennis Zhang
2020-03-20 15:18     ` Dennis Zhang
2020-04-07 12:31       ` Dennis Zhang
2020-04-07 14:07         ` Kyrylo Tkachov
2020-04-08 15:25           ` Dennis Zhang
2020-08-17 18:41           ` [PATCH][Arm] Auto-vectorization for MVE: vsub Dennis Zhang
2020-08-21 22:33             ` Ramana Radhakrishnan
2020-09-07  7:20               ` Dennis Zhang
2020-10-06 16:46             ` Dennis Zhang
2020-10-22  0:42               ` Ping: " Dennis Zhang
2020-10-22  8:40               ` Kyrylo Tkachov
2020-10-23  8:01                 ` Dennis Zhang
2020-11-09 13:38                   ` Christophe Lyon
2020-12-10 15:37                     ` [committed][Patch]arm: Fix typo in testcase mve-vsub_1.c Dennis Zhang
2020-12-10 15:43                     ` [PATCH][Arm] Auto-vectorization for MVE: vsub Dennis Zhang
2020-10-06 16:54             ` [PATCH][Arm] Auto-vectorization for MVE: vmul Dennis Zhang
2020-10-14  9:14               ` Kyrylo Tkachov
2020-10-22  0:16                 ` Dennis Zhang
2020-10-06 16:59             ` [PATCH][Arm] Auto-vectorization for MVE: vmin/vmax Dennis Zhang
2020-10-14  9:15               ` Kyrylo Tkachov
2020-10-22  0:32                 ` Dennis Zhang
2020-09-16 16:00           ` [PATCH][Arm] Enable MVE SIMD modes for vectorization Dennis Zhang
2020-10-06 13:37             ` Ping: " Dennis Zhang
2020-10-06 13:43               ` Kyrylo Tkachov
2020-10-08 13:14               ` Christophe Lyon
2020-10-08 14:06                 ` Dennis Zhang
2020-10-08 14:22                   ` Christophe Lyon
2020-10-12 11:40                     ` Christophe Lyon
2020-10-12 13:22                       ` Kyrylo Tkachov
2020-10-12 15:39                       ` Dennis Zhang
2020-03-18  9:04   ` [PATCH][Arm][1/3] Support for Arm Custom Datapath Extension (CDE): enable the feature Kyrylo Tkachov
2020-03-19 14:02     ` Dennis Zhang
2020-03-19 17:48       ` Kyrylo Tkachov
2020-04-08 11:33         ` Dennis Zhang
2020-04-08 12:34           ` Kyrylo Tkachov
2020-04-08 15:19             ` Dennis Zhang
