public inbox for gcc-regression@sourceware.org
* [TCWG CI] Regression caused by gcc: [Patch][GCC][AArch64] - Lower store and load neon builtins to gimple
@ 2021-10-21 22:54 ci_notify
From: ci_notify @ 2021-10-21 22:54 UTC (permalink / raw)
  To: Andre Simoes Dias Vieira; +Cc: gcc-regression

[TCWG CI] Regression caused by gcc: [Patch][GCC][AArch64] - Lower store and load neon builtins to gimple:
commit ad44c6a56c777bd1eddb214095fff36c8dba9246
Author: Andre Simoes Dias Vieira <andre.simoesdiasvieira@arm.com>

    [Patch][GCC][AArch64] - Lower store and load neon builtins to gimple

Results regressed to
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# First few build errors in logs:
# 00:04:07 make[3]: [Makefile:1772: aarch64-unknown-linux-gnu/bits/largefile-config.h] Error 1 (ignored)
# 00:04:07 make[3]: [Makefile:1773: aarch64-unknown-linux-gnu/bits/largefile-config.h] Error 1 (ignored)
# 00:18:01 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc-gcc.git~master-stage2/prev-gcc/include/arm_neon.h:16632:32: runtime error: load of misaligned address 0xffffb7e1a056 for type '__Int8x16_t', which requires 16 byte alignment
# 00:18:02 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc-gcc.git~master-stage2/prev-gcc/include/arm_neon.h:16632:32: runtime error: load of misaligned address 0xffffa14e9056 for type '__Int8x16_t', which requires 16 byte alignment
# 00:45:42 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc-gcc.git~master-stage2/prev-gcc/include/arm_neon.h:16632:32: runtime error: load of misaligned address 0x00002768b48a for type '__Int8x16_t', which requires 16 byte alignment
# 00:45:43 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc-gcc.git~master-stage2/prev-gcc/include/arm_neon.h:16632:32: runtime error: load of misaligned address 0x00001241d371 for type '__Int8x16_t', which requires 16 byte alignment
# 00:45:44 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc-gcc.git~master-stage2/prev-gcc/include/arm_neon.h:16632:32: runtime error: load of misaligned address 0x000022f56c61 for type '__Int8x16_t', which requires 16 byte alignment
# 00:45:53 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc-gcc.git~master-stage2/prev-gcc/include/arm_neon.h:16632:32: runtime error: load of misaligned address 0x00003fa8ec8a for type '__Int8x16_t', which requires 16 byte alignment
# 00:45:56 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc-gcc.git~master-stage2/prev-gcc/include/arm_neon.h:16632:32: runtime error: load of misaligned address 0x0000164cb6ca for type '__Int8x16_t', which requires 16 byte alignment
# 00:45:56 /home/tcwg-buildslave/workspace/tcwg_gnu_6/abe/builds/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/gcc-gcc.git~master-stage2/prev-gcc/include/arm_neon.h:16632:32: runtime error: load of misaligned address 0x00003307981a for type '__Int8x16_t', which requires 16 byte alignment
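These runtime errors all point at the same arm_neon.h line: the offending commit (quoted in full below) lowers the vld1/vst1 builtins to a plain MEM_REF of the vector type, and the UBSan-instrumented stage2 compiler then checks each such load against __Int8x16_t's natural 16-byte alignment, even though vld1q_s8 only ever guaranteed element alignment. A minimal sketch of the pattern that trips the check (hypothetical reproducer, not extracted from the failing build):

<cut>
/* Hypothetical reproducer: vld1q_s8 requires only int8_t (1-byte)
   alignment, but once lowered to *(__Int8x16_t *)p the access is
   checked against the vector type's 16-byte alignment.  Compile with
   e.g. -O2 -fsanitize=alignment (assumption: bootstrap_ubsan enables
   an equivalent UBSan check in stage2).  */
#include <arm_neon.h>

int8x16_t
load_bytes (const int8_t *p)	/* p may be only 1-byte aligned.  */
{
  return vld1q_s8 (p);		/* lowered to a MEM_REF of __Int8x16_t.  */
}
</cut>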

from
# reset_artifacts:
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap_ubsan:
2

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_gcc_bootstrap/master-aarch64-bootstrap_ubsan

First_bad build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstrap_ubsan/5/artifact/artifacts/build-ad44c6a56c777bd1eddb214095fff36c8dba9246/
Last_good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstrap_ubsan/5/artifact/artifacts/build-914045dff10fbd27de27b90a0ac78a0058b2c86e/
Baseline build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstrap_ubsan/5/artifact/artifacts/build-baseline/
Even more details: https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstrap_ubsan/5/artifact/artifacts/

Reproduce builds:
<cut>
mkdir investigate-gcc-ad44c6a56c777bd1eddb214095fff36c8dba9246
cd investigate-gcc-ad44c6a56c777bd1eddb214095fff36c8dba9246

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstrap_ubsan/5/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstrap_ubsan/5/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-bisect-master-aarch64-bootstrap_ubsan/5/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_gnu-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach ad44c6a56c777bd1eddb214095fff36c8dba9246
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 914045dff10fbd27de27b90a0ac78a0058b2c86e
../artifacts/test.sh

cd ..
</cut>

Full commit (up to 1000 lines):
<cut>
commit ad44c6a56c777bd1eddb214095fff36c8dba9246
Author: Andre Simoes Dias Vieira <andre.simoesdiasvieira@arm.com>
Date:   Wed Oct 20 13:19:10 2021 +0100

    [Patch][GCC][AArch64] - Lower store and load neon builtins to gimple
    
    2021-10-20  Andre Vieira  <andre.simoesdiasvieira@arm.com>
                Jirui Wu  <jirui.wu@arm.com>
    gcc/ChangeLog:
    
            * config/aarch64/aarch64-builtins.c
            (aarch64_general_gimple_fold_builtin): Lower vld1 and vst1
            variants of the neon builtins.
            * config/aarch64/aarch64-protos.h
            (aarch64_general_gimple_fold_builtin): Add gsi parameter.
            * config/aarch64/aarch64.c (aarch64_general_gimple_fold_builtin):
            Likewise.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.target/aarch64/fmla_intrinsic_1.c: Prevent over-optimization.
            * gcc.target/aarch64/fmls_intrinsic_1.c: Likewise.
            * gcc.target/aarch64/fmul_intrinsic_1.c: Likewise.
            * gcc.target/aarch64/mla_intrinsic_1.c: Likewise.
            * gcc.target/aarch64/mls_intrinsic_1.c: Likewise.
            * gcc.target/aarch64/mul_intrinsic_1.c: Likewise.
            * gcc.target/aarch64/simd/vmul_elem_1.c: Likewise.
            * gcc.target/aarch64/vclz.c: Likewise.
            * gcc.target/aarch64/vneg_s.c: Likewise.
---
 gcc/config/aarch64/aarch64-builtins.c              | 103 +++++++-
 gcc/config/aarch64/aarch64-protos.h                |   3 +-
 gcc/config/aarch64/aarch64.c                       |   2 +-
 .../gcc.target/aarch64/fmla_intrinsic_1.c          |   9 +-
 .../gcc.target/aarch64/fmls_intrinsic_1.c          |   9 +-
 .../gcc.target/aarch64/fmul_intrinsic_1.c          |  11 +-
 gcc/testsuite/gcc.target/aarch64/mla_intrinsic_1.c |   1 +
 gcc/testsuite/gcc.target/aarch64/mls_intrinsic_1.c |   1 +
 gcc/testsuite/gcc.target/aarch64/mul_intrinsic_1.c |   1 +
 .../gcc.target/aarch64/simd/vmul_elem_1.c          |  44 ++++
 gcc/testsuite/gcc.target/aarch64/vclz.c            | 272 +++++++++++----------
 gcc/testsuite/gcc.target/aarch64/vneg_s.c          | 167 +++++--------
 12 files changed, 371 insertions(+), 252 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 1a507ea5914..a815e4cfbcc 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -46,6 +46,7 @@
 #include "emit-rtl.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "gimple-fold.h"
 
 #define v8qi_UP  E_V8QImode
 #define v4hi_UP  E_V4HImode
@@ -2399,11 +2400,65 @@ aarch64_general_fold_builtin (unsigned int fcode, tree type,
   return NULL_TREE;
 }
 
+enum aarch64_simd_type
+get_mem_type_for_load_store (unsigned int fcode)
+{
+  switch (fcode)
+  {
+    VAR1 (LOAD1, ld1 , 0, LOAD, v8qi)
+    VAR1 (STORE1, st1 , 0, STORE, v8qi)
+      return Int8x8_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v16qi)
+    VAR1 (STORE1, st1 , 0, STORE, v16qi)
+      return Int8x16_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v4hi)
+    VAR1 (STORE1, st1 , 0, STORE, v4hi)
+      return Int16x4_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v8hi)
+    VAR1 (STORE1, st1 , 0, STORE, v8hi)
+      return Int16x8_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v2si)
+    VAR1 (STORE1, st1 , 0, STORE, v2si)
+      return Int32x2_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v4si)
+    VAR1 (STORE1, st1 , 0, STORE, v4si)
+      return Int32x4_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v2di)
+    VAR1 (STORE1, st1 , 0, STORE, v2di)
+      return Int64x2_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v4hf)
+    VAR1 (STORE1, st1 , 0, STORE, v4hf)
+      return Float16x4_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v8hf)
+    VAR1 (STORE1, st1 , 0, STORE, v8hf)
+      return Float16x8_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v4bf)
+    VAR1 (STORE1, st1 , 0, STORE, v4bf)
+      return Bfloat16x4_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v8bf)
+    VAR1 (STORE1, st1 , 0, STORE, v8bf)
+      return Bfloat16x8_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v2sf)
+    VAR1 (STORE1, st1 , 0, STORE, v2sf)
+      return Float32x2_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v4sf)
+    VAR1 (STORE1, st1 , 0, STORE, v4sf)
+      return Float32x4_t;
+    VAR1 (LOAD1, ld1 , 0, LOAD, v2df)
+    VAR1 (STORE1, st1 , 0, STORE, v2df)
+      return Float64x2_t;
+    default:
+      gcc_unreachable ();
+      break;
+  }
+}
+
 /* Try to fold STMT, given that it's a call to the built-in function with
    subcode FCODE.  Return the new statement on success and null on
    failure.  */
 gimple *
-aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt)
+aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt,
+				     gimple_stmt_iterator *gsi)
 {
   gimple *new_stmt = NULL;
   unsigned nargs = gimple_call_num_args (stmt);
@@ -2421,6 +2476,52 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt)
 					       1, args[0]);
 	gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
 	break;
+
+     /* Lower store and load neon builtins to gimple.  */
+     BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
+	if (!BYTES_BIG_ENDIAN)
+	  {
+	    enum aarch64_simd_type mem_type
+	      = get_mem_type_for_load_store(fcode);
+	    aarch64_simd_type_info simd_type
+	      = aarch64_simd_types[mem_type];
+	    tree elt_ptr_type = build_pointer_type (simd_type.eltype);
+	    tree zero = build_zero_cst (elt_ptr_type);
+	    gimple_seq stmts = NULL;
+	    tree base = gimple_convert (&stmts, elt_ptr_type,
+					args[0]);
+	    if (stmts)
+	      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	    new_stmt
+	      = gimple_build_assign (gimple_get_lhs (stmt),
+				     fold_build2 (MEM_REF,
+						  simd_type.itype,
+						  base, zero));
+	  }
+	break;
+
+      BUILTIN_VALL_F16 (STORE1, st1, 0, STORE)
+	if (!BYTES_BIG_ENDIAN)
+	  {
+	    enum aarch64_simd_type mem_type
+	      = get_mem_type_for_load_store(fcode);
+	    aarch64_simd_type_info simd_type
+	      = aarch64_simd_types[mem_type];
+	    tree elt_ptr_type = build_pointer_type (simd_type.eltype);
+	    tree zero = build_zero_cst (elt_ptr_type);
+	    gimple_seq stmts = NULL;
+	    tree base = gimple_convert (&stmts, elt_ptr_type,
+					args[0]);
+	    if (stmts)
+	      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	    new_stmt
+	      = gimple_build_assign (fold_build2 (MEM_REF,
+				     simd_type.itype,
+				     base,
+				     zero), args[1]);
+	  }
+	break;
+
       BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10, ALL)
       BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10, ALL)
 	new_stmt = gimple_build_call_internal (IFN_REDUC_MAX,
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b91eeeba101..768e8fae136 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -962,7 +962,8 @@ void aarch64_override_options_internal (struct gcc_options *);
 const char *aarch64_general_mangle_builtin_type (const_tree);
 void aarch64_general_init_builtins (void);
 tree aarch64_general_fold_builtin (unsigned int, tree, unsigned int, tree *);
-gimple *aarch64_general_gimple_fold_builtin (unsigned int, gcall *);
+gimple *aarch64_general_gimple_fold_builtin (unsigned int, gcall *,
+					     gimple_stmt_iterator *);
 rtx aarch64_general_expand_builtin (unsigned int, tree, rtx, int);
 tree aarch64_general_builtin_decl (unsigned, bool);
 tree aarch64_general_builtin_rsqrt (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fdf341812f4..730607f7add 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14156,7 +14156,7 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   switch (code & AARCH64_BUILTIN_CLASS)
     {
     case AARCH64_BUILTIN_GENERAL:
-      new_stmt = aarch64_general_gimple_fold_builtin (subcode, stmt);
+      new_stmt = aarch64_general_gimple_fold_builtin (subcode, stmt, gsi);
       break;
 
     case AARCH64_BUILTIN_SVE:
diff --git a/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c b/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
index 59ad41ed047..adb787a8599 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmla_intrinsic_1.c
@@ -11,6 +11,7 @@ extern void abort (void);
 
 #define TEST_VMLA(q1, q2, size, in1_lanes, in2_lanes)			\
 static void								\
+__attribute__((noipa,noinline))						\
 test_vfma##q1##_lane##q2##_f##size (float##size##_t * res,		\
 				   const float##size##_t *in1,		\
 				   const float##size##_t *in2)		\
@@ -104,12 +105,12 @@ main (int argc, char **argv)
    vfmaq_laneq_f32.  */
 /* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 2 } } */
 
-/* vfma_lane_f64.  */
-/* { dg-final { scan-assembler-times "fmadd\\td\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+" 1 } } */
+/* vfma_lane_f64.
+   vfma_laneq_f64.  */
+/* { dg-final { scan-assembler-times "fmadd\\td\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+" 2 } } */
 
 /* vfmaq_lane_f64.
-   vfma_laneq_f64.
    vfmaq_laneq_f64.  */
-/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 3 } } */
+/* { dg-final { scan-assembler-times "fmla\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 } } */
 
 
diff --git a/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c b/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c
index 2d5a3d30536..865def28c3f 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmls_intrinsic_1.c
@@ -11,6 +11,7 @@ extern void abort (void);
 
 #define TEST_VMLS(q1, q2, size, in1_lanes, in2_lanes)			\
 static void								\
+__attribute__((noipa,noinline))						\
 test_vfms##q1##_lane##q2##_f##size (float##size##_t * res,		\
 				   const float##size##_t *in1,		\
 				   const float##size##_t *in2)		\
@@ -105,12 +106,12 @@ main (int argc, char **argv)
    vfmsq_laneq_f32.  */
 /* { dg-final { scan-assembler-times "fmls\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 2 } } */
 
-/* vfms_lane_f64.  */
-/* { dg-final { scan-assembler-times "fmsub\\td\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+" 1 } } */
+/* vfms_lane_f64.
+   vfms_laneq_f64.  */
+/* { dg-final { scan-assembler-times "fmsub\\td\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+\, d\[0-9\]+" 2 } } */
 
 /* vfmsq_lane_f64.
-   vfms_laneq_f64.
    vfmsq_laneq_f64.  */
-/* { dg-final { scan-assembler-times "fmls\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 3 } } */
+/* { dg-final { scan-assembler-times "fmls\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 } } */
 
 
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_intrinsic_1.c b/gcc/testsuite/gcc.target/aarch64/fmul_intrinsic_1.c
index 8b0880d89b1..d01095e81c1 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmul_intrinsic_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_intrinsic_1.c
@@ -9,6 +9,7 @@ extern double fabs (double);
 
 #define TEST_VMUL(q1, q2, size, in1_lanes, in2_lanes)			\
 static void								\
+__attribute__((noipa,noinline))						\
 test_vmul##q1##_lane##q2##_f##size (float##size##_t * res,		\
 				   const float##size##_t *in1,		\
 				   const float##size##_t *in2)		\
@@ -104,12 +105,12 @@ main (int argc, char **argv)
    vmulq_laneq_f32.  */
 /* { dg-final { scan-assembler-times "fmul\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s, v\[0-9\]+\.s\\\[\[0-9\]+\\\]" 2 } } */
 
-/* vmul_lane_f64.  */
-/* { dg-final { scan-assembler-times "fmul\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
+/* vmul_lane_f64.
+   vmul_laneq_f64.  */
+/* { dg-final { scan-assembler-times "fmul\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 2 } } */
 
-/* vmul_laneq_f64.
-   vmulq_lane_f64.
+/* vmulq_lane_f64.
    vmulq_laneq_f64.  */
-/* { dg-final { scan-assembler-times "fmul\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 3 } } */
+/* { dg-final { scan-assembler-times "fmul\\tv\[0-9\]+\.2d, v\[0-9\]+\.2d, v\[0-9\]+\.d\\\[\[0-9\]+\\\]" 2 } } */
 
 
diff --git a/gcc/testsuite/gcc.target/aarch64/mla_intrinsic_1.c b/gcc/testsuite/gcc.target/aarch64/mla_intrinsic_1.c
index 46b3c78c131..885bfb39b79 100644
--- a/gcc/testsuite/gcc.target/aarch64/mla_intrinsic_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/mla_intrinsic_1.c
@@ -11,6 +11,7 @@ extern void abort (void);
 
 #define TEST_VMLA(q, su, size, in1_lanes, in2_lanes)		\
 static void							\
+__attribute__((noipa,noinline))					\
 test_vmlaq_lane##q##_##su##size (MAP##su (size, ) * res,	\
 				 const MAP##su(size, ) *in1,	\
 				 const MAP##su(size, ) *in2)	\
diff --git a/gcc/testsuite/gcc.target/aarch64/mls_intrinsic_1.c b/gcc/testsuite/gcc.target/aarch64/mls_intrinsic_1.c
index e01a4f6d0e1..df046ce32c0 100644
--- a/gcc/testsuite/gcc.target/aarch64/mls_intrinsic_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/mls_intrinsic_1.c
@@ -11,6 +11,7 @@ extern void abort (void);
 
 #define TEST_VMLS(q, su, size, in1_lanes, in2_lanes)		\
 static void							\
+__attribute__((noipa,noinline))					\
 test_vmlsq_lane##q##_##su##size (MAP##su (size, ) * res,	\
 				 const MAP##su(size, ) *in1,	\
 				 const MAP##su(size, ) *in2)	\
diff --git a/gcc/testsuite/gcc.target/aarch64/mul_intrinsic_1.c b/gcc/testsuite/gcc.target/aarch64/mul_intrinsic_1.c
index 00ef4f2de6c..517b937f3e1 100644
--- a/gcc/testsuite/gcc.target/aarch64/mul_intrinsic_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/mul_intrinsic_1.c
@@ -11,6 +11,7 @@ extern void abort (void);
 
 #define TEST_VMUL(q, su, size, in1_lanes, in2_lanes)		\
 static void							\
+__attribute__((noipa,noinline))					\
 test_vmulq_lane##q##_##su##size (MAP##su (size, ) * res,	\
 				 const MAP##su(size, ) *in1,	\
 				 const MAP##su(size, ) *in2)	\
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c
index a1faefd88ba..ffa391aeae1 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmul_elem_1.c
@@ -146,12 +146,14 @@ check_v2sf (float32_t elemA, float32_t elemB)
 
   vst1_f32 (vec32x2_res, vmul_n_f32 (vec32x2_src, elemA));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 2; indx++)
     if (* (uint32_t *) &vec32x2_res[indx] != * (uint32_t *) &expected2_1[indx])
       abort ();
 
   vst1_f32 (vec32x2_res, vmul_n_f32 (vec32x2_src, elemB));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 2; indx++)
     if (* (uint32_t *) &vec32x2_res[indx] != * (uint32_t *) &expected2_2[indx])
       abort ();
@@ -169,24 +171,28 @@ check_v4sf (float32_t elemA, float32_t elemB, float32_t elemC, float32_t elemD)
 
   vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemA));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_1[indx])
       abort ();
 
   vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemB));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_2[indx])
       abort ();
 
   vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemC));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_3[indx])
       abort ();
 
   vst1q_f32 (vec32x4_res, vmulq_n_f32 (vec32x4_src, elemD));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (* (uint32_t *) &vec32x4_res[indx] != * (uint32_t *) &expected4_4[indx])
       abort ();
@@ -204,12 +210,14 @@ check_v2df (float64_t elemdC, float64_t elemdD)
 
   vst1q_f64 (vec64x2_res, vmulq_n_f64 (vec64x2_src, elemdC));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 2; indx++)
     if (* (uint64_t *) &vec64x2_res[indx] != * (uint64_t *) &expectedd2_1[indx])
       abort ();
 
   vst1q_f64 (vec64x2_res, vmulq_n_f64 (vec64x2_src, elemdD));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 2; indx++)
     if (* (uint64_t *) &vec64x2_res[indx] != * (uint64_t *) &expectedd2_2[indx])
       abort ();
@@ -227,12 +235,14 @@ check_v2si (int32_t elemsA, int32_t elemsB)
 
   vst1_s32 (vecs32x2_res, vmul_n_s32 (vecs32x2_src, elemsA));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 2; indx++)
     if (vecs32x2_res[indx] != expecteds2_1[indx])
       abort ();
 
   vst1_s32 (vecs32x2_res, vmul_n_s32 (vecs32x2_src, elemsB));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 2; indx++)
     if (vecs32x2_res[indx] != expecteds2_2[indx])
       abort ();
@@ -248,12 +258,14 @@ check_v2si_unsigned (uint32_t elemusA, uint32_t elemusB)
 
   vst1_u32 (vecus32x2_res, vmul_n_u32 (vecus32x2_src, elemusA));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 2; indx++)
     if (vecus32x2_res[indx] != expectedus2_1[indx])
       abort ();
 
   vst1_u32 (vecus32x2_res, vmul_n_u32 (vecus32x2_src, elemusB));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 2; indx++)
     if (vecus32x2_res[indx] != expectedus2_2[indx])
       abort ();
@@ -271,24 +283,28 @@ check_v4si (int32_t elemsA, int32_t elemsB, int32_t elemsC, int32_t elemsD)
 
   vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsA));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecs32x4_res[indx] != expecteds4_1[indx])
       abort ();
 
   vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsB));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecs32x4_res[indx] != expecteds4_2[indx])
       abort ();
 
   vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsC));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecs32x4_res[indx] != expecteds4_3[indx])
       abort ();
 
   vst1q_s32 (vecs32x4_res, vmulq_n_s32 (vecs32x4_src, elemsD));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecs32x4_res[indx] != expecteds4_4[indx])
       abort ();
@@ -305,24 +321,28 @@ check_v4si_unsigned (uint32_t elemusA, uint32_t elemusB, uint32_t elemusC,
 
   vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusA));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecus32x4_res[indx] != expectedus4_1[indx])
       abort ();
 
   vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusB));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecus32x4_res[indx] != expectedus4_2[indx])
       abort ();
 
   vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusC));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecus32x4_res[indx] != expectedus4_3[indx])
       abort ();
 
   vst1q_u32 (vecus32x4_res, vmulq_n_u32 (vecus32x4_src, elemusD));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecus32x4_res[indx] != expectedus4_4[indx])
       abort ();
@@ -341,24 +361,28 @@ check_v4hi (int16_t elemhA, int16_t elemhB, int16_t elemhC, int16_t elemhD)
 
   vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhA));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vech16x4_res[indx] != expectedh4_1[indx])
       abort ();
 
   vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhB));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vech16x4_res[indx] != expectedh4_2[indx])
       abort ();
 
   vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhC));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vech16x4_res[indx] != expectedh4_3[indx])
       abort ();
 
   vst1_s16 (vech16x4_res, vmul_n_s16 (vech16x4_src, elemhD));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vech16x4_res[indx] != expectedh4_4[indx])
       abort ();
@@ -375,24 +399,28 @@ check_v4hi_unsigned (uint16_t elemuhA, uint16_t elemuhB, uint16_t elemuhC,
 
   vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhA));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecuh16x4_res[indx] != expecteduh4_1[indx])
       abort ();
 
   vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhB));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecuh16x4_res[indx] != expecteduh4_2[indx])
       abort ();
 
   vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhC));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecuh16x4_res[indx] != expecteduh4_3[indx])
       abort ();
 
   vst1_u16 (vecuh16x4_res, vmul_n_u16 (vecuh16x4_src, elemuhD));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 4; indx++)
     if (vecuh16x4_res[indx] != expecteduh4_4[indx])
       abort ();
@@ -411,48 +439,56 @@ check_v8hi (int16_t elemhA, int16_t elemhB, int16_t elemhC, int16_t elemhD,
 
   vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhA));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vech16x8_res[indx] != expectedh8_1[indx])
       abort ();
 
   vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhB));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vech16x8_res[indx] != expectedh8_2[indx])
       abort ();
 
   vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhC));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vech16x8_res[indx] != expectedh8_3[indx])
       abort ();
 
   vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhD));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vech16x8_res[indx] != expectedh8_4[indx])
       abort ();
 
   vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhE));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vech16x8_res[indx] != expectedh8_5[indx])
       abort ();
 
   vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhF));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vech16x8_res[indx] != expectedh8_6[indx])
       abort ();
 
   vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhG));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vech16x8_res[indx] != expectedh8_7[indx])
       abort ();
 
   vst1q_s16 (vech16x8_res, vmulq_n_s16 (vech16x8_src, elemhH));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vech16x8_res[indx] != expectedh8_8[indx])
       abort ();
@@ -470,48 +506,56 @@ check_v8hi_unsigned (uint16_t elemuhA, uint16_t elemuhB, uint16_t elemuhC,
 
   vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhA));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vecuh16x8_res[indx] != expecteduh8_1[indx])
       abort ();
 
   vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhB));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vecuh16x8_res[indx] != expecteduh8_2[indx])
       abort ();
 
   vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhC));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vecuh16x8_res[indx] != expecteduh8_3[indx])
       abort ();
 
   vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhD));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vecuh16x8_res[indx] != expecteduh8_4[indx])
       abort ();
 
   vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhE));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vecuh16x8_res[indx] != expecteduh8_5[indx])
       abort ();
 
   vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhF));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vecuh16x8_res[indx] != expecteduh8_6[indx])
       abort ();
 
   vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhG));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vecuh16x8_res[indx] != expecteduh8_7[indx])
       abort ();
 
   vst1q_u16 (vecuh16x8_res, vmulq_n_u16 (vecuh16x8_src, elemuhH));
 
+  asm volatile ("" : : : "memory");
   for (indx = 0; indx < 8; indx++)
     if (vecuh16x8_res[indx] != expecteduh8_8[indx])
       abort ();
diff --git a/gcc/testsuite/gcc.target/aarch64/vclz.c b/gcc/testsuite/gcc.target/aarch64/vclz.c
index a36ee44fc16..ca4d17426e6 100644
--- a/gcc/testsuite/gcc.target/aarch64/vclz.c
+++ b/gcc/testsuite/gcc.target/aarch64/vclz.c
@@ -66,22 +66,62 @@ extern void abort (void);
 #define CLZ_INST(reg_len, data_len, is_signed) \
   CONCAT1 (vclz, POSTFIX (reg_len, data_len, is_signed))
 
-#define RUN_TEST(test_set, answ_set, reg_len, data_len, is_signed, n)	\
-  INHIB_OPTIMIZATION;							\
-  a = LOAD_INST (reg_len, data_len, is_signed) (test_set);		\
-  b = LOAD_INST (reg_len, data_len, is_signed) (answ_set);	        \
-  a = CLZ_INST (reg_len, data_len, is_signed) (a);			\
-  for (i = 0; i < n; i++)						\
-    if (a [i] != b [i])							\
-      return 1;
+#define BUILD_TEST(type, size, lanes)			    \
+int __attribute__((noipa,noinline))			    \
+run_test##type##size##x##lanes (int##size##_t* test_set,    \
+				int##size##_t* answ_set,    \
+				int reg_len, int data_len,  \
+				int n)			    \
+{							    \
+  int i;						    \
+  INHIB_OPTIMIZATION;					    \
+  int##size##x##lanes##_t a = vld1##type##size (test_set);  \
+  int##size##x##lanes##_t b = vld1##type##size (answ_set);  \
+  a = vclz##type##size (a);				    \
+  for (i = 0; i < n; i++){				    \
+    if (a [i] != b [i])					    \
+      return 1;						    \
+  }							    \
+  return 0;						    \
+}
+
+/* unsigned inputs  */
+#define U_BUILD_TEST(type, size, lanes)			    \
+int __attribute__((noipa,noinline))			    \
+run_test##type##size##x##lanes (uint##size##_t* test_set,   \
+				uint##size##_t* answ_set,   \
+				int reg_len, int data_len,  \
+				int n)	                    \
+{							    \
+  int i;						    \
+  INHIB_OPTIMIZATION;					    \
+  uint##size##x##lanes##_t a = vld1##type##size (test_set); \
+  uint##size##x##lanes##_t b = vld1##type##size (answ_set); \
+  a = vclz##type##size (a);				    \
+  for (i = 0; i < n; i++){				    \
+    if (a [i] != b [i])					    \
+      return 1;						    \
+  }							    \
+  return 0;						    \
+}
+
+BUILD_TEST (_s, 8, 8)
+BUILD_TEST (_s, 16, 4)
+BUILD_TEST (_s, 32, 2)
+BUILD_TEST (q_s, 8, 16)
+BUILD_TEST (q_s, 16, 8)
+BUILD_TEST (q_s, 32, 4)
+
+U_BUILD_TEST (_u, 8, 8)
+U_BUILD_TEST (_u, 16, 4)
+U_BUILD_TEST (_u, 32, 2)
+U_BUILD_TEST (q_u, 8, 16)
+U_BUILD_TEST (q_u, 16, 8)
+U_BUILD_TEST (q_u, 32, 4)
 
 int __attribute__ ((noinline))
 test_vclz_s8 ()
 {
-  int i;
-  int8x8_t a;
-  int8x8_t b;
-
   int8_t test_set0[8] = {
     TEST0, TEST1, TEST2, TEST3,
     TEST4, TEST5, TEST6, TEST7
@@ -98,22 +138,18 @@ test_vclz_s8 ()
     0, 0, 0, 0,
     0, 0, 0, 0
   };
-  RUN_TEST (test_set0, answ_set0, 64, 8, 1, 8);
-  RUN_TEST (test_set1, answ_set1, 64, 8, 1, 1);
+  int o1 = run_test_s8x8 (test_set0, answ_set0, 64, 8, 8);
+  int o2 = run_test_s8x8 (test_set1, answ_set1, 64, 8, 1);
 
-  return 0;
+  return o1||o2;
 }
 
 /* Double scan-assembler-times to take account of unsigned functions.  */
-/* { dg-final { scan-assembler-times "clz\\tv\[0-9\]+\.8b, v\[0-9\]+\.8b" 4 } } */
+/* { dg-final { scan-assembler-times "clz\\tv\[0-9\]+\.8b, v\[0-9\]+\.8b" 2 } } */
 
 int __attribute__ ((noinline))
 test_vclz_s16 ()
 {
-  int i;
-  int16x4_t a;
-  int16x4_t b;
-
   int16_t test_set0[4] = { TEST0, TEST1, TEST2, TEST3 };
   int16_t test_set1[4] = { TEST4, TEST5, TEST6, TEST7 };
   int16_t test_set2[4] = { TEST8, TEST9, TEST10, TEST11 };
@@ -126,25 +162,21 @@ test_vclz_s16 ()
   int16_t answ_set3[4] = { 4, 3, 2, 1 };
   int16_t answ_set4[4] = { 0, 0, 0, 0 };
 
-  RUN_TEST (test_set0, answ_set0, 64, 16, 1, 4);
-  RUN_TEST (test_set1, answ_set1, 64, 16, 1, 4);
-  RUN_TEST (test_set2, answ_set2, 64, 16, 1, 4);
-  RUN_TEST (test_set3, answ_set3, 64, 16, 1, 4);
-  RUN_TEST (test_set4, answ_set4, 64, 16, 1, 1);
+  int o1 = run_test_s16x4 (test_set0, answ_set0, 64, 16, 4);
+  int o2 = run_test_s16x4 (test_set1, answ_set1, 64, 16, 4);
+  int o3 = run_test_s16x4 (test_set2, answ_set2, 64, 16, 4);
+  int o4 = run_test_s16x4 (test_set3, answ_set3, 64, 16, 4);
+  int o5 = run_test_s16x4 (test_set4, answ_set4, 64, 16, 1);
 
-  return 0;
+  return o1||o2||o3||o4||o5;
 }
 
 /* Double scan-assembler-times to take account of unsigned functions.  */
-/* { dg-final { scan-assembler-times "clz\\tv\[0-9\]+\.4h, v\[0-9\]+\.4h" 10} } */
+/* { dg-final { scan-assembler-times "clz\\tv\[0-9\]+\.4h, v\[0-9\]+\.4h" 2} } */
 
 int __attribute__ ((noinline))
 test_vclz_s32 ()
 {
-  int i;
-  int32x2_t a;
-  int32x2_t b;
-
   int32_t test_set0[2] = { TEST0, TEST1 };
   int32_t test_set1[2] = { TEST2, TEST3 };
   int32_t test_set2[2] = { TEST4, TEST5 };
@@ -181,37 +213,34 @@ test_vclz_s32 ()
   int32_t answ_set15[2] = { 2, 1 };
   int32_t answ_set16[2] = { 0, 0 };
 
-  RUN_TEST (test_set0, answ_set0, 64, 32, 1, 2);
-  RUN_TEST (test_set1, answ_set1, 64, 32, 1, 2);
-  RUN_TEST (test_set2, answ_set2, 64, 32, 1, 2);
-  RUN_TEST (test_set3, answ_set3, 64, 32, 1, 2);
-  RUN_TEST (test_set4, answ_set4, 64, 32, 1, 2);
-  RUN_TEST (test_set5, answ_set5, 64, 32, 1, 2);
-  RUN_TEST (test_set6, answ_set6, 64, 32, 1, 2);
-  RUN_TEST (test_set7, answ_set7, 64, 32, 1, 2);
-  RUN_TEST (test_set8, answ_set8, 64, 32, 1, 2);
-  RUN_TEST (test_set9, answ_set9, 64, 32, 1, 2);
-  RUN_TEST (test_set10, answ_set10, 64, 32, 1, 2);
-  RUN_TEST (test_set11, answ_set11, 64, 32, 1, 2);
-  RUN_TEST (test_set12, answ_set12, 64, 32, 1, 2);
-  RUN_TEST (test_set13, answ_set13, 64, 32, 1, 2);
-  RUN_TEST (test_set14, answ_set14, 64, 32, 1, 2);
-  RUN_TEST (test_set15, answ_set15, 64, 32, 1, 2);
-  RUN_TEST (test_set16, answ_set16, 64, 32, 1, 1);
-
-  return 0;
+  int o1 = run_test_s32x2 (test_set0, answ_set0, 64, 32, 2);
+  int o2 = run_test_s32x2 (test_set1, answ_set1, 64, 32, 2);
+  int o3 = run_test_s32x2 (test_set2, answ_set2, 64, 32, 2);
+  int o4 = run_test_s32x2 (test_set3, answ_set3, 64, 32, 2);
+  int o5 = run_test_s32x2 (test_set4, answ_set4, 64, 32, 2);
+  int o6 = run_test_s32x2 (test_set5, answ_set5, 64, 32, 2);
+  int o7 = run_test_s32x2 (test_set6, answ_set6, 64, 32, 2);
+  int o8 = run_test_s32x2 (test_set7, answ_set7, 64, 32, 2);
+  int o9 = run_test_s32x2 (test_set8, answ_set8, 64, 32, 2);
+  int o10 = run_test_s32x2 (test_set9, answ_set9, 64, 32, 2);
+  int o11 = run_test_s32x2 (test_set10, answ_set10, 64, 32, 2);
+  int o12 = run_test_s32x2 (test_set11, answ_set11, 64, 32, 2);
+  int o13 = run_test_s32x2 (test_set12, answ_set12, 64, 32, 2);
+  int o14 = run_test_s32x2 (test_set13, answ_set13, 64, 32, 2);
+  int o15 = run_test_s32x2 (test_set14, answ_set14, 64, 32, 2);
+  int o16 = run_test_s32x2 (test_set15, answ_set15, 64, 32, 2);
+  int o17 = run_test_s32x2 (test_set16, answ_set16, 64, 32, 1);
+
+  return o1||o2||o3||o4||o5||o6||o7||o8||o9||o10||o11||o12||o13||o14
+    ||o15||o16||o17;
 }
 
 /* Double scan-assembler-times to take account of unsigned functions.  */
-/* { dg-final { scan-assembler-times "clz\\tv\[0-9\]+\.2s, v\[0-9\]+\.2s" 34 } } */
+/* { dg-final { scan-assembler-times "clz\\tv\[0-9\]+\.2s, v\[0-9\]+\.2s"  2 } } */
 
 int __attribute__ ((noinline))
 test_vclzq_s8 ()
 {
-  int i;
-  int8x16_t a;
-  int8x16_t b;
-
   int8_t test_set0[16] = {
     TEST0, TEST1, TEST2, TEST3, TEST4, TEST5, TEST6, TEST7,
     TEST8, TEST8, TEST8, TEST8, TEST8, TEST8, TEST8, TEST8
@@ -219,8 +248,8 @@ test_vclzq_s8 ()
   int8_t answ_set0[16] = {
     8, 7, 6, 5, 4, 3, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0
   };
-  RUN_TEST (test_set0, answ_set0, 128, 8, 1, 9);
-  return 0;
+  int o1 = run_testq_s8x16 (test_set0, answ_set0, 128, 8, 9);
+  return o1;
 }
 
 /* Double scan-assembler-times to take account of unsigned functions.  */
@@ -229,10 +258,6 @@ test_vclzq_s8 ()
 int __attribute__ ((noinline))
 test_vclzq_s16 ()
 {
-  int i;
-  int16x8_t a;
-  int16x8_t b;
-
   int16_t test_set0[8] = {
     TEST0, TEST1, TEST2, TEST3, TEST4, TEST5, TEST6, TEST7
   };
@@ -252,23 +277,19 @@ test_vclzq_s16 ()
   int16_t answ_set2[8] = {
     0, 0, 0, 0, 0, 0, 0, 0
   };
-  RUN_TEST (test_set0, answ_set0, 128, 16, 1, 8);
-  RUN_TEST (test_set1, answ_set1, 128, 16, 1, 8);
-  RUN_TEST (test_set2, answ_set2, 128, 16, 1, 1);
+  int o1 = run_testq_s16x8 (test_set0, answ_set0, 128, 16, 8);
+  int o2 = run_testq_s16x8 (test_set1, answ_set1, 128, 16, 8);
+  int o3 = run_testq_s16x8 (test_set2, answ_set2, 128, 16, 1);
 
-  return 0;
+  return o1||o2||o3;
 }
 
 /* Double scan-assembler-times to take account of unsigned functions.  */
-/* { dg-final { scan-assembler-times "clz\\tv\[0-9\]+\.8h, v\[0-9\]+\.8h" 6 } } */
+/* { dg-final { scan-assembler-times "clz\\tv\[0-9\]+\.8h, v\[0-9\]+\.8h" 2 } } */
 
 int __attribute__ ((noinline))
 test_vclzq_s32 ()
 {
-  int i;
-  int32x4_t a;
-  int32x4_t b;
-
   int32_t test_set0[4] = { TEST0, TEST1, TEST2, TEST3 };
   int32_t test_set1[4] = { TEST4, TEST5, TEST6, TEST7 };
   int32_t test_set2[4] = { TEST8, TEST9, TEST10, TEST11 };
@@ -289,27 +310,23 @@ test_vclzq_s32 ()
   int32_t answ_set7[4] = { 4, 3, 2, 1 };
   int32_t answ_set8[4] = { 0, 0, 0, 0 };
 
-  RUN_TEST (test_set0, answ_set0, 128, 32, 1, 4);
-  RUN_TEST (test_set1, answ_set1, 128, 32, 1, 4);
-  RUN_TEST (test_set2, answ_set2, 128, 32, 1, 4);
-  RUN_TEST (test_set3, answ_set3, 128, 32, 1, 4);
-  RUN_TEST (test_set4, answ_set4, 128, 32, 1, 1);
+  int o1 = run_testq_s32x4 (test_set0, answ_set0, 128, 32, 4);
+  int o2 = run_testq_s32x4 (test_set1, answ_set1, 128, 32, 4);
+  int o3 = run_testq_s32x4 (test_set2, answ_set2, 128, 32, 4);
+  int o4 = run_testq_s32x4 (test_set3, answ_set3, 128, 32, 4);
+  int o5 = run_testq_s32x4 (test_set4, answ_set4, 128, 32, 1);
 
-  return 0;
+  return o1||o2||o3||o4||o5;
 }
 
 /* Double scan-assembler-times to take account of unsigned functions.  */
-/* { dg-final { scan-assembler-times "clz\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s" 10 } } */
+/* { dg-final { scan-assembler-times "clz\\tv\[0-9\]+\.4s, v\[0-9\]+\.4s" 2 } } */
 
 /* Unsigned versions.  */
 
 int __attribute__ ((noinline))
 test_vclz_u8 ()
 {
-  int i;
-  uint8x8_t a;
-  uint8x8_t b;
-
   uint8_t test_set0[8] = {
     TEST0, TEST1, TEST2, TEST3, TEST4, TEST5, TEST6, TEST7
   };
@@ -323,10 +340,10 @@ test_vclz_u8 ()
     0, 0, 0, 0, 0, 0, 0, 0
   };
 
-  RUN_TEST (test_set0, answ_set0, 64, 8, 0, 8);
-  RUN_TEST (test_set1, answ_set1, 64, 8, 0, 1);
+  int o1 = run_test_u8x8 (test_set0, answ_set0, 64, 8, 8);
+  int o2 = run_test_u8x8 (test_set1, answ_set1, 64, 8, 1);
 
-  return 0;
+  return o1||o2;
 }
 
 /* ASM scan near test for signed version.  */
@@ -334,10 +351,6 @@ test_vclz_u8 ()
 int __attribute__ ((noinline))
 test_vclz_u16 ()
 {
-  int i;
-  uint16x4_t a;
-  uint16x4_t b;
-
   uint16_t test_set0[4] = { TEST0, TEST1, TEST2, TEST3 };
   uint16_t test_set1[4] = { TEST4, TEST5, TEST6, TEST7 };
   uint16_t test_set2[4] = { TEST8, TEST9, TEST10, TEST11 };
@@ -350,13 +363,13 @@ test_vclz_u16 ()
   uint16_t answ_set3[4] = { 4, 3, 2, 1 };
   uint16_t answ_set4[4] = { 0, 0, 0, 0 };
 
-  RUN_TEST (test_set0, answ_set0, 64, 16, 0, 4);
-  RUN_TEST (test_set1, answ_set1, 64, 16, 0, 4);
-  RUN_TEST (test_set2, answ_set2, 64, 16, 0, 4);
-  RUN_TEST (test_set3, answ_set3, 64, 16, 0, 4);
-  RUN_TEST (test_set4, answ_set4, 64, 16, 0, 1);
+  int o1 = run_test_u16x4 (test_set0, answ_set0, 64, 16, 4);
+  int o2 = run_test_u16x4 (test_set1, answ_set1, 64, 16, 4);
+  int o3 = run_test_u16x4 (test_set2, answ_set2, 64, 16, 4);
+  int o4 = run_test_u16x4 (test_set3, answ_set3, 64, 16, 4);
+  int o5 = run_test_u16x4 (test_set4, answ_set4, 64, 16, 1);
 
-  return 0;
+  return o1||o2||o3||o4||o5;
 }
 
 /* ASM scan near test for signed version.  */
@@ -364,10 +377,6 @@ test_vclz_u16 ()
 int __attribute__ ((noinline))
 test_vclz_u32 ()
 {
-  int i;
-  uint32x2_t a;
-  uint32x2_t b;
-
   uint32_t test_set0[2] = { TEST0, TEST1 };
   uint32_t test_set1[2] = { TEST2, TEST3 };
   uint32_t test_set2[2] = { TEST4, TEST5 };
@@ -404,25 +413,26 @@ test_vclz_u32 ()
   uint32_t answ_set15[2] = { 2, 1 };
   uint32_t answ_set16[2] = { 0, 0 };
 
-  RUN_TEST (test_set0, answ_set0, 64, 32, 0, 2);
-  RUN_TEST (test_set1, answ_set1, 64, 32, 0, 2);
-  RUN_TEST (test_set2, answ_set2, 64, 32, 0, 2);
-  RUN_TEST (test_set3, answ_set3, 64, 32, 0, 2);
-  RUN_TEST (test_set4, answ_set4, 64, 32, 0, 2);
-  RUN_TEST (test_set5, answ_set5, 64, 32, 0, 2);
</cut>
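
The misaligned-address errors in the build log match the two new BUILTIN_VALL_F16 cases above: the MEM_REF is built directly with simd_type.itype, whose natural alignment (16 bytes for the Q-register types) is stricter than the element alignment that vld1/vst1 require of their pointer argument. A sketch of one way to keep the lowering while dropping the over-strong alignment claim, assuming GCC's existing build_aligned_type/TYPE_ALIGN tree helpers (not necessarily the fix that eventually landed):

<cut>
/* Sketch only: restrict the access type to element alignment so the
   lowered load no longer asserts 16-byte alignment.  simd_type, base
   and zero are as in the LOAD1 case of the commit above.  */
tree access_type
  = build_aligned_type (simd_type.itype, TYPE_ALIGN (simd_type.eltype));
new_stmt
  = gimple_build_assign (gimple_get_lhs (stmt),
			 fold_build2 (MEM_REF, access_type, base, zero));
</cut>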
From hjl@sc.intel.com  Thu Oct 21 23:03:12 2021
Date: Thu, 21 Oct 2021 16:00:53 -0700
To: skpgkp2@gmail.com, hjl.tools@gmail.com, gcc-regression@gcc.gnu.org
Subject: Regressions on master at commit r12-4617 vs commit r12-4611 on
 Linux/x86_64
Message-Id: <20211021230053.37930638D8@gnu-34.sc.intel.com>
From: "H.J. Lu" <hjl@sc.intel.com>

New failures:

New passes:
FAIL: gcc.dg/asan/pr78832.c   -O1  (test for excess errors)
FAIL: gcc.dg/asan/pr78832.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/asan/pr78832.c   -O2  (test for excess errors)
FAIL: gcc.dg/asan/pr78832.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/asan/pr78832.c   -Os  (test for excess errors)
FAIL: gcc.dg/pr45055.c (test for excess errors)
FAIL: gcc.dg/pr45105.c (test for excess errors)
FAIL: gcc.dg/pr45865.c (test for excess errors)
FAIL: gcc.dg/torture/pr48343.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/pr48343.c   -Os  (test for excess errors)
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%xmm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/avx512fp16-13.c scan-assembler-times vmovdqa64[ \\t]+[^{\n]*%ymm[0-9]+[^\n]*\\) 1
FAIL: gcc.target/i386/pr57106.c (test for excess errors)

