Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
References: <87muwzoqd3.fsf@linaro.org> <87sh6gauw8.fsf@linaro.org>
In-Reply-To:
<87sh6gauw8.fsf@linaro.org>
From: Richard Biener
Date: Fri, 25 May 2018 11:09:00 -0000
Message-ID:
Subject: Re: Implement SLP of internal functions
To: GCC Patches, Richard Sandiford
Content-Type: text/plain; charset="UTF-8"
X-IsSubscribed: yes
X-SW-Source: 2018-05/txt/msg01493.txt.bz2

On Fri, May 25, 2018 at 12:31 PM Richard Sandiford <richard.sandiford@linaro.org> wrote:

> Richard Biener writes:
> >> Index: gcc/tree-vect-slp.c
> >> ===================================================================
> >> --- gcc/tree-vect-slp.c 2018-05-16 11:02:46.262494712 +0100
> >> +++ gcc/tree-vect-slp.c 2018-05-16 11:12:11.873116180 +0100
> >> @@ -564,6 +564,41 @@ vect_get_and_check_slp_defs (vec_info *v
> >>    return 0;
> >>  }
> >
> >> +/* Return true if call statements CALL1 and CALL2 are similar enough
> >> +   to be combined into the same SLP group.  */
> >> +
> >> +static bool
> >> +compatible_calls_p (gcall *call1, gcall *call2)
> >> +{
> >> +  unsigned int nargs = gimple_call_num_args (call1);
> >> +  if (nargs != gimple_call_num_args (call2))
> >> +    return false;
> >> +
> >> +  if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2))
> >> +    return false;
> >> +
> >> +  if (gimple_call_internal_p (call1))
> >> +    {
> >> +      if (TREE_TYPE (gimple_call_lhs (call1))
> >> +          != TREE_TYPE (gimple_call_lhs (call2)))
> >> +        return false;
> >> +      for (unsigned int i = 0; i < nargs; ++i)
> >> +        if (TREE_TYPE (gimple_call_arg (call1, i))
> >> +            != TREE_TYPE (gimple_call_arg (call2, i)))
> >
> > Please use types_compatible_p in these two type comparisons.

> OK.

> > Can you please add a generic vect_call_sqrtf to the main vectorizer
> > testsuite?  In fact I already see
> > gcc.dg/vect/fast-math-bb-slp-call-1.c.  Does that mean SQRT does never
> > appear as internal function before vectorization?

> Yeah, sqrt vectorisation is scalar built-in -> vector internal function.
> But this patch adds a generic type keyed off vect_double_cond_arith.
> Would that be OK instead?
Yes, that works for me.

Thanks,
Richard.

> Tested as before.

> Thanks,
> Richard

> 2018-05-25  Richard Sandiford

> gcc/
>     * internal-fn.h (vectorizable_internal_fn_p): New function.
>     * tree-vect-slp.c (compatible_calls_p): Likewise.
>     (vect_build_slp_tree_1): Remove nops argument.  Handle calls
>     to internal functions.
>     (vect_build_slp_tree_2): Update call to vect_build_slp_tree_1.

> gcc/testsuite/
>     * gcc.dg/vect/vect-cond-arith-6.c: New test.
>     * gcc.target/aarch64/sve/cond_arith_4.c: Likewise.
>     * gcc.target/aarch64/sve/cond_arith_4_run.c: Likewise.
>     * gcc.target/aarch64/sve/cond_arith_5.c: Likewise.
>     * gcc.target/aarch64/sve/cond_arith_5_run.c: Likewise.
>     * gcc.target/aarch64/sve/slp_14.c: Likewise.
>     * gcc.target/aarch64/sve/slp_14_run.c: Likewise.

> Index: gcc/internal-fn.h
> ===================================================================
> --- gcc/internal-fn.h 2018-05-25 11:28:05.953287025 +0100
> +++ gcc/internal-fn.h 2018-05-25 11:28:06.193277781 +0100
> @@ -160,6 +160,17 @@ direct_internal_fn_p (internal_fn fn)
>    return direct_internal_fn_array[fn].type0 >= -1;
>  }
> +/* Return true if FN is a direct internal function that can be vectorized by
> +   converting the return type and all argument types to vectors of the same
> +   number of elements.  E.g. we can vectorize an IFN_SQRT on floats as an
> +   IFN_SQRT on vectors of N floats.  */
> +
> +inline bool
> +vectorizable_internal_fn_p (internal_fn fn)
> +{
> +  return direct_internal_fn_array[fn].vectorizable;
> +}
> +
>  /* Return optab information about internal function FN.  Only meaningful
>     if direct_internal_fn_p (FN).  */
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2018-05-25 11:28:05.953287025 +0100
> +++ gcc/tree-vect-slp.c 2018-05-25 11:28:06.195277704 +0100
> @@ -565,6 +565,41 @@ vect_get_and_check_slp_defs (vec_info *v
>    return 0;
>  }
> +/* Return true if call statements CALL1 and CALL2 are similar enough
> +   to be combined into the same SLP group.  */
> +
> +static bool
> +compatible_calls_p (gcall *call1, gcall *call2)
> +{
> +  unsigned int nargs = gimple_call_num_args (call1);
> +  if (nargs != gimple_call_num_args (call2))
> +    return false;
> +
> +  if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2))
> +    return false;
> +
> +  if (gimple_call_internal_p (call1))
> +    {
> +      if (!types_compatible_p (TREE_TYPE (gimple_call_lhs (call1)),
> +                               TREE_TYPE (gimple_call_lhs (call2))))
> +        return false;
> +      for (unsigned int i = 0; i < nargs; ++i)
> +        if (!types_compatible_p (TREE_TYPE (gimple_call_arg (call1, i)),
> +                                 TREE_TYPE (gimple_call_arg (call2, i))))
> +          return false;
> +    }
> +  else
> +    {
> +      if (!operand_equal_p (gimple_call_fn (call1),
> +                            gimple_call_fn (call2), 0))
> +        return false;
> +
> +      if (gimple_call_fntype (call1) != gimple_call_fntype (call2))
> +        return false;
> +    }
> +  return true;
> +}
> +
>  /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the
>     caller's attempt to find the vector type in STMT with the narrowest
>     element type.  Return true if VECTYPE is nonnull and if it is valid
> @@ -653,8 +688,8 @@ vect_two_operations_perm_ok_p (vec
>  static bool
>  vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>                         vec<gimple *> stmts, unsigned int group_size,
> -                       unsigned nops, poly_uint64 *max_nunits,
> -                       bool *matches, bool *two_operators)
> +                       poly_uint64 *max_nunits, bool *matches,
> +                       bool *two_operators)
>  {
>    unsigned int i;
>    gimple *first_stmt = stmts[0], *stmt = stmts[0];
> @@ -730,7 +765,9 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>        if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
>          {
>            rhs_code = CALL_EXPR;
> -          if (gimple_call_internal_p (call_stmt)
> +          if ((gimple_call_internal_p (call_stmt)
> +               && (!vectorizable_internal_fn_p
> +                   (gimple_call_internal_fn (call_stmt))))
>                || gimple_call_tail_p (call_stmt)
>                || gimple_call_noreturn_p (call_stmt)
>                || !gimple_call_nothrow_p (call_stmt)
> @@ -876,11 +913,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>            if (rhs_code == CALL_EXPR)
>              {
>                gimple *first_stmt = stmts[0];
> -              if (gimple_call_num_args (stmt) != nops
> -                  || !operand_equal_p (gimple_call_fn (first_stmt),
> -                                       gimple_call_fn (stmt), 0)
> -                  || gimple_call_fntype (first_stmt)
> -                     != gimple_call_fntype (stmt))
> +              if (!compatible_calls_p (as_a <gcall *> (first_stmt),
> +                                       as_a <gcall *> (stmt)))
>                  {
>                    if (dump_enabled_p ())
>                      {
> @@ -1196,8 +1230,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>    bool two_operators = false;
>    unsigned char *swap = XALLOCAVEC (unsigned char, group_size);
> -  if (!vect_build_slp_tree_1 (vinfo, swap,
> -                              stmts, group_size, nops,
> +  if (!vect_build_slp_tree_1 (vinfo, swap, stmts, group_size,
>                                &this_max_nunits, matches, &two_operators))
>      return NULL;
> Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c 2018-05-25 11:28:06.195277704 +0100
> @@ -0,0 +1,62 @@
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +#include "tree-vect.h"
> +
> +#define N (VECTOR_BITS * 11 / 64 + 4)
> +
> +#define add(A, B) ((A) + (B))
> +#define sub(A, B) ((A) - (B))
> +#define mul(A, B) ((A) * (B))
> +#define div(A, B) ((A) / (B))
> +
> +#define DEF(OP) \
> +  void __attribute__ ((noipa)) \
> +  f_##OP (double *restrict a, double *restrict b, double x) \
> +  { \
> +    for (int i = 0; i < N; i += 2) \
> +      { \
> +        a[i] = b[i] < 100 ? OP (b[i], x) : b[i]; \
> +        a[i + 1] = b[i + 1] < 70 ? OP (b[i + 1], x) : b[i + 1]; \
> +      } \
> +  }
> +
> +#define TEST(OP) \
> +  { \
> +    f_##OP (a, b, 10); \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        int bval = (i % 17) * 10; \
> +        int truev = OP (bval, 10); \
> +        if (a[i] != (bval < (i & 1 ? 70 : 100) ? truev : bval)) \
> +          __builtin_abort (); \
> +        asm volatile ("" ::: "memory"); \
> +      } \
> +  }
> +
> +#define FOR_EACH_OP(T) \
> +  T (add) \
> +  T (sub) \
> +  T (mul) \
> +  T (div)
> +
> +FOR_EACH_OP (DEF)
> +
> +int
> +main (void)
> +{
> +  double a[N], b[N];
> +  for (int i = 0; i < N; ++i)
> +    {
> +      b[i] = (i % 17) * 10;
> +      asm volatile ("" ::: "memory");
> +    }
> +  FOR_EACH_OP (TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times {vectorizing stmts using SLP} 4 "vect" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_ADD} 1 "optimized" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_SUB} 1 "optimized" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_MUL} 1 "optimized" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_RDIV} 1 "optimized" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target vect_double_cond_arith } } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c 2018-05-25 11:28:06.195277704 +0100
> @@ -0,0 +1,62 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include <stdint.h>
> +
> +#define TEST(TYPE, NAME, OP) \
> +  void __attribute__ ((noinline, noclone)) \
> +  test_##TYPE##_##NAME (TYPE *__restrict x, \
> +                        TYPE *__restrict y, \
> +                        TYPE z1, TYPE z2, \
> +                        TYPE *__restrict pred, int n) \
> +  { \
> +    for (int i = 0; i < n; i += 2) \
> +      { \
> +        x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \
> +        x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \
> +      } \
> +  }
> +
> +#define TEST_INT_TYPE(TYPE) \
> +  TEST (TYPE, div, /)
> +
> +#define TEST_FP_TYPE(TYPE) \
> +  TEST (TYPE, add, +) \
> +  TEST (TYPE, sub, -) \
> +  TEST (TYPE, mul, *) \
> +  TEST (TYPE, div, /)
> +
> +#define TEST_ALL \
> +  TEST_INT_TYPE (int32_t) \
> +  TEST_INT_TYPE (uint32_t) \
> +  TEST_INT_TYPE (int64_t) \
> +  TEST_INT_TYPE (uint64_t) \
> +  TEST_FP_TYPE (float) \
> +  TEST_FP_TYPE (double)
> +
> +TEST_ALL
> +
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +
> +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 12 } } */
> +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 6 } } */
> +
> +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 12 } } */
> +/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 6 } } */
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c 2018-05-25 11:28:06.195277704 +0100
> @@ -0,0 +1,32 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "cond_arith_4.c"
> +
> +#define N 98
> +
> +#undef TEST
> +#define TEST(TYPE, NAME, OP) \
> +  { \
> +    TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        y[i] = i * i; \
> +        pred[i] = i % 3; \
> +      } \
> +    test_##TYPE##_##NAME (x, y, z[0], z[1], pred, N); \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \
> +        if (x[i] != expected) \
> +          __builtin_abort (); \
> +        asm volatile ("" ::: "memory"); \
> +      } \
> +  }
> +
> +int
> +main (void)
> +{
> +  TEST_ALL
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c 2018-05-25 11:28:06.195277704 +0100
> @@ -0,0 +1,85 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
> +
> +#include <stdint.h>
> +
> +#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \
> +  void __attribute__ ((noinline, noclone)) \
> +  test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (DATA_TYPE *__restrict x, \
> +                                            DATA_TYPE *__restrict y, \
> +                                            DATA_TYPE z1, DATA_TYPE z2, \
> +                                            DATA_TYPE *__restrict pred, \
> +                                            OTHER_TYPE *__restrict foo, \
> +                                            int n) \
> +  { \
> +    for (int i = 0; i < n; i += 2) \
> +      { \
> +        x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \
> +        x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \
> +        foo[i] += 1; \
> +        foo[i + 1] += 2; \
> +      } \
> +  }
> +
> +#define TEST_INT_TYPE(DATA_TYPE, OTHER_TYPE) \
> +  TEST (DATA_TYPE, OTHER_TYPE, div, /)
> +
> +#define TEST_FP_TYPE(DATA_TYPE, OTHER_TYPE) \
> +  TEST (DATA_TYPE, OTHER_TYPE, add, +) \
> +  TEST (DATA_TYPE, OTHER_TYPE, sub, -) \
> +  TEST (DATA_TYPE, OTHER_TYPE, mul, *) \
> +  TEST (DATA_TYPE, OTHER_TYPE, div, /)
> +
> +#define TEST_ALL \
> +  TEST_INT_TYPE (int32_t, int8_t) \
> +  TEST_INT_TYPE (int32_t, int16_t) \
> +  TEST_INT_TYPE (uint32_t, int8_t) \
> +  TEST_INT_TYPE (uint32_t, int16_t) \
> +  TEST_INT_TYPE (int64_t, int8_t) \
> +  TEST_INT_TYPE (int64_t, int16_t) \
> +  TEST_INT_TYPE (int64_t, int32_t) \
> +  TEST_INT_TYPE (uint64_t, int8_t) \
> +  TEST_INT_TYPE (uint64_t, int16_t) \
> +  TEST_INT_TYPE (uint64_t, int32_t) \
> +  TEST_FP_TYPE (float, int8_t) \
> +  TEST_FP_TYPE (float, int16_t) \
> +  TEST_FP_TYPE (double, int8_t) \
> +  TEST_FP_TYPE (double, int16_t) \
> +  TEST_FP_TYPE (double, int32_t)
> +
> +TEST_ALL
> +
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +
> +/* The load XFAILs for fixed-length SVE account for extra loads from the
> +   constant pool.  */
> +/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
> +/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7],} 12 } } */
> +
> +/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
> +/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7],} 12 } } */
> +
> +/* 72 for x operations, 6 for foo operations.  */
> +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 78 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
> +/* 36 for x operations, 6 for foo operations.  */
> +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 42 } } */
> +
> +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 168 } } */
> +/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 84 } } */
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c 2018-05-25 11:28:06.195277704 +0100
> @@ -0,0 +1,35 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "cond_arith_5.c"
> +
> +#define N 98
> +
> +#undef TEST
> +#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \
> +  { \
> +    DATA_TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \
> +    OTHER_TYPE foo[N]; \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        y[i] = i * i; \
> +        pred[i] = i % 3; \
> +        foo[i] = i * 5; \
> +      } \
> +    test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (x, y, z[0], z[1], \
> +                                              pred, foo, N); \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        DATA_TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \
> +        if (x[i] != expected) \
> +          __builtin_abort (); \
> +        asm volatile ("" ::: "memory"); \
> +      } \
> +  }
> +
> +int
> +main (void)
> +{
> +  TEST_ALL
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/slp_14.c 2018-05-25 11:28:06.195277704 +0100
> @@ -0,0 +1,48 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include <stdint.h>
> +
> +#define VEC_PERM(TYPE) \
> +void __attribute__ ((weak)) \
> +vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n) \
> +{ \
> +  for (int i = 0; i < n; ++i) \
> +    { \
> +      TYPE a1 = a[i * 2]; \
> +      TYPE a2 = a[i * 2 + 1]; \
> +      TYPE b1 = b[i * 2]; \
> +      TYPE b2 = b[i * 2 + 1]; \
> +      a[i * 2] = b1 > 1 ? a1 / b1 : a1; \
> +      a[i * 2 + 1] = b2 > 2 ? a2 / b2 : a2; \
> +    } \
> +}
> +
> +#define TEST_ALL(T) \
> +  T (int32_t) \
> +  T (uint32_t) \
> +  T (int64_t) \
> +  T (uint64_t) \
> +  T (float) \
> +  T (double)
> +
> +TEST_ALL (VEC_PERM)
> +
> +/* The loop should be fully-masked.  The load XFAILs for fixed-length
> +   SVE account for extra loads from the constant pool.  */
> +/* { dg-final { scan-assembler-times {\tld1w\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
> +/* { dg-final { scan-assembler-times {\tst1w\t} 3 } } */
> +/* { dg-final { scan-assembler-times {\tld1d\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
> +/* { dg-final { scan-assembler-times {\tst1d\t} 3 } } */
> +/* { dg-final { scan-assembler-not {\tldr} } } */
> +/* { dg-final { scan-assembler-not {\tstr} } } */
> +
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
> +
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s} 1 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s} 1 } } */
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d} 1 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c 2018-05-25 11:28:06.195277704 +0100
> @@ -0,0 +1,34 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "slp_14.c"
> +
> +#define N1 (103 * 2)
> +#define N2 (111 * 2)
> +
> +#define HARNESS(TYPE) \
> +  { \
> +    TYPE a[N2], b[N2]; \
> +    for (unsigned int i = 0; i < N2; ++i) \
> +      { \
> +        a[i] = i * 2 + i % 5; \
> +        b[i] = i % 11; \
> +      } \
> +    vec_slp_##TYPE (a, b, N1 / 2); \
> +    for (unsigned int i = 0; i < N2; ++i) \
> +      { \
> +        TYPE orig_a = i * 2 + i % 5; \
> +        TYPE orig_b = i % 11; \
> +        TYPE expected_a = orig_a; \
> +        if (i < N1 && orig_b > (i & 1 ? 2 : 1)) \
> +          expected_a /= orig_b; \
> +        if (a[i] != expected_a || b[i] != orig_b) \
> +          __builtin_abort (); \
> +      } \
> +  }
> +
> +int
> +main (void)
> +{
> +  TEST_ALL (HARNESS)
> +}
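
For readers who want to experiment with the grouping rule outside GCC, the decision made by compatible_calls_p in the patch above can be sketched as a small standalone model. Everything below is hypothetical: struct toy_call, enum toy_type and toy_compatible_calls_p are illustrative stand-ins rather than GCC APIs, plain enum equality stands in for types_compatible_p, and integer equality stands in for operand_equal_p. It is only meant to show the shape of the check: argument count and combined function first, then types for internal functions, or callee and function type for ordinary calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GCC tree type; not part of GCC.  */
enum toy_type { TOY_FLOAT, TOY_DOUBLE };

#define TOY_MAX_ARGS 4

/* Hypothetical stand-in for the pieces of a gcall that the patch's
   compatible_calls_p inspects.  */
struct toy_call
{
  int combined_fn;       /* models gimple_call_combined_fn */
  bool internal_p;       /* models gimple_call_internal_p */
  int callee;            /* models gimple_call_fn */
  int fntype;            /* models gimple_call_fntype */
  enum toy_type lhs_type;
  unsigned int nargs;
  enum toy_type arg_types[TOY_MAX_ARGS];
};

/* Return true if C1 and C2 could share one SLP group in this toy model.  */
bool
toy_compatible_calls_p (const struct toy_call *c1, const struct toy_call *c2)
{
  if (c1->nargs != c2->nargs)
    return false;

  if (c1->combined_fn != c2->combined_fn)
    return false;

  if (c1->internal_p)
    {
      /* Internal functions have no callee decl, so compare the result
         and argument types instead (assumption: enum equality models
         types_compatible_p).  */
      if (c1->lhs_type != c2->lhs_type)
        return false;
      for (unsigned int i = 0; i < c1->nargs; ++i)
        if (c1->arg_types[i] != c2->arg_types[i])
          return false;
    }
  else
    {
      /* Ordinary calls must target the same function with the same
         function type.  */
      if (c1->callee != c2->callee)
        return false;
      if (c1->fntype != c2->fntype)
        return false;
    }
  return true;
}

/* Small self-check: two float sqrt-like internal calls match, while a
   float one and a double one do not.  */
void
toy_self_check (void)
{
  struct toy_call sqrt_f = { 1, true, 0, 0, TOY_FLOAT, 1, { TOY_FLOAT } };
  struct toy_call sqrt_d = { 1, true, 0, 0, TOY_DOUBLE, 1, { TOY_DOUBLE } };
  assert (toy_compatible_calls_p (&sqrt_f, &sqrt_f));
  assert (!toy_compatible_calls_p (&sqrt_f, &sqrt_d));
}
```

Calling toy_self_check () from a test driver exercises the internal-function branch. The model deliberately ignores everything else vect_build_slp_tree_1 checks (tail calls, noreturn, nothrow and so on).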