In-Reply-To: <87muwzoqd3.fsf@linaro.org>
From: Richard Biener
Date: Thu, 17 May 2018 11:42:00 -0000
Subject: Re: Implement SLP of internal functions
To: GCC Patches, Richard Sandiford

On Wed, May 16, 2018 at 12:18 PM Richard Sandiford <richard.sandiford@linaro.org> wrote:

> SLP of calls was previously restricted to built-in functions.
> This patch extends it to internal functions.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> 2018-05-16  Richard Sandiford
>
> gcc/
>         * internal-fn.h (vectorizable_internal_fn_p): New function.
>         * tree-vect-slp.c (compatible_calls_p): Likewise.
>         (vect_build_slp_tree_1): Remove nops argument.  Handle calls
>         to internal functions.
>         (vect_build_slp_tree_2): Update call to vect_build_slp_tree_1.
>
> gcc/testsuite/
>         * gcc.target/aarch64/sve/cond_arith_4.c: New test.
>         * gcc.target/aarch64/sve/cond_arith_4_run.c: Likewise.
>         * gcc.target/aarch64/sve/cond_arith_5.c: Likewise.
>         * gcc.target/aarch64/sve/cond_arith_5_run.c: Likewise.
>         * gcc.target/aarch64/sve/slp_14.c: Likewise.
>         * gcc.target/aarch64/sve/slp_14_run.c: Likewise.
>
> Index: gcc/internal-fn.h
> ===================================================================
> --- gcc/internal-fn.h   2018-05-16 11:06:14.513574219 +0100
> +++ gcc/internal-fn.h   2018-05-16 11:12:11.872116220 +0100
> @@ -158,6 +158,17 @@ direct_internal_fn_p (internal_fn fn)
>     return direct_internal_fn_array[fn].type0 >= -1;
>   }
>
> +/* Return true if FN is a direct internal function that can be vectorized by
> +   converting the return type and all argument types to vectors of the same
> +   number of elements.  E.g. we can vectorize an IFN_SQRT on floats as an
> +   IFN_SQRT on vectors of N floats.  */
> +
> +inline bool
> +vectorizable_internal_fn_p (internal_fn fn)
> +{
> +  return direct_internal_fn_array[fn].vectorizable;
> +}
> +
>   /* Return optab information about internal function FN.  Only meaningful
>      if direct_internal_fn_p (FN).  */
>
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2018-05-16 11:02:46.262494712 +0100
> +++ gcc/tree-vect-slp.c 2018-05-16 11:12:11.873116180 +0100
> @@ -564,6 +564,41 @@ vect_get_and_check_slp_defs (vec_info *v
>     return 0;
>   }
>
> +/* Return true if call statements CALL1 and CALL2 are similar enough
> +   to be combined into the same SLP group.  */
> +
> +static bool
> +compatible_calls_p (gcall *call1, gcall *call2)
> +{
> +  unsigned int nargs = gimple_call_num_args (call1);
> +  if (nargs != gimple_call_num_args (call2))
> +    return false;
> +
> +  if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2))
> +    return false;
> +
> +  if (gimple_call_internal_p (call1))
> +    {
> +      if (TREE_TYPE (gimple_call_lhs (call1))
> +          != TREE_TYPE (gimple_call_lhs (call2)))
> +        return false;
> +      for (unsigned int i = 0; i < nargs; ++i)
> +        if (TREE_TYPE (gimple_call_arg (call1, i))
> +            != TREE_TYPE (gimple_call_arg (call2, i)))

Please use types_compatible_p in these two type comparisons.

Can you please add a generic vect_call_sqrtf to the main
vectorizer testsuite?  In fact, I already see
gcc.dg/vect/fast-math-bb-slp-call-1.c.
Does that mean SQRT never appears as an internal function before
vectorization?

OK with those changes.

Richard.

> +          return false;
> +    }
> +  else
> +    {
> +      if (!operand_equal_p (gimple_call_fn (call1),
> +                            gimple_call_fn (call2), 0))
> +        return false;
> +
> +      if (gimple_call_fntype (call1) != gimple_call_fntype (call2))
> +        return false;
> +    }
> +  return true;
> +}
> +
>   /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the
>      caller's attempt to find the vector type in STMT with the narrowest
>      element type.  Return true if VECTYPE is nonnull and if it is valid
> @@ -625,8 +660,8 @@ vect_record_max_nunits (vec_info *vinfo,
>   static bool
>   vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
>                          vec<gimple *> stmts, unsigned int group_size,
> -                        unsigned nops, poly_uint64 *max_nunits,
> -                        bool *matches, bool *two_operators)
> +                        poly_uint64 *max_nunits, bool *matches,
> +                        bool *two_operators)
>   {
>     unsigned int i;
>     gimple *first_stmt = stmts[0], *stmt = stmts[0];
> @@ -698,7 +733,9 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>         if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
>           {
>             rhs_code = CALL_EXPR;
> -           if (gimple_call_internal_p (call_stmt)
> +           if ((gimple_call_internal_p (call_stmt)
> +                && (!vectorizable_internal_fn_p
> +                    (gimple_call_internal_fn (call_stmt))))
>                 || gimple_call_tail_p (call_stmt)
>                 || gimple_call_noreturn_p (call_stmt)
>                 || !gimple_call_nothrow_p (call_stmt)
> @@ -833,11 +870,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>             if (rhs_code == CALL_EXPR)
>               {
>                 gimple *first_stmt = stmts[0];
> -               if (gimple_call_num_args (stmt) != nops
> -                   || !operand_equal_p (gimple_call_fn (first_stmt),
> -                                        gimple_call_fn (stmt), 0)
> -                   || gimple_call_fntype (first_stmt)
> -                      != gimple_call_fntype (stmt))
> +               if (!compatible_calls_p (as_a <gcall *> (first_stmt),
> +                                        as_a <gcall *> (stmt)))
>                   {
>                     if (dump_enabled_p ())
>                       {
> @@ -1166,8 +1200,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
>     bool two_operators = false;
>     unsigned char *swap = XALLOCAVEC (unsigned char, group_size);
> -   if (!vect_build_slp_tree_1 (vinfo, swap,
> -                               stmts, group_size, nops,
> +   if (!vect_build_slp_tree_1 (vinfo, swap, stmts, group_size,
>                                 &this_max_nunits, matches, &two_operators))
>       return NULL;
>
> Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c 2018-05-16 11:12:11.872116220 +0100
> @@ -0,0 +1,62 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include <stdint.h>
> +
> +#define TEST(TYPE, NAME, OP) \
> +  void __attribute__ ((noinline, noclone)) \
> +  test_##TYPE##_##NAME (TYPE *__restrict x, \
> +                        TYPE *__restrict y, \
> +                        TYPE z1, TYPE z2, \
> +                        TYPE *__restrict pred, int n) \
> +  { \
> +    for (int i = 0; i < n; i += 2) \
> +      { \
> +        x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \
> +        x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \
> +      } \
> +  }
> +
> +#define TEST_INT_TYPE(TYPE) \
> +  TEST (TYPE, div, /)
> +
> +#define TEST_FP_TYPE(TYPE) \
> +  TEST (TYPE, add, +) \
> +  TEST (TYPE, sub, -) \
> +  TEST (TYPE, mul, *) \
> +  TEST (TYPE, div, /)
> +
> +#define TEST_ALL \
> +  TEST_INT_TYPE (int32_t) \
> +  TEST_INT_TYPE (uint32_t) \
> +  TEST_INT_TYPE (int64_t) \
> +  TEST_INT_TYPE (uint64_t) \
> +  TEST_FP_TYPE (float) \
> +  TEST_FP_TYPE (double)
> +
> +TEST_ALL
> +
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
> +
> +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 12 } } */
> +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 6 } } */
> +
> +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 12 } } */
> +/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 6 } } */
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c     2018-05-16 11:12:11.872116220 +0100
> @@ -0,0 +1,32 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "cond_arith_4.c"
> +
> +#define N 98
> +
> +#undef TEST
> +#define TEST(TYPE, NAME, OP) \
> +  { \
> +    TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        y[i] = i * i; \
> +        pred[i] = i % 3; \
> +      } \
> +    test_##TYPE##_##NAME (x, y, z[0], z[1], pred, N); \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \
> +        if (x[i] != expected) \
> +          __builtin_abort (); \
> +        asm volatile ("" ::: "memory"); \
> +      } \
> +  }
> +
> +int
> +main (void)
> +{
> +  TEST_ALL
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c 2018-05-16 11:12:11.872116220 +0100
> @@ -0,0 +1,85 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
> +
> +#include <stdint.h>
> +
> +#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \
> +  void __attribute__ ((noinline, noclone)) \
> +  test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (DATA_TYPE *__restrict x, \
> +                                            DATA_TYPE *__restrict y, \
> +                                            DATA_TYPE z1, DATA_TYPE z2, \
> +                                            DATA_TYPE *__restrict pred, \
> +                                            OTHER_TYPE *__restrict foo, \
> +                                            int n) \
> +  { \
> +    for (int i = 0; i < n; i += 2) \
> +      { \
> +        x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \
> +        x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \
> +        foo[i] += 1; \
> +        foo[i + 1] += 2; \
> +      } \
> +  }
> +
> +#define TEST_INT_TYPE(DATA_TYPE, OTHER_TYPE) \
> +  TEST (DATA_TYPE, OTHER_TYPE, div, /)
> +
> +#define TEST_FP_TYPE(DATA_TYPE, OTHER_TYPE) \
> +  TEST (DATA_TYPE, OTHER_TYPE, add, +) \
> +  TEST (DATA_TYPE, OTHER_TYPE, sub, -) \
> +  TEST (DATA_TYPE, OTHER_TYPE, mul, *) \
> +  TEST (DATA_TYPE, OTHER_TYPE, div, /)
> +
> +#define TEST_ALL \
> +  TEST_INT_TYPE (int32_t, int8_t) \
> +  TEST_INT_TYPE (int32_t, int16_t) \
> +  TEST_INT_TYPE (uint32_t, int8_t) \
> +  TEST_INT_TYPE (uint32_t, int16_t) \
> +  TEST_INT_TYPE (int64_t, int8_t) \
> +  TEST_INT_TYPE (int64_t, int16_t) \
> +  TEST_INT_TYPE (int64_t, int32_t) \
> +  TEST_INT_TYPE (uint64_t, int8_t) \
> +  TEST_INT_TYPE (uint64_t, int16_t) \
> +  TEST_INT_TYPE (uint64_t, int32_t) \
> +  TEST_FP_TYPE (float, int8_t) \
> +  TEST_FP_TYPE (float, int16_t) \
> +  TEST_FP_TYPE (double, int8_t) \
> +  TEST_FP_TYPE (double, int16_t) \
> +  TEST_FP_TYPE (double, int32_t)
> +
> +TEST_ALL
> +
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
> +
> +/* The load XFAILs for fixed-length SVE account for extra loads from the
> +   constant pool.  */
> +/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
> +/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7],} 12 } } */
> +
> +/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
> +/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7],} 12 } } */
> +
> +/* 72 for x operations, 6 for foo operations.  */
> +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 78 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
> +/* 36 for x operations, 6 for foo operations.  */
> +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 42 } } */
> +
> +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 168 } } */
> +/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 84 } } */
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c     2018-05-16 11:12:11.873116180 +0100
> @@ -0,0 +1,35 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "cond_arith_5.c"
> +
> +#define N 98
> +
> +#undef TEST
> +#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \
> +  { \
> +    DATA_TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \
> +    OTHER_TYPE foo[N]; \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        y[i] = i * i; \
> +        pred[i] = i % 3; \
> +        foo[i] = i * 5; \
> +      } \
> +    test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (x, y, z[0], z[1], \
> +                                              pred, foo, N); \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        DATA_TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \
> +        if (x[i] != expected) \
> +          __builtin_abort (); \
> +        asm volatile ("" ::: "memory"); \
> +      } \
> +  }
> +
> +int
> +main (void)
> +{
> +  TEST_ALL
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/slp_14.c       2018-05-16 11:12:11.873116180 +0100
> @@ -0,0 +1,48 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include <stdint.h>
> +
> +#define VEC_PERM(TYPE) \
> +void __attribute__ ((weak)) \
> +vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n) \
> +{ \
> +  for (int i = 0; i < n; ++i) \
> +    { \
> +      TYPE a1 = a[i * 2]; \
> +      TYPE a2 = a[i * 2 + 1]; \
> +      TYPE b1 = b[i * 2]; \
> +      TYPE b2 = b[i * 2 + 1]; \
> +      a[i * 2] = b1 > 1 ? a1 / b1 : a1; \
> +      a[i * 2 + 1] = b2 > 2 ? a2 / b2 : a2; \
> +    } \
> +}
> +
> +#define TEST_ALL(T) \
> +  T (int32_t) \
> +  T (uint32_t) \
> +  T (int64_t) \
> +  T (uint64_t) \
> +  T (float) \
> +  T (double)
> +
> +TEST_ALL (VEC_PERM)
> +
> +/* The loop should be fully-masked.  The load XFAILs for fixed-length
> +   SVE account for extra loads from the constant pool.  */
> +/* { dg-final { scan-assembler-times {\tld1w\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
> +/* { dg-final { scan-assembler-times {\tst1w\t} 3 } } */
> +/* { dg-final { scan-assembler-times {\tld1d\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
> +/* { dg-final { scan-assembler-times {\tst1d\t} 3 } } */
> +/* { dg-final { scan-assembler-not {\tldr} } } */
> +/* { dg-final { scan-assembler-not {\tstr} } } */
> +
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
> +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
> +
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s} 1 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s} 1 } } */
> +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d} 1 } } */
> +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d} 1 } } */
> +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c   2018-05-16 11:12:11.873116180 +0100
> @@ -0,0 +1,34 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "slp_14.c"
> +
> +#define N1 (103 * 2)
> +#define N2 (111 * 2)
> +
> +#define HARNESS(TYPE) \
> +  { \
> +    TYPE a[N2], b[N2]; \
> +    for (unsigned int i = 0; i < N2; ++i) \
> +      { \
> +        a[i] = i * 2 + i % 5; \
> +        b[i] = i % 11; \
> +      } \
> +    vec_slp_##TYPE (a, b, N1 / 2); \
> +    for (unsigned int i = 0; i < N2; ++i) \
> +      { \
> +        TYPE orig_a = i * 2 + i % 5; \
> +        TYPE orig_b = i % 11; \
> +        TYPE expected_a = orig_a; \
> +        if (i < N1 && orig_b > (i & 1 ? 2 : 1)) \
> +          expected_a /= orig_b; \
> +        if (a[i] != expected_a || b[i] != orig_b) \
> +          __builtin_abort (); \
> +      } \
> +  }
> +
> +int
> +main (void)
> +{
> +  TEST_ALL (HARNESS)
> +}
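
For anyone who wants to try the slp_14.c kernel without SVE hardware, here is a minimal sketch of the int32_t instance together with the check that slp_14_run.c performs. It assumes plain C99 and nothing target-specific. The names cond_div_pairs and check_cond_div_pairs are hypothetical and not part of the patch. It only checks the scalar semantics of the two conditional divisions per iteration; it says nothing about whether a given compiler actually forms an SLP group or emits predicated divides.

```c
#include <stdint.h>

#define N1 (103 * 2)
#define N2 (111 * 2)

/* Same shape as the VEC_PERM body in slp_14.c, written out for int32_t.
   The name is hypothetical.  Each iteration handles one pair of
   elements, with a different guard on each lane.  */
void
cond_div_pairs (int32_t *restrict a, int32_t *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      int32_t a1 = a[i * 2];
      int32_t a2 = a[i * 2 + 1];
      int32_t b1 = b[i * 2];
      int32_t b2 = b[i * 2 + 1];
      a[i * 2] = b1 > 1 ? a1 / b1 : a1;
      a[i * 2 + 1] = b2 > 2 ? a2 / b2 : a2;
    }
}

/* Mirrors the HARNESS macro in slp_14_run.c, but returns 1 on success
   and 0 on mismatch instead of calling __builtin_abort.  */
int
check_cond_div_pairs (void)
{
  int32_t a[N2], b[N2];
  for (int i = 0; i < N2; ++i)
    {
      a[i] = i * 2 + i % 5;
      b[i] = i % 11;
    }
  cond_div_pairs (a, b, N1 / 2);
  for (int i = 0; i < N2; ++i)
    {
      int32_t orig_a = i * 2 + i % 5;
      int32_t orig_b = i % 11;
      int32_t expected_a = orig_a;
      /* Only the first N1 elements are processed; odd lanes use the
         "> 2" guard and even lanes the "> 1" guard.  */
      if (i < N1 && orig_b > ((i & 1) ? 2 : 1))
        expected_a /= orig_b;
      if (a[i] != expected_a || b[i] != orig_b)
        return 0;
    }
  return 1;
}
```

Calling check_cond_div_pairs () from a main function and aborting on a zero result gives the same pass/fail behaviour as the run test for this one type.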