From: Richard Biener
Date: Mon, 5 Sep 2022 12:58:31 +0200
Subject: Re: [PATCH V2] Extend vectorizer to handle nonlinear induction for neg, mul/lshift/rshift with a constant.
To: liuhongt
Cc: GCC Patches
In-Reply-To: <20220829052444.86744-1-hongtao.liu@intel.com>

On Mon, Aug 29, 2022 at 7:29 AM liuhongt via Gcc-patches wrote:
>
> >Looks good overall - a few comments inline.  Also can you please add
> >SLP support?  I've tried hard to fill in gaps where SLP support is
> >missing since my goal is still to get rid of non-SLP.
> For SLP with different induction types, each iv needs a separate update
> plus a vector permutation; if they are the same induction type, the iv
> can be updated with one instruction and without a permutation.
> I'll add an incremental patch for that.
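To illustrate the SLP case under discussion, here is a sketch (mine, not
from the patch) of one store group fed by two nonlinear ivs of different
kinds; each iv would need its own vector update plus a permute to
interleave the lanes:

  void
  foo_slp (int *a, int b, int c)
  {
    for (int i = 0; i != 1024; i += 2)
      {
        a[i] = b;      /* mul induction: b, 3*b, 9*b, ...  */
        a[i + 1] = c;  /* neg induction: c, -c, c, ...  */
        b *= 3;
        c = -c;
      }
  }

With matching induction types in both lanes a single vector iv update
would suffice.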
> >gcc_assert (is_a <gphi *> (loop_phi_node));
> >is shorter
> It looks like it doesn't support gphi*, only gimple*.  Since it's
> already a gphi* here, I've removed the assert.
>
> >I think you should use
> >init_expr = PHI_ARG_DEF_FROM_EDGE (loop_phi_node, loop_preheader_edge (loop));
> >and loop_latch_edge (loop) for ev_expr, you can't rely on them being arg 0 / 1.
> Changed.
>
> >Likewise.  Use preheader/latch edge.
> Changed.
>
> >and those two should then go away.
> Removed.
>
> >is not vectorized?  I think it should be possible to relax
> >this.
> Relaxed.
>
> >def is never NULL so a cheaper way to write this is
> > || ((def = SSA_NAME_DEF_STMT (ev_expr)), true)
> Changed.
>
> >not sure if we need to bother here - ideally vectorizable_live_operation would
> >give up instead (but I suppose the regular IV / induction analysis gives up
> >here as well?)
> Removed.
>
> >the above can at least go into the combined switch case
> Changed.
>
> >Seeing this - did you check whether you handle prologue peeling correctly?  The
> >same issue might show up with a vectorized epilogue.  I think you can force a
> >peeled prologue with storing unaligned and -fno-vect-cost-model (that IIRC will
> >simply optimize for the larger number of aligned memory ops)
> Updated in vect_update_ivs_after_vectorizer; peeling for the unaligned
> cases is also supported now.
>
> >since you only handle inner loop nonlinear IVs you should probably
> >swap the two checks?
> Changed.
>
> >There might be a more canonical way to build the series expr
> build_vec_series doesn't add the stmt to a sequence, so I'll still keep
> VEC_SERIES_EXPR here?
>
> >use types_compatible_p (...) instead of comparing TYPE_CANONICAL.
> >A small enhancement would be to support different signedness
> >(use tree_nop_conversion_p then).
> Changed to support different signedness.
>
> >above you asserted that the conversion is only necessary for constants
> >but then fold_convert will also generate a tree NOP_EXPR for
> >some types_compatible_p types.  So maybe only do this for INTEGER_CST
> >init_expr or use init_expr = gimple_convert (...) and insert required stmts
> >on the preheader.
> Changed.
>
> >Alternatively you could perform the vector IV updates in an unsigned type?
> Changed.
>
> >why's that necessary?  can we do a MIN (vector_step, { prec-1, prec-1,
> >prec-1 ... })
> It's true for ashr, but not for ashl or lshr.  For the latter two, when
> vector_step >= precision the result should be zero instead of a shift
> by prec - 1.
>
> >> +      new_name = vect_create_nonlinear_iv_step (&stmts, step_expr,
> >> +                                                nunits, induction_type);
> >> +
> >> +      vec_step = vect_create_nonlinear_iv_vec_step (loop_vinfo, stmt_info,
> >> +                                                    new_name, vectype,
> >> +                                                    induction_type);
>
> >are these not the same as created above?
>
> They are different: the former uses vf, this one uses nunits.  vf can
> be a multiple of nunits, which is exactly the case this code handles,
> and the latch phi is updated in the earlier, vf-based place.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.

OK, and sorry for the delay.

Thanks.
Richard.

>
> For neg, the patch creates a vec_init as [ a, -a, a, -a, ... ] and no
> vec_step is needed to update the vectorized iv since vf is always a
> multiple of 2 (negative * negative is positive).
>
> For shift, the patch creates a vec_init as [ a, a >> c, a >> 2*c, ... ]
> and a vec_step as [ c * nunits, c * nunits, c * nunits, ... ]; the
> vectorized iv is updated as vec_def = vec_init >>/<< vec_step.
>
> For mul, the patch creates a vec_init as [ a, a * c, a * pow(c, 2), ... ]
> and a vec_step as [ pow(c, nunits), pow(c, nunits), ... ]; the iv is
> updated as vec_def = vec_init * vec_step.
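To make the mul scheme concrete, a scalar model (illustrative only, not
the patch's code) of what the vectorized loop computes for "b *= 3" with
8 int lanes - the same values the pr103144-mul-2.c runtime test checks:
vec_init is { b, b*3, ..., b*2187 } and each lane is multiplied by
pow (3, 8) == 6561 per vector iteration.  Unsigned arithmetic stands in
for the patch's unsigned mult that avoids UD on signed overflow:

  void
  model_mul_iv (int *a, int b, int n)  /* assumes n % 8 == 0 */
  {
    unsigned iv[8], m = 1;
    for (int lane = 0; lane != 8; lane++)
      {
        iv[lane] = (unsigned) b * m;  /* vec_init lane: b * pow (3, lane).  */
        m *= 3;
      }
    for (int i = 0; i != n; i += 8)
      for (int lane = 0; lane != 8; lane++)
        {
          a[i + lane] = (int) iv[lane];
          iv[lane] *= 6561;  /* vec_step lane: pow (3, nunits).  */
        }
  }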
> The patch handles nonlinear iv for
> 1. Integer type only, floating point is not handled.
> 2. No slp_node.
> 3. iv_loop should be the same as the vector loop, not a nested loop.
> 4. No UD is introduced: for mul, unsigned mult is used to avoid UD; for
>    shift, the shift count should be less than the type precision.
>
> gcc/ChangeLog:
>
>         PR tree-optimization/103144
>         * tree-vect-loop.cc (vect_is_nonlinear_iv_evolution): New function.
>         (vect_analyze_scalar_cycles_1): Detect nonlinear iv by the above
>         function.
>         (vect_create_nonlinear_iv_init): New function.
>         (vect_peel_nonlinear_iv_init): Ditto.
>         (vect_create_nonlinear_iv_step): Ditto.
>         (vect_create_nonlinear_iv_vec_step): Ditto.
>         (vect_update_nonlinear_iv): Ditto.
>         (vectorizable_nonlinear_induction): Ditto.
>         (vectorizable_induction): Call
>         vectorizable_nonlinear_induction when induction_type is not
>         vect_step_op_add.
>         * tree-vect-loop-manip.cc (vect_update_ivs_after_vectorizer):
>         Update nonlinear iv for epilogue loop.
>         * tree-vectorizer.h (enum vect_induction_op_type): New enum.
>         (STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE): New macro.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr103144-mul-1.c: New test.
>         * gcc.target/i386/pr103144-mul-2.c: New test.
>         * gcc.target/i386/pr103144-neg-1.c: New test.
>         * gcc.target/i386/pr103144-neg-2.c: New test.
>         * gcc.target/i386/pr103144-shift-1.c: New test.
>         * gcc.target/i386/pr103144-shift-2.c: New test.
> ---
>  .../gcc.target/i386/pr103144-mul-1.c          |  51 ++
>  .../gcc.target/i386/pr103144-mul-2.c          |  51 ++
>  .../gcc.target/i386/pr103144-neg-1.c          |  51 ++
>  .../gcc.target/i386/pr103144-neg-2.c          |  44 ++
>  .../gcc.target/i386/pr103144-shift-1.c        |  70 ++
>  .../gcc.target/i386/pr103144-shift-2.c        |  79 ++
>  gcc/tree-vect-loop-manip.cc                   |  37 +-
>  gcc/tree-vect-loop.cc                         | 678 +++++++++++++++++-
>  gcc/tree-vectorizer.h                         |  15 +
>  9 files changed, 1062 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-neg-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-neg-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-shift-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-shift-2.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> new file mode 100644
> index 00000000000..640c34fd959
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
> @@ -0,0 +1,51 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
> +
> +#define N 10000
> +
> +void
> +__attribute__((noipa))
> +foo_mul (int* a, int b)
> +{
> +  for (int i = 0; i != N; i++)
> +    {
> +      a[i] = b;
> +      b *= 3;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_mul_const (int* a)
> +{
> +  int b = 1;
> +  for (int i = 0; i != N; i++)
> +    {
> +      a[i] = b;
> +      b *= 3;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_mul_peel (int* a, int b)
> +{
> +  for (int i = 0; i != 39; i++)
> +    {
> +      a[i] = b;
> +      b *= 3;
> +    }
> +}
> +
> +void
> +__attribute__((noipa))
> +foo_mul_peel_const (int* a)
> +{
> +  int b = 1;
> +  for (int i = 0; i != 39; i++)
> +    {
> +      a[i] = b;
> +      b *= 3;
> +    }
> +}
> diff --git 
a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c > new file mode 100644 > index 00000000000..39fdea3a69d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c > @@ -0,0 +1,51 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */ > +/* { dg-require-effective-target avx2 } */ > + > +#include "avx2-check.h" > +#include > +#include "pr103144-mul-1.c" > + > +typedef int v8si __attribute__((vector_size(32))); > + > +void > +avx2_test (void) > +{ > + int* epi32_exp = (int*) malloc (N * sizeof (int)); > + int* epi32_dst = (int*) malloc (N * sizeof (int)); > + > + __builtin_memset (epi32_exp, 0, N * sizeof (int)); > + int b = 8; > + v8si init = __extension__(v8si) { b, b * 3, b * 9, b * 27, b * 81, b * 243, b * 729, b * 2187 }; > + > + for (int i = 0; i != N / 8; i++) > + { > + memcpy (epi32_exp + i * 8, &init, 32); > + init *= 6561; > + } > + > + foo_mul (epi32_dst, b); > + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) > + __builtin_abort (); > + > + foo_mul_peel (epi32_dst, b); > + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * 4) != 0) > + __builtin_abort (); > + > + init = __extension__(v8si) { 1, 3, 9, 27, 81, 243, 729, 2187 }; > + for (int i = 0; i != N / 8; i++) > + { > + memcpy (epi32_exp + i * 8, &init, 32); > + init *= 6561; > + } > + > + foo_mul_const (epi32_dst); > + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) > + __builtin_abort (); > + > + foo_mul_peel_const (epi32_dst); > + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * 4) != 0) > + __builtin_abort (); > + > + return; > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr103144-neg-1.c b/gcc/testsuite/gcc.target/i386/pr103144-neg-1.c > new file mode 100644 > index 00000000000..f87b1d6e529 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr103144-neg-1.c > @@ -0,0 +1,51 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */ > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */ > + > +#define N 10000 > + > +void > +__attribute__((noipa)) > +foo_neg (int* a, int b) > +{ > + for (int i = 0; i != N; i++) > + { > + a[i] = b; > + b = -b; > + } > +} > + > +void > +__attribute__((noipa)) > +foo_neg_const (int* a) > +{ > + int b = 1; > + for (int i = 0; i != N; i++) > + { > + a[i] = b; > + b = -b; > + } > +} > + > +void > +__attribute__((noipa)) > +foo_neg_peel (int* a, int b, int n) > +{ > + for (int i = 0; i != n; i++) > + { > + a[i] = b; > + b = -b; > + } > +} > + > +void > +__attribute__((noipa)) > +foo_neg_const_peel (int* a, int n) > +{ > + int b = 1; > + for (int i = 0; i != n; i++) > + { > + a[i] = b; > + b = -b; > + } > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr103144-neg-2.c b/gcc/testsuite/gcc.target/i386/pr103144-neg-2.c > new file mode 100644 > index 00000000000..bb8c22b9f9e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr103144-neg-2.c > @@ -0,0 +1,44 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */ > +/* { dg-require-effective-target avx2 } */ > + > +#include "avx2-check.h" > +#include > +#include "pr103144-neg-1.c" > + > +void > +avx2_test (void) > +{ > + int* epi32_exp = (int*) malloc (N * sizeof (int)); > + int* epi32_dst = (int*) malloc (N * sizeof (int)); > + long long* epi64_exp = (long long*) malloc 
(N * sizeof (int)); > + > + __builtin_memset (epi32_exp, 0, N * sizeof (int)); > + int b = 100; > + > + for (int i = 0; i != N / 2; i++) > + epi64_exp[i] = ((long long) b) | (((long long) -b) << 32); > + > + memcpy (epi32_exp, epi64_exp, N * sizeof (int)); > + foo_neg (epi32_dst, b); > + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) > + __builtin_abort (); > + > + foo_neg_peel (epi32_dst, b, 39); > + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0) > + __builtin_abort (); > + > + for (int i = 0; i != N / 2; i++) > + epi64_exp[i] = ((long long) 1) | (((long long) -1) << 32); > + > + memcpy (epi32_exp, epi64_exp, N * sizeof (int)); > + foo_neg_const (epi32_dst); > + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) > + __builtin_abort (); > + > + foo_neg_const_peel (epi32_dst, 39); > + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0) > + __builtin_abort (); > + > + return; > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr103144-shift-1.c b/gcc/testsuite/gcc.target/i386/pr103144-shift-1.c > new file mode 100644 > index 00000000000..2a6920350dd > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr103144-shift-1.c > @@ -0,0 +1,70 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -mprefer-vector-width=256" } */ > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 6 "vect" } } */ > + > +#define N 10000 > +void > +__attribute__((noipa)) > +foo_shl (int* a, int b) > +{ > + for (int i = 0; i != N; i++) > + { > + a[i] = b; > + b <<= 1; > + } > +} > + > +void > +__attribute__((noipa)) > +foo_ashr (int* a, int b) > +{ > + for (int i = 0; i != N; i++) > + { > + a[i] = b; > + b >>= 1; > + } > +} > + > +void > +__attribute__((noipa)) > +foo_lshr (unsigned int* a, unsigned int b) > +{ > + for (int i = 0; i != N; i++) > + { > + a[i] = b; > + b >>= 1U; > + } > +} > + > +void > +__attribute__((noipa)) > +foo_shl_peel (int* a, int b) > +{ > + for (int i = 0; i != 39; i++) > + { > + a[i] = b; > + b <<= 1; > + } > +} > + > +void > +__attribute__((noipa)) > +foo_ashr_peel (int* a, int b) > +{ > + for (int i = 0; i != 39; i++) > + { > + a[i] = b; > + b >>= 1; > + } > +} > + > +void > +__attribute__((noipa)) > +foo_lshr_peel (unsigned int* a, unsigned int b) > +{ > + for (int i = 0; i != 39; i++) > + { > + a[i] = b; > + b >>= 1U; > + } > +} > diff --git a/gcc/testsuite/gcc.target/i386/pr103144-shift-2.c b/gcc/testsuite/gcc.target/i386/pr103144-shift-2.c > new file mode 100644 > index 00000000000..6f477191d96 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr103144-shift-2.c > @@ -0,0 +1,79 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited -mprefer-vector-width=256" } */ > +/* { dg-require-effective-target avx2 } */ > + > +#include "avx2-check.h" > +#include > +#include "pr103144-shift-1.c" > + > +typedef int v8si __attribute__((vector_size(32))); > +typedef unsigned int v8usi __attribute__((vector_size(32))); > + > +void > +avx2_test (void) > +{ > + int* epi32_exp = (int*) malloc (N * sizeof (int)); > + int* epi32_dst = (int*) malloc (N * sizeof (int)); > + unsigned int* epu32_exp = (unsigned int*) malloc (N * sizeof (int)); > + unsigned int* epu32_dst = (unsigned int*) malloc (N * sizeof (int)); > + > + __builtin_memset (epi32_exp, 0, N * sizeof (int)); > + int b = 8; > + v8si init = __extension__(v8si) { b, b << 1, b << 2, b << 3, b << 4, b << 5, b << 6, b << 7 }; > + > + for (int i = 
0; i != N / 8; i++) > + { > + memcpy (epi32_exp + i * 8, &init, 32); > + init <<= 8; > + } > + > + foo_shl (epi32_dst, b); > + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) > + __builtin_abort (); > + > + foo_shl_peel (epi32_dst, b); > + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0) > + __builtin_abort (); > + > + b = -11111; > + init = __extension__(v8si) { b, b >> 1, b >> 2, b >> 3, b >> 4, b >> 5, b >> 6, b >> 7 }; > + for (int i = 0; i != N / 8; i++) > + { > + memcpy (epi32_exp + i * 8, &init, 32); > + init >>= 8; > + } > + > + foo_ashr (epi32_dst, b); > + if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0) > + __builtin_abort (); > + > + foo_ashr_peel (epi32_dst, b); > + if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * sizeof (int)) != 0) > + { > + for (int i = 0; i != 39; i++) > + { > + printf ("epi32_dst[%d] is %d ----", i, epi32_dst[i]); > + printf ("epi32_exp[%d] is %d\n", i, epi32_exp[i]); > + } > + __builtin_abort (); > + } > + > + __builtin_memset (epu32_exp, 0, N * sizeof (int)); > + unsigned int c = 11111111; > + v8usi initu = __extension__(v8usi) { c, c >> 1U, c >> 2U, c >> 3U, c >> 4U, c >> 5U, c >> 6U, c >> 7U }; > + for (int i = 0; i != N / 8; i++) > + { > + memcpy (epu32_exp + i * 8, &initu, 32); > + initu >>= 8U; > + } > + > + foo_lshr (epu32_dst, c); > + if (__builtin_memcmp (epu32_dst, epu32_exp, N * sizeof (int)) != 0) > + __builtin_abort (); > + > + foo_lshr_peel (epu32_dst, c); > + if (__builtin_memcmp (epu32_dst, epu32_exp, 39 * sizeof (int)) != 0) > + __builtin_abort (); > + > + return; > +} > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc > index 86d2264054a..fc7901d8a8a 100644 > --- a/gcc/tree-vect-loop-manip.cc > +++ b/gcc/tree-vect-loop-manip.cc > @@ -1559,15 +1559,28 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, > gcc_assert (!tree_is_chrec (step_expr)); > > init_expr = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); > + gimple_seq stmts = NULL; > + enum vect_induction_op_type induction_type > + = STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info); > > - off = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr), > - fold_convert (TREE_TYPE (step_expr), niters), > - step_expr); > - if (POINTER_TYPE_P (type)) > - ni = fold_build_pointer_plus (init_expr, off); > + if (induction_type == vect_step_op_add) > + { > + off = fold_build2 (MULT_EXPR, TREE_TYPE (step_expr), > + fold_convert (TREE_TYPE (step_expr), niters), > + step_expr); > + if (POINTER_TYPE_P (type)) > + ni = fold_build_pointer_plus (init_expr, off); > + else > + ni = fold_build2 (PLUS_EXPR, type, > + init_expr, fold_convert (type, off)); > + } > + /* Don't bother call vect_peel_nonlinear_iv_init. */ > + else if (induction_type == vect_step_op_neg) > + ni = init_expr; > else > - ni = fold_build2 (PLUS_EXPR, type, > - init_expr, fold_convert (type, off)); > + ni = vect_peel_nonlinear_iv_init (&stmts, init_expr, > + niters, step_expr, > + induction_type); > > var = create_tmp_var (type, "tmp"); > > @@ -1576,9 +1589,15 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, > ni_name = force_gimple_operand (ni, &new_stmts, false, var); > /* Exit_bb shouldn't be empty. 
*/ > if (!gsi_end_p (last_gsi)) > - gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT); > + { > + gsi_insert_seq_after (&last_gsi, stmts, GSI_SAME_STMT); > + gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT); > + } > else > - gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT); > + { > + gsi_insert_seq_before (&last_gsi, stmts, GSI_SAME_STMT); > + gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT); > + } > > /* Fix phi expressions in the successor bb. */ > adjust_phi_and_debug_stmts (phi1, update_e, ni_name); > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index 2257b29a652..c2293466c1c 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -425,6 +425,77 @@ vect_is_simple_iv_evolution (unsigned loop_nb, tree access_fn, tree * init, > return true; > } > > +/* Function vect_is_nonlinear_iv_evolution > + > + Only support nonlinear induction for integer type > + 1. neg > + 2. mul by constant > + 3. lshift/rshift by constant. > + > + For neg induction, return a fake step as integer -1. */ > +static bool > +vect_is_nonlinear_iv_evolution (class loop* loop, stmt_vec_info stmt_info, > + gphi* loop_phi_node, tree *init, tree *step) > +{ > + tree init_expr, ev_expr, result, op1, op2; > + gimple* def; > + > + if (gimple_phi_num_args (loop_phi_node) != 2) > + return false; > + > + init_expr = PHI_ARG_DEF_FROM_EDGE (loop_phi_node, loop_preheader_edge (loop)); > + ev_expr = PHI_ARG_DEF_FROM_EDGE (loop_phi_node, loop_latch_edge (loop)); > + > + /* Support nonlinear induction only for integer type. */ > + if (!INTEGRAL_TYPE_P (TREE_TYPE (init_expr))) > + return false; > + > + *init = init_expr; > + result = PHI_RESULT (loop_phi_node); > + > + if (TREE_CODE (ev_expr) != SSA_NAME > + || ((def = SSA_NAME_DEF_STMT (ev_expr)), false) > + || !is_gimple_assign (def)) > + return false; > + > + enum tree_code t_code = gimple_assign_rhs_code (def); > + switch (t_code) > + { > + case NEGATE_EXPR: > + if (gimple_assign_rhs1 (def) != result) > + return false; > + *step = build_int_cst (TREE_TYPE (init_expr), -1); > + STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_neg; > + break; > + > + case RSHIFT_EXPR: > + case LSHIFT_EXPR: > + case MULT_EXPR: > + op1 = gimple_assign_rhs1 (def); > + op2 = gimple_assign_rhs2 (def); > + if (TREE_CODE (op2) != INTEGER_CST > + || op1 != result) > + return false; > + *step = op2; > + if (t_code == LSHIFT_EXPR) > + STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_shl; > + else if (t_code == RSHIFT_EXPR) > + STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_shr; > + /* NEGATE_EXPR and MULT_EXPR are both vect_step_op_mul. */ > + else > + STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info) = vect_step_op_mul; > + break; > + > + default: > + return false; > + } > + > + STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED (stmt_info) = *init; > + STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info) = *step; > + > + return true; > +} > + > /* Return true if PHI, described by STMT_INFO, is the inner PHI in > what we are assuming is a double reduction. 
For example, given > a structure like this: > @@ -512,11 +583,16 @@ vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, class loop *loop, > = evolution_part_in_loop_num (access_fn, loop->num); > } > > - if (!access_fn > - || vect_inner_phi_in_double_reduction_p (loop_vinfo, phi) > - || !vect_is_simple_iv_evolution (loop->num, access_fn, &init, &step) > - || (LOOP_VINFO_LOOP (loop_vinfo) != loop > - && TREE_CODE (step) != INTEGER_CST)) > + if ((!access_fn > + || vect_inner_phi_in_double_reduction_p (loop_vinfo, phi) > + || !vect_is_simple_iv_evolution (loop->num, access_fn, > + &init, &step) > + || (LOOP_VINFO_LOOP (loop_vinfo) != loop > + && TREE_CODE (step) != INTEGER_CST)) > + /* Only handle nonlinear iv for same loop. */ > + && (LOOP_VINFO_LOOP (loop_vinfo) != loop > + || !vect_is_nonlinear_iv_evolution (loop, stmt_vinfo, > + phi, &init, &step))) > { > worklist.safe_push (stmt_vinfo); > continue; > @@ -8229,6 +8305,591 @@ vect_can_vectorize_without_simd_p (code_helper code) > && vect_can_vectorize_without_simd_p (tree_code (code))); > } > > +/* Create vector init for vectorized iv. */ > +static tree > +vect_create_nonlinear_iv_init (gimple_seq* stmts, tree init_expr, > + tree step_expr, poly_uint64 nunits, > + tree vectype, > + enum vect_induction_op_type induction_type) > +{ > + unsigned HOST_WIDE_INT const_nunits; > + tree vec_shift, vec_init, new_name; > + unsigned i; > + tree itype = TREE_TYPE (vectype); > + > + /* iv_loop is the loop to be vectorized. Create: > + vec_init = [X, X+S, X+2*S, X+3*S] (S = step_expr, X = init_expr). */ > + new_name = gimple_convert (stmts, itype, init_expr); > + switch (induction_type) > + { > + case vect_step_op_shr: > + case vect_step_op_shl: > + /* Build the Initial value from shift_expr. */ > + vec_init = gimple_build_vector_from_val (stmts, > + vectype, > + new_name); > + vec_shift = gimple_build (stmts, VEC_SERIES_EXPR, vectype, > + build_zero_cst (itype), step_expr); > + vec_init = gimple_build (stmts, > + (induction_type == vect_step_op_shr > + ? RSHIFT_EXPR : LSHIFT_EXPR), > + vectype, vec_init, vec_shift); > + break; > + > + case vect_step_op_neg: > + { > + vec_init = gimple_build_vector_from_val (stmts, > + vectype, > + new_name); > + tree vec_neg = gimple_build (stmts, NEGATE_EXPR, > + vectype, vec_init); > + /* The encoding has 2 interleaved stepped patterns. */ > + vec_perm_builder sel (nunits, 2, 3); > + sel.quick_grow (6); > + for (i = 0; i < 3; i++) > + { > + sel[2 * i] = i; > + sel[2 * i + 1] = i + nunits; > + } > + vec_perm_indices indices (sel, 2, nunits); > + tree perm_mask_even > + = vect_gen_perm_mask_checked (vectype, indices); > + vec_init = gimple_build (stmts, VEC_PERM_EXPR, > + vectype, > + vec_init, vec_neg, > + perm_mask_even); > + } > + break; > + > + case vect_step_op_mul: > + { > + /* Use unsigned mult to avoid UD integer overflow. */ > + gcc_assert (nunits.is_constant (&const_nunits)); > + tree utype = unsigned_type_for (itype); > + tree uvectype = build_vector_type (utype, > + TYPE_VECTOR_SUBPARTS (vectype)); > + new_name = gimple_convert (stmts, utype, new_name); > + vec_init = gimple_build_vector_from_val (stmts, > + uvectype, > + new_name); > + tree_vector_builder elts (uvectype, const_nunits, 1); > + tree elt_step = build_one_cst (utype); > + > + elts.quick_push (elt_step); > + for (i = 1; i < const_nunits; i++) > + { > + /* Create: new_name_i = new_name + step_expr. 
*/ > + elt_step = gimple_build (stmts, MULT_EXPR, > + utype, elt_step, step_expr); > + elts.quick_push (elt_step); > + } > + /* Create a vector from [new_name_0, new_name_1, ..., > + new_name_nunits-1]. */ > + tree vec_mul = gimple_build_vector (stmts, &elts); > + vec_init = gimple_build (stmts, MULT_EXPR, uvectype, > + vec_init, vec_mul); > + vec_init = gimple_convert (stmts, vectype, vec_init); > + } > + break; > + > + default: > + gcc_unreachable (); > + } > + > + return vec_init; > +} > + > +/* Peel init_expr by skip_niters for induction_type. */ > +tree > +vect_peel_nonlinear_iv_init (gimple_seq* stmts, tree init_expr, > + tree skip_niters, tree step_expr, > + enum vect_induction_op_type induction_type) > +{ > + gcc_assert (TREE_CODE (skip_niters) == INTEGER_CST); > + tree type = TREE_TYPE (init_expr); > + unsigned prec = TYPE_PRECISION (type); > + switch (induction_type) > + { > + case vect_step_op_neg: > + if (TREE_INT_CST_LOW (skip_niters) % 2) > + init_expr = gimple_build (stmts, NEGATE_EXPR, type, init_expr); > + /* else no change. */ > + break; > + > + case vect_step_op_shr: > + case vect_step_op_shl: > + skip_niters = gimple_convert (stmts, type, skip_niters); > + step_expr = gimple_build (stmts, MULT_EXPR, type, step_expr, skip_niters); > + /* When the shift amount >= precision, need to avoid UD. > + In the original loop, there's no UD, and according to the semantics, > + init_expr should be 0 for lshr, ashl, and >>= (prec - 1) for ashr. */ > + if (!tree_fits_uhwi_p (step_expr) > + || tree_to_uhwi (step_expr) >= prec) > + { > + if (induction_type == vect_step_op_shl > + || TYPE_UNSIGNED (type)) > + init_expr = build_zero_cst (type); > + else > + init_expr = gimple_build (stmts, RSHIFT_EXPR, type, > + init_expr, > + wide_int_to_tree (type, prec - 1)); > + } > + else > + init_expr = gimple_build (stmts, (induction_type == vect_step_op_shr > + ? RSHIFT_EXPR : LSHIFT_EXPR), > + type, init_expr, step_expr); > + break; > + > + case vect_step_op_mul: > + { > + tree utype = unsigned_type_for (type); > + init_expr = gimple_convert (stmts, utype, init_expr); > + unsigned skipn = TREE_INT_CST_LOW (skip_niters); > + wide_int begin = wi::to_wide (step_expr); > + for (unsigned i = 0; i != skipn - 1; i++) > + begin = wi::mul (begin, wi::to_wide (step_expr)); > + tree mult_expr = wide_int_to_tree (utype, begin); > + init_expr = gimple_build (stmts, MULT_EXPR, utype, init_expr, mult_expr); > + init_expr = gimple_convert (stmts, type, init_expr); > + } > + break; > + > + default: > + gcc_unreachable (); > + } > + > + return init_expr; > +} > + > +/* Create vector step for vectorized iv. */ > +static tree > +vect_create_nonlinear_iv_step (gimple_seq* stmts, tree step_expr, > + poly_uint64 vf, > + enum vect_induction_op_type induction_type) > +{ > + tree expr = build_int_cst (TREE_TYPE (step_expr), vf); > + tree new_name = NULL; > + /* Step should be pow (step, vf) for mult induction. */ > + if (induction_type == vect_step_op_mul) > + { > + gcc_assert (vf.is_constant ()); > + wide_int begin = wi::to_wide (step_expr); > + > + for (unsigned i = 0; i != vf.to_constant () - 1; i++) > + begin = wi::mul (begin, wi::to_wide (step_expr)); > + > + new_name = wide_int_to_tree (TREE_TYPE (step_expr), begin); > + } > + else if (induction_type == vect_step_op_neg) > + /* Do nothing. 
*/ > + ; > + else > + new_name = gimple_build (stmts, MULT_EXPR, TREE_TYPE (step_expr), > + expr, step_expr); > + return new_name; > +} > + > +static tree > +vect_create_nonlinear_iv_vec_step (loop_vec_info loop_vinfo, > + stmt_vec_info stmt_info, > + tree new_name, tree vectype, > + enum vect_induction_op_type induction_type) > +{ > + /* No step is needed for neg induction. */ > + if (induction_type == vect_step_op_neg) > + return NULL; > + > + tree t = unshare_expr (new_name); > + gcc_assert (CONSTANT_CLASS_P (new_name) > + || TREE_CODE (new_name) == SSA_NAME); > + tree new_vec = build_vector_from_val (vectype, t); > + tree vec_step = vect_init_vector (loop_vinfo, stmt_info, > + new_vec, vectype, NULL); > + return vec_step; > +} > + > +/* Update vectorized iv with vect_step, induc_def is init. */ > +static tree > +vect_update_nonlinear_iv (gimple_seq* stmts, tree vectype, > + tree induc_def, tree vec_step, > + enum vect_induction_op_type induction_type) > +{ > + tree vec_def = induc_def; > + switch (induction_type) > + { > + case vect_step_op_mul: > + { > + /* Use unsigned mult to avoid UD integer overflow. */ > + tree uvectype > + = build_vector_type (unsigned_type_for (TREE_TYPE (vectype)), > + TYPE_VECTOR_SUBPARTS (vectype)); > + vec_def = gimple_convert (stmts, uvectype, vec_def); > + vec_step = gimple_convert (stmts, uvectype, vec_step); > + vec_def = gimple_build (stmts, MULT_EXPR, uvectype, > + vec_def, vec_step); > + vec_def = gimple_convert (stmts, vectype, vec_def); > + } > + break; > + > + case vect_step_op_shr: > + vec_def = gimple_build (stmts, RSHIFT_EXPR, vectype, > + vec_def, vec_step); > + break; > + > + case vect_step_op_shl: > + vec_def = gimple_build (stmts, LSHIFT_EXPR, vectype, > + vec_def, vec_step); > + break; > + case vect_step_op_neg: > + vec_def = induc_def; > + /* Do nothing. */ > + break; > + default: > + gcc_unreachable (); > + } > + > + return vec_def; > + > +} > +/* Function vectorizable_induction > + > + Check if STMT_INFO performs an nonlinear induction computation that can be > + vectorized. If VEC_STMT is also passed, vectorize the induction PHI: create > + a vectorized phi to replace it, put it in VEC_STMT, and add it to the same > + basic block. > + Return true if STMT_INFO is vectorizable in this way. */ > + > +static bool > +vectorizable_nonlinear_induction (loop_vec_info loop_vinfo, > + stmt_vec_info stmt_info, > + gimple **vec_stmt, slp_tree slp_node, > + stmt_vector_for_cost *cost_vec) > +{ > + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > + unsigned ncopies; > + bool nested_in_vect_loop = false; > + class loop *iv_loop; > + tree vec_def; > + edge pe = loop_preheader_edge (loop); > + basic_block new_bb; > + tree vec_init, vec_step; > + tree new_name; > + gimple *new_stmt; > + gphi *induction_phi; > + tree induc_def, vec_dest; > + tree init_expr, step_expr; > + tree niters_skip; > + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > + unsigned i; > + gimple_stmt_iterator si; > + > + gphi *phi = dyn_cast (stmt_info->stmt); > + > + tree vectype = STMT_VINFO_VECTYPE (stmt_info); > + poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); > + enum vect_induction_op_type induction_type > + = STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info); > + > + gcc_assert (induction_type > vect_step_op_add); > + > + if (slp_node) > + ncopies = 1; > + else > + ncopies = vect_get_num_copies (loop_vinfo, vectype); > + gcc_assert (ncopies >= 1); > + > + /* FORNOW. Only handle nonlinear induction in the same loop. 
*/ > + if (nested_in_vect_loop_p (loop, stmt_info)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "nonlinear induction in nested loop.\n"); > + return false; > + } > + > + iv_loop = loop; > + gcc_assert (iv_loop == (gimple_bb (phi))->loop_father); > + > + /* TODO: Support slp for nonlinear iv. There should be a separate vector iv > + update for each iv and a permutation to generate the wanted vector iv. */ > + if (slp_node) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "SLP induction not supported for nonlinear" > + " induction.\n"); > + return false; > + } > + > + /* Init_expr will be updated by vect_update_ivs_after_vectorizer, > + if niters is unknown: > + For shift, when the shift amount >= precision, there would be UD. > + For mult, we don't know how to generate > + init_expr * pow (step, niters) for variable niters. > + For neg, it should be ok, since niters of the vectorized main loop > + will always be a multiple of 2. */ > + if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > + && induction_type != vect_step_op_neg) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "Peeling for epilogue is not supported" > + " for nonlinear induction except neg" > + " when iteration count is unknown.\n"); > + return false; > + } > + > + /* Also doesn't support peeling for neg when niter is variable. > + ??? generate something like niter_expr & 1 ? init_expr : -init_expr? */ > + niters_skip = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo); > + if (niters_skip != NULL_TREE > + && TREE_CODE (niters_skip) != INTEGER_CST) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "Peeling for alignment is not supported" > + " for nonlinear induction when niters_skip" > + " is not constant.\n"); > + return false; > + } > + > + if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > + && induction_type == vect_step_op_mul) > + if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype))) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "floating point nonlinear induction vectorization" > + " not supported.\n"); > + return false; > + } > + > + step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info); > + init_expr = vect_phi_initial_value (phi); > + gcc_assert (step_expr != NULL_TREE && init_expr != NULL > + && TREE_CODE (step_expr) == INTEGER_CST); > + /* step_expr should be aligned with init_expr, > + i.e. uint64 a >> 1, step is int, but vector shift is used. */ > + step_expr = fold_convert (TREE_TYPE (vectype), step_expr); > + > + if (TREE_CODE (init_expr) == INTEGER_CST) > + init_expr = fold_convert (TREE_TYPE (vectype), init_expr); > + else > + gcc_assert (tree_nop_conversion_p (TREE_TYPE (vectype), > + TREE_TYPE (init_expr))); > + > + switch (induction_type) > + { > + case vect_step_op_neg: > + if (TREE_CODE (init_expr) != INTEGER_CST > + && TREE_CODE (init_expr) != REAL_CST) > + { > + /* Check for backend support of NEGATE_EXPR and vec_perm. */ > + if (!directly_supported_p (NEGATE_EXPR, vectype)) > + return false; > + > + /* The encoding has 2 interleaved stepped patterns. 
*/ > + vec_perm_builder sel (nunits, 2, 3); > + machine_mode mode = TYPE_MODE (vectype); > + sel.quick_grow (6); > + for (i = 0; i < 3; i++) > + { > + sel[i * 2] = i; > + sel[i * 2 + 1] = i + nunits; > + } > + vec_perm_indices indices (sel, 2, nunits); > + if (!can_vec_perm_const_p (mode, mode, indices)) > + return false; > + } > + break; > + > + case vect_step_op_mul: > + { > + /* Check for backend support of MULT_EXPR. */ > + if (!directly_supported_p (MULT_EXPR, vectype)) > + return false; > + > + /* ??? How to construct a vector step for a variable-length vector, > + [ 1, step, pow (step, 2), pow (step, 3), .. ]. */ > + if (!vf.is_constant ()) > + return false; > + } > + break; > + > + case vect_step_op_shr: > + /* Check for backend support of RSHIFT_EXPR. */ > + if (!directly_supported_p (RSHIFT_EXPR, vectype, optab_vector)) > + return false; > + > + /* Don't shift more than type precision to avoid UD. */ > + if (!tree_fits_uhwi_p (step_expr) > + || maybe_ge (nunits * tree_to_uhwi (step_expr), > + TYPE_PRECISION (TREE_TYPE (init_expr)))) > + return false; > + break; > + > + case vect_step_op_shl: > + /* Check for backend support of LSHIFT_EXPR. */ > + if (!directly_supported_p (LSHIFT_EXPR, vectype, optab_vector)) > + return false; > + > + /* Don't shift more than type precision to avoid UD. */ > + if (!tree_fits_uhwi_p (step_expr) > + || maybe_ge (nunits * tree_to_uhwi (step_expr), > + TYPE_PRECISION (TREE_TYPE (init_expr)))) > + return false; > + > + break; > + > + default: > + gcc_unreachable (); > + } > + > + if (!vec_stmt) /* transformation not required. */ > + { > + unsigned inside_cost = 0, prologue_cost = 0; > + /* loop cost for vec_loop. */ > + inside_cost = record_stmt_cost (cost_vec, ncopies, vector_stmt, > + stmt_info, 0, vect_body); > + > + /* Neg induction doesn't have any inside_cost. */ > + if (induction_type == vect_step_op_neg) > + inside_cost = 0; > + > + /* prologue cost for vec_init and vec_step. */ > + prologue_cost = record_stmt_cost (cost_vec, 2, scalar_to_vec, > + stmt_info, 0, vect_prologue); > + > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "vect_model_induction_cost: inside_cost = %d, " > + "prologue_cost = %d. \n", inside_cost, > + prologue_cost); > + > + STMT_VINFO_TYPE (stmt_info) = induc_vec_info_type; > + DUMP_VECT_SCOPE ("vectorizable_nonlinear_induction"); > + return true; > + } > + > + /* Transform. */ > + > + /* Compute a vector variable, initialized with the first VF values of > + the induction variable. E.g., for an iv with IV_PHI='X' and > + evolution S, for a vector of 4 units, we want to compute: > + [X, X + S, X + 2*S, X + 3*S]. */ > + > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, "transform induction phi.\n"); > + > + pe = loop_preheader_edge (iv_loop); > + /* Find the first insertion point in the BB. */ > + basic_block bb = gimple_bb (phi); > + si = gsi_after_labels (bb); > + > + gimple_seq stmts = NULL; > + > + /* If we are using the loop mask to "peel" for alignment then we need > + to adjust the start value here. 
*/ > + if (niters_skip != NULL_TREE) > + init_expr = vect_peel_nonlinear_iv_init (&stmts, init_expr, niters_skip, > + step_expr, induction_type); > + > + vec_init = vect_create_nonlinear_iv_init (&stmts, init_expr, > + step_expr, nunits, vectype, > + induction_type); > + if (stmts) > + { > + new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts); > + gcc_assert (!new_bb); > + } > + > + stmts = NULL; > + new_name = vect_create_nonlinear_iv_step (&stmts, step_expr, > + vf, induction_type); > + if (stmts) > + { > + new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts); > + gcc_assert (!new_bb); > + } > + > + vec_step = vect_create_nonlinear_iv_vec_step (loop_vinfo, stmt_info, > + new_name, vectype, > + induction_type); > + /* Create the following def-use cycle: > + loop prolog: > + vec_init = ... > + vec_step = ... > + loop: > + vec_iv = PHI > + ... > + STMT > + ... > + vec_loop = vec_iv + vec_step; */ > + > + /* Create the induction-phi that defines the induction-operand. */ > + vec_dest = vect_get_new_vect_var (vectype, vect_simple_var, "vec_iv_"); > + induction_phi = create_phi_node (vec_dest, iv_loop->header); > + induc_def = PHI_RESULT (induction_phi); > + > + /* Create the iv update inside the loop. */ > + stmts = NULL; > + vec_def = vect_update_nonlinear_iv (&stmts, vectype, > + induc_def, vec_step, > + induction_type); > + > + gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT); > + new_stmt = SSA_NAME_DEF_STMT (vec_def); > + > + /* Set the arguments of the phi node: */ > + add_phi_arg (induction_phi, vec_init, pe, UNKNOWN_LOCATION); > + add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop), > + UNKNOWN_LOCATION); > + > + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (induction_phi); > + *vec_stmt = induction_phi; > + > + /* In case that vectorization factor (VF) is bigger than the number > + of elements that we can fit in a vectype (nunits), we have to generate > + more than one vector stmt - i.e - we need to "unroll" the > + vector stmt by a factor VF/nunits. For more details see documentation > + in vectorizable_operation. */ > + > + if (ncopies > 1) > + { > + stmts = NULL; > + /* FORNOW. This restriction should be relaxed. */ > + gcc_assert (!nested_in_vect_loop); > + > + new_name = vect_create_nonlinear_iv_step (&stmts, step_expr, > + nunits, induction_type); > + > + vec_step = vect_create_nonlinear_iv_vec_step (loop_vinfo, stmt_info, > + new_name, vectype, > + induction_type); > + vec_def = induc_def; > + for (i = 1; i < ncopies; i++) > + { > + /* vec_i = vec_prev + vec_step. */ > + stmts = NULL; > + vec_def = vect_update_nonlinear_iv (&stmts, vectype, > + vec_def, vec_step, > + induction_type); > + gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT); > + new_stmt = SSA_NAME_DEF_STMT (vec_def); > + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); > + } > + } > + > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, > + "transform induction: created def-use cycle: %G%G", > + induction_phi, SSA_NAME_DEF_STMT (vec_def)); > + > + return true; > +} > + > /* Function vectorizable_induction > > Check if STMT_INFO performs an induction computation that can be vectorized. 
> @@ -8259,6 +8920,8 @@ vectorizable_induction (loop_vec_info loop_vinfo, > unsigned i; > tree expr; > gimple_stmt_iterator si; > + enum vect_induction_op_type induction_type > + = STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (stmt_info); > > gphi *phi = dyn_cast (stmt_info->stmt); > if (!phi) > @@ -8271,6 +8934,11 @@ vectorizable_induction (loop_vec_info loop_vinfo, > if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def) > return false; > > + /* Handle nonlinear induction in a separate place. */ > + if (induction_type != vect_step_op_add) > + return vectorizable_nonlinear_induction (loop_vinfo, stmt_info, > + vec_stmt, slp_node, cost_vec); > + > tree vectype = STMT_VINFO_VECTYPE (stmt_info); > poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype); > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h > index e5fdc9e0a14..1fa08126ad8 100644 > --- a/gcc/tree-vectorizer.h > +++ b/gcc/tree-vectorizer.h > @@ -68,6 +68,15 @@ enum vect_def_type { > vect_unknown_def_type > }; > > +/* Define operation type of linear/non-linear induction variable. */ > +enum vect_induction_op_type { > + vect_step_op_add = 0, > + vect_step_op_neg, > + vect_step_op_mul, > + vect_step_op_shl, > + vect_step_op_shr > +}; > + > /* Define type of reduction. */ > enum vect_reduction_type { > TREE_CODE_REDUCTION, > @@ -1188,6 +1197,7 @@ public: > the version here. */ > tree loop_phi_evolution_base_unchanged; > tree loop_phi_evolution_part; > + enum vect_induction_op_type loop_phi_evolution_type; > > /* Used for various bookkeeping purposes, generally holding a pointer to > some other stmt S that is in some way "related" to this stmt. > @@ -1421,6 +1431,7 @@ struct gather_scatter_info { > ((S)->dr_aux.dr && DR_GROUP_FIRST_ELEMENT(S)) > #define STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED(S) (S)->loop_phi_evolution_base_unchanged > #define STMT_VINFO_LOOP_PHI_EVOLUTION_PART(S) (S)->loop_phi_evolution_part > +#define STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE(S) (S)->loop_phi_evolution_type > #define STMT_VINFO_MIN_NEG_DIST(S) (S)->min_neg_dist > #define STMT_VINFO_REDUC_TYPE(S) (S)->reduc_type > #define STMT_VINFO_REDUC_CODE(S) (S)->reduc_code > @@ -2327,6 +2338,10 @@ extern int vect_get_known_peeling_cost (loop_vec_info, int, int *, > stmt_vector_for_cost *); > extern tree cse_and_gimplify_to_preheader (loop_vec_info, tree); > > +/* Nonlinear induction. */ > +extern tree vect_peel_nonlinear_iv_init (gimple_seq*, tree, tree, > + tree, enum vect_induction_op_type); > + > /* In tree-vect-slp.cc. */ > extern void vect_slp_init (void); > extern void vect_slp_fini (void); > -- > 2.18.1 >