From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x134.google.com (mail-lf1-x134.google.com [IPv6:2a00:1450:4864:20::134]) by sourceware.org (Postfix) with ESMTPS id BFCC0385BAEE for ; Thu, 7 Dec 2023 14:13:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org BFCC0385BAEE Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org BFCC0385BAEE Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::134 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701958405; cv=none; b=TVmWKUrZVMbdXOymeIU311ZJ+BhB9W3uDjGdAqZN/job/VHwNiwbheld2cLQXSAFEhfcQHRlncDcgEhVbS16rR0z8Vra12qitJFV+8Nnffg+nwJL7igblUKoSwAWW6nQ8BepTG9qRofvYQkIqKEjxR5Dq+FUDGiSHCG5c4jhMvI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701958405; c=relaxed/simple; bh=btkaU5Jm399+eZASSfi8YVRtR5/wF1YACgvy5q1ANMA=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=XT6NF/VNV87qyJ1G4F4FQy5lkXx8H5d+oVXcR4SgsSvLm97JKYqKf74VvmClUMxqzu34Na8vjDVSEtXUhEKb7ZeBoceeGWdfiVEAOxOMqS8h1D9sV7yKjO3lW65ZUxzl2BCnr4lnhqG07NPeTOqkiupD4G3xMepvpQBRX9TagQ0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-lf1-x134.google.com with SMTP id 2adb3069b0e04-50bef9b7a67so837762e87.1 for ; Thu, 07 Dec 2023 06:13:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701958392; x=1702563192; darn=gcc.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=okFxVC2asmpGA7OStakGlUKqR5fiQzY6NUb/NELLKBU=; b=BVwh/cZiSKERsy6UAu9+df92FXmW9pZXTyPLvM1bTqQVNIVdN4vcpqiZHf1+oeM00u no9MjOCJIwStfC4Va+a5M2U6feExpNx75bLeIKZBN1ahloFMSefMMwya3oCMzmw26E7y nIg7p5PTV/b7R/QksRzSuks9NjnQSQWrSg8Irwck8ReLqmN1QyUXeFGabWPXTjn13J9e mJ2JCKzYHq7Vo9cZfXvR9fsGsp1URdib2A87ecg8FIbvienaS7tWmJD8SVlLeQqIpDWv bclbvupP2OJZkkoIyVi1l98vFgynUcQRVfLQqXHInm6kX7HWLrnDYF+GWgZR0/2giP17 D5iA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701958392; x=1702563192; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=okFxVC2asmpGA7OStakGlUKqR5fiQzY6NUb/NELLKBU=; b=XwENP5grlf/yx4yyTZUZsjvsKXrft6hSbrEF9DrnNsopheWqdMkoLVzc3RPGfDiKai NRGCCf1QMDGTL5NRcXmV7w301gzmj+UDaM7tYCZGNg57P4bBpqOCb5LdgrCgDs3C3Fan bpA8r0PbpTqTVQEfaRQQ05FAVZxvXRd1FcVt7c1TI42F+3J3W9KlFZOtQP8WoPh3jsFH agS5v351sqCqUI1RyZC6Sc1e178ZgQ6HN9TQiodWAHQ4zs9gtjtYpqufNhG8uKRDU0Vc dD4CcYpvQMMe7zjp8j5mLm7JsdP3dmQU6DKxRt0LHitUKAVUnapzHytgCWsvidJ+Sdfg /SpA== X-Gm-Message-State: AOJu0YzaUDz7ZBfVfqzuzSFApalvE8HLSzWd4xN0JFKqDA4zjHUGw/Uc P0NMk6zfPUSEeRDLsYiGf7T2+eCrDFss7Ulbh2r7jpJogs8= X-Google-Smtp-Source: AGHT+IF7mRRjD7MhfepQOB0/vVLBDRM9VQx7NesdX4FDHb1E8H3JXQLlvzp4Wyd0YTFIH4Ti1JpeVJKnzXO+Bnvp9+Y= X-Received: by 2002:a05:6512:786:b0:50b:c147:4ac with SMTP id x6-20020a056512078600b0050bc14704acmr1500773lfr.66.1701958391759; Thu, 07 Dec 2023 06:13:11 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Thu, 7 Dec 2023 15:12:03 +0100 Message-ID: Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag To: Hao Liu OS Cc: "GCC-patches@gcc.gnu.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_ASCII_DIVIDERS,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Dec 7, 2023 at 9:59=E2=80=AFAM Hao Liu OS wrote: > > > Can you try to do some statistics on say SPEC CPU? I'm usually > > building (with -j1) with -fopt-info-vec and diff build logs, you can th= en see > > how many more loops (and which) we vectorize additionally? > > I tried this option with SPEC2017 intrate+fprate and count the "optimized= : " > lines. Five more loops are vectorized (aarch64-linux-gnu: O3/Ofast for in= t/fp): > > | O3/Ofast | before | after | additionally vectorized | > | --------------- | ------ | ----- | --------------------------- | > | 502.gcc_r | 1075 | 1076 | reload1.c:1934 | > | 510.parest_r | 9818 | 9819 | fe_dgp_monomial.cc:104 | > | 523.xalancbmk_r | 4791 | 4824 | XMLReader.cpp:1650 | > | 526.blender_r | 4520 | 4522 | infback.c:441,inflate.c:983 | > > All of them access arrays with unsigned offsets, which are previously tho= ught > can be overflow. E.g., the case in 502.gcc: > > unsigned int i; > unsigned int this_nregs =3D ...; > > for (j =3D 1; j < this_nregs; j++) > { > this_cost +=3D spill_add_cost[regno + j]; > if ((TEST_HARD_REG_BIT (not_usable, regno + j)) > || TEST_HARD_REG_BIT (used_by_other_reload, regno + j)) > ok =3D 0; > } > > However, as they are not hot, the performance is not affected. I measured= the build time, which is also not affected. With "-flto", more benchmarks = (12 in total) will be affected (details are not analyzed): > > | O3/Ofast + -flto | before | after | diff | > | ---------------- | ------ | ----- | ---- | > | 502.gcc_r | 979 | 980 | +1 | > | 507.cactuBSSN_r | 3454 | 3458 | +4 | > | 508.namd_r | 858 | 857 | -1 | > | 510.parest_r | 1575 | 1577 | +2 | > | 511.povray_r | 810 | 812 | +2 | > | 521.wrf_r | 8769 | 8763 | -6 | > | 523.xalancbmk_r | 3959 | 3979 | +20 | > | 526.blender_r | 4580 | 4575 | -5 | > | 527.cam4_r | 2371 | 2370 | -1 | > | 538.imagick_r | 462 | 461 | -1 | > | 549.fotonik3d_r | 436 | 437 | +1 | > | 554.roms_r | 852 | 851 | -1 | > > I think using unsigned index to access array should be rare. Programmers > tend to use "for (int i; ...)" instead of unsigned values. But there may = be > special requirements. This opportunity is found in a real application, wh= ich > has hot loops with such unsigned access pattern, and it can get huge > improvements. Yes, I can see that. I think the patch is OK with a minor nit - can you please document the nothrow flag usage in TREE_CHREC in tree-core.h? There's a big comment doing flags documentation: /* The following table lists the uses of each of the above flags and for which types of nodes they are defined. ... OK with that change. Let's see how it goes ... Thanks, Richard. > Thanks, > Hao > > ________________________________________ > From: Richard Biener > Sent: Wednesday, December 6, 2023 19:49 > To: Hao Liu OS > Cc: GCC-patches@gcc.gnu.org > Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec = tree with a nonwrapping flag > > On Wed, Dec 6, 2023 at 10:46=E2=80=AFAM Hao Liu OS wrote: > > > > Hi, > > > > Update the patch to fix problems in the test case: > > - add "-details" option to the dump command > > - add dg-require and target filters to avoid potential failures on pla= tforms that don't support vectorization. > > Interesting simple trick - the downside is that this makes the > recursive dependence > of SCEV on niter analysis and niter analysis on SCEV even "worse". Also = you > set the flag on CHRECs that are not necessarily cached, so I'm not sure h= ow > effective this will be ... > > Can you try to do some statistics on say SPEC CPU? I'm usually > building (with -j1) with -fopt-info-vec and diff build logs, you can then= see > how many more loops (and which) we vectorize additionally? > > Thanks, > Richard. > > > Thanks, > > -Hao > > > > gcc/ChangeLog: > > > > PR tree-optimization/112774 > > * tree-pretty-print.cc: if nonwrapping flag is set, chrec will = be > > printed with additional info. > > * tree-scalar-evolution.cc: add record_nonwrapping_chrec and > > nonwrapping_chrec_p to set and check the new flag respectively. > > * tree-scalar-evolution.h: Likewise. > > * tree-ssa-loop-niter.cc (idx_infer_loop_bounds, > > infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_si= gnedness, > > scev_probably_wraps_p): call record_nonwrapping_chrec before > > record_nonwrapping_iv, call nonwrapping_chrec_p to check the fl= ag is set and > > return false from scev_probably_wraps_p. > > * tree-vect-loop.cc (vect_analyze_loop): call > > free_numbers_of_iterations_estimates explicitly. > > * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used > > to represent the nonwrapping info. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/tree-ssa/scev-16.c: New test. > > --- > > gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 18 ++++++++++++++++++ > > gcc/tree-pretty-print.cc | 2 +- > > gcc/tree-scalar-evolution.cc | 24 ++++++++++++++++++++++++ > > gcc/tree-scalar-evolution.h | 2 ++ > > gcc/tree-ssa-loop-niter.cc | 21 ++++++++++++++++----- > > gcc/tree-vect-loop.cc | 4 ++++ > > gcc/tree.h | 8 +++++--- > > 7 files changed, 70 insertions(+), 9 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gc= c.dg/tree-ssa/scev-16.c > > new file mode 100644 > > index 00000000000..120f40c0b6c > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c > > @@ -0,0 +1,18 @@ > > +/* { dg-do compile } */ > > +/* { dg-require-effective-target vect_int } */ > > +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ > > + > > +int A[1024 * 2]; > > + > > +int foo (unsigned offset, unsigned N) > > +{ > > + int sum =3D 0; > > + > > + for (unsigned i =3D 0; i < N; i++) > > + sum +=3D A[i + offset]; > > + > > + return sum; > > +} > > + > > +/* Loop can be vectorized by referring "i + offset" is nonwrapping fro= m array. */ > > +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { = ! { avr-*-* msp430-*-* pru-*-* } } } } } */ > > diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc > > index 1fadd752d05..0dabb6d1580 100644 > > --- a/gcc/tree-pretty-print.cc > > +++ b/gcc/tree-pretty-print.cc > > @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node,= int spc, dump_flags_t flags, > > dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false); > > pp_string (pp, ", +, "); > > dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false); > > - pp_string (pp, "}_"); > > + pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}_"); > > pp_scalar (pp, "%u", CHREC_VARIABLE (node)); > > is_stmt =3D false; > > break; > > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.c= c > > index f61277c32df..81630603c12 100644 > > --- a/gcc/tree-scalar-evolution.cc > > +++ b/gcc/tree-scalar-evolution.cc > > @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree= var) > > return res; > > } > > > > +/* If CHREC doesn't overflow, set the nonwrapping flag. */ > > + > > +void record_nonwrapping_chrec (tree chrec) > > +{ > > + CHREC_NOWRAP(chrec) =3D 1; > > + > > + if (dump_file && (dump_flags & TDF_SCEV)) > > + { > > + fprintf (dump_file, "(record_nonwrapping_chrec: "); > > + print_generic_expr (dump_file, chrec); > > + fprintf (dump_file, ")\n"); > > + } > > +} > > + > > +/* Return true if CHREC's nonwrapping flag is set. */ > > + > > +bool nonwrapping_chrec_p (tree chrec) > > +{ > > + if (!chrec || TREE_CODE(chrec) !=3D POLYNOMIAL_CHREC) > > + return false; > > + > > + return CHREC_NOWRAP(chrec); > > +} > > + > > /* Analyzes and returns the scalar evolution of VAR address in LOOP. = */ > > > > static tree > > diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h > > index a64ed78fe63..f57fde12ee2 100644 > > --- a/gcc/tree-scalar-evolution.h > > +++ b/gcc/tree-scalar-evolution.h > > @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tr= ee, struct affine_iv *, > > bool); > > extern bool iv_can_overflow_p (class loop *, tree, tree, tree); > > extern tree compute_overall_effect_of_inner_loop (class loop *, tree); > > +extern void record_nonwrapping_chrec (tree); > > +extern bool nonwrapping_chrec_p (tree); > > > > /* Returns the basic block preceding LOOP, or the CFG entry block when > > the loop is function's body. */ > > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc > > index 2098bef9a97..d465e0ed7e1 100644 > > --- a/gcc/tree-ssa-loop-niter.cc > > +++ b/gcc/tree-ssa-loop-niter.cc > > @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, vo= id *dta) > > > > /* If access is not executed on every iteration, we must ensure that= overlow > > may not make the access valid later. */ > > - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->s= tmt)) > > - && scev_probably_wraps_p (NULL_TREE, > > - initial_condition_in_loop_num (ev, loop= ->num), > > - step, data->stmt, loop, true)) > > - upper =3D false; > > + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->s= tmt))) > > + { > > + if (scev_probably_wraps_p (NULL_TREE, > > + initial_condition_in_loop_num (ev, loo= p->num), > > + step, data->stmt, loop, true)) > > + upper =3D false; > > + } > > + else > > + record_nonwrapping_chrec (ev); > > > > record_nonwrapping_iv (loop, init, step, data->stmt, low, high, fals= e, upper); > > return true; > > @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop = *loop, gimple *stmt) > > if (flag_delete_null_pointer_checks && int_cst_value (low) =3D=3D 0) > > low =3D build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYP= E (type))); > > > > + record_nonwrapping_chrec (scev); > > record_nonwrapping_iv (loop, base, step, stmt, low, high, false, tru= e); > > } > > > > @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *lo= op, gimple *stmt) > > high =3D wide_int_to_tree (type, r.upper_bound ()); > > } > > > > + record_nonwrapping_chrec (scev); > > record_nonwrapping_iv (loop, base, step, stmt, low, high, false, tru= e); > > } > > > > @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree= step, > > if (loop_exits_before_overflow (base, step, at_stmt, loop)) > > return false; > > > > + /* Check the nonwrapping flag, which may be set by niter analysis (e= .g., the > > + above loop exits before overflow). */ > > + if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)= )) > > + return false; > > + > > /* At this point we still don't have a proof that the iv does not > > overflow: give up. */ > > return true; > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > > index 3df020d2228..7f278e2d8f4 100644 > > --- a/gcc/tree-vect-loop.cc > > +++ b/gcc/tree-vect-loop.cc > > @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_sh= ared *shared) > > analysis are done under the assumptions. */ > > loop_constraint_set (loop, LOOP_C_FINITE); > > } > > + else > > + /* Clear the existing niter information to make sure the nonwrappi= ng flag > > + will be calculated and set propriately. */ > > + free_numbers_of_iterations_estimates (loop); > > > > auto_vector_modes vector_modes; > > /* Autodetect first vector size we try. */ > > diff --git a/gcc/tree.h b/gcc/tree.h > > index 086b55f0375..59af8920f02 100644 > > --- a/gcc/tree.h > > +++ b/gcc/tree.h > > @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers > > #define COND_EXPR_ELSE(NODE) (TREE_OPERAND (COND_EXPR_CHECK (NODE), = 2)) > > > > /* Accessors for the chains of recurrences. */ > > -#define CHREC_LEFT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK= (NODE), 0) > > -#define CHREC_RIGHT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK= (NODE), 1) > > -#define CHREC_VARIABLE(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.= u.chrec_var > > +#define CHREC_LEFT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (= NODE), 0) > > +#define CHREC_RIGHT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (= NODE), 1) > > +#define CHREC_VARIABLE(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.u.= chrec_var > > +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping). */ > > +#define CHREC_NOWRAP(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.no= throw_flag > > > > /* LABEL_EXPR accessor. This gives access to the label associated with > > the given label expression. */ > > -- > > 2.34.1 > > > > > > ________________________________________ > > From: Hao Liu OS > > Sent: Monday, December 4, 2023 18:05 > > To: GCC-patches@gcc.gnu.org > > Subject: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tr= ee with a nonwrapping flag > > > > Loop vecotorization can not optimize following case due to SCEV is not = affine > > failure (i+offset may overflow): > > > > int A[1024 * 2]; > > > > int foo (unsigned offset, unsigned N) > > { > > int sum =3D 0; > > for (unsigned i =3D 0; i < N; i++) > > sum +=3D A[i + offset]; > > return sum; > > } > > > > Actually, niter pass can find nonwrapping induction variables (i.e., i = + offset > > can not overflow) by inferring from undefined behaviors like array acce= ss (see > > record_nonwrapping_iv). But this information is not shared to SCEV yet.= This > > patch adds a nonwrapping flag to the chrec tree, which allows SCEV to r= e-use it > > to pass the nonwrapping checks like scev_probably_wraps_p, and finaly l= oop > > vectorization could succeed. > > > > The new flag is defined as CHREC_NOWRAP(tree), and the dump info is cha= nged from > > "{offset, +, 1}_1" -> "{offset, +, 1}_1" (nw is short for nonwrappi= ng). Two > > SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are ad= ded to > > set and check the flag respectively. > > > > However, an extra problem is caused by resetting SCEV cache (i.e., scev= _reset or > > reset_scev_htab), which may not be synchronized with the calling of > > free_numbers_of_iterations_estimates, which set the loop->estimate_stat= e to > > EST_NOT_COMPUTED and make sure the above inferring from array access is= called. > > In other words, the nonwrapping flag could be cleared and lost by reset= ting SCEV > > cache if the loop->estimate_state is not reset. > > E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist, > > which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the > > estimate_state is still kept as EST_AVAILABLE and the flag will not be = set in > > loop vectorization. > > > > This patch uses a simple fix by calling free_numbers_of_iterations_esti= mates in > > vect_analyze_loop, which will make sure the flag is always set propriat= ely in > > loop vectorization. This fix is a bit ad-hoc (works for loop vectorizat= ion > > only), if there is more reasonable method, I will revert the simple fix= and try > > that. > > > > This patch is bootstrapped and tested on aarch64-linux-gnu with no regr= essions. > > OK for trunk? > > > > --- > > The patch is as following: > > > > [PATCH] SCEV: extend the chrec tree with a nonwrapping flag > > [PR112774] > > > > The flag is defined as CHREC_NOWRAP(tree), and will be dumped from > > "{offset, +, 1}_1" to "{offset, +, 1}_1" (nw is short for nonwrappi= ng). > > Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p ar= e > > added to set and check the flag respectively. > > > > As resetting the SCEV cache (i.e., the chrec trees) may not reset the > > loop->estimate_state, free_numbers_of_iterations_estimates is called > > explicitly in loop vectorization to make sure the flag can be > > calculated propriately by niter. > > > > gcc/ChangeLog: > > > > PR tree-optimization/112774 > > * tree-pretty-print.cc: if nonwrapping flag is set, chrec will = be > > printed with additional info. > > * tree-scalar-evolution.cc: add record_nonwrapping_chrec and > > nonwrapping_chrec_p to set and check the new flag respectively. > > * tree-scalar-evolution.h: Likewise. > > * tree-ssa-loop-niter.cc (idx_infer_loop_bounds, > > infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_si= gnedness, > > scev_probably_wraps_p): call record_nonwrapping_chrec before > > record_nonwrapping_iv, call nonwrapping_chrec_p to check the fl= ag is set and > > return false from scev_probably_wraps_p. > > * tree-vect-loop.cc (vect_analyze_loop): call > > free_numbers_of_iterations_estimates explicitly. > > * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used > > to represent the nonwrapping info. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/tree-ssa/scev-16.c: New test. > > --- > > gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +++++++++++++++++ > > gcc/tree-pretty-print.cc | 2 +- > > gcc/tree-scalar-evolution.cc | 24 ++++++++++++++++++++++++ > > gcc/tree-scalar-evolution.h | 2 ++ > > gcc/tree-ssa-loop-niter.cc | 21 ++++++++++++++++----- > > gcc/tree-vect-loop.cc | 4 ++++ > > gcc/tree.h | 8 +++++--- > > 7 files changed, 69 insertions(+), 9 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gc= c.dg/tree-ssa/scev-16.c > > new file mode 100644 > > index 00000000000..96ea36e4c65 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c > > @@ -0,0 +1,17 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O3 -fdump-tree-vect-scev" } */ > > + > > +int A[1024 * 2]; > > + > > +int foo (unsigned offset, unsigned N) > > +{ > > + int sum =3D 0; > > + > > + for (unsigned i =3D 0; i < N; i++) > > + sum +=3D A[i + offset]; > > + > > + return sum; > > +} > > + > > +/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */ > > +/* { dg-final { scan-tree-dump-not "missed: failed: evolution of offs= et is not affine" "vect" } } */ > > diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc > > index 1fadd752d05..0dabb6d1580 100644 > > --- a/gcc/tree-pretty-print.cc > > +++ b/gcc/tree-pretty-print.cc > > @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node,= int spc, dump_flags_t flags, > > dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false); > > pp_string (pp, ", +, "); > > dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false); > > - pp_string (pp, "}_"); > > + pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}_"); > > pp_scalar (pp, "%u", CHREC_VARIABLE (node)); > > is_stmt =3D false; > > break; > > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.c= c > > index 065bcd0743d..a9112572e0c 100644 > > --- a/gcc/tree-scalar-evolution.cc > > +++ b/gcc/tree-scalar-evolution.cc > > @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree= var) > > return res; > > } > > > > +/* If CHREC doesn't overflow, set the nonwrapping flag. */ > > + > > +void record_nonwrapping_chrec (tree chrec) > > +{ > > + CHREC_NOWRAP(chrec) =3D 1; > > + > > + if (dump_file && (dump_flags & TDF_SCEV)) > > + { > > + fprintf (dump_file, "(record_nonwrapping_chrec: "); > > + print_generic_expr (dump_file, chrec); > > + fprintf (dump_file, ")\n"); > > + } > > +} > > + > > +/* Return true if CHREC's nonwrapping flag is set. */ > > + > > +bool nonwrapping_chrec_p (tree chrec) > > +{ > > + if (!chrec || TREE_CODE(chrec) !=3D POLYNOMIAL_CHREC) > > + return false; > > + > > + return CHREC_NOWRAP(chrec); > > +} > > + > > /* Analyzes and returns the scalar evolution of VAR address in LOOP. = */ > > > > static tree > > diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h > > index a64ed78fe63..f57fde12ee2 100644 > > --- a/gcc/tree-scalar-evolution.h > > +++ b/gcc/tree-scalar-evolution.h > > @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tr= ee, struct affine_iv *, > > bool); > > extern bool iv_can_overflow_p (class loop *, tree, tree, tree); > > extern tree compute_overall_effect_of_inner_loop (class loop *, tree); > > +extern void record_nonwrapping_chrec (tree); > > +extern bool nonwrapping_chrec_p (tree); > > > > /* Returns the basic block preceding LOOP, or the CFG entry block when > > the loop is function's body. */ > > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc > > index 2098bef9a97..d465e0ed7e1 100644 > > --- a/gcc/tree-ssa-loop-niter.cc > > +++ b/gcc/tree-ssa-loop-niter.cc > > @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, vo= id *dta) > > > > /* If access is not executed on every iteration, we must ensure that= overlow > > may not make the access valid later. */ > > - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->s= tmt)) > > - && scev_probably_wraps_p (NULL_TREE, > > - initial_condition_in_loop_num (ev, loop= ->num), > > - step, data->stmt, loop, true)) > > - upper =3D false; > > + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->s= tmt))) > > + { > > + if (scev_probably_wraps_p (NULL_TREE, > > + initial_condition_in_loop_num (ev, loo= p->num), > > + step, data->stmt, loop, true)) > > + upper =3D false; > > + } > > + else > > + record_nonwrapping_chrec (ev); > > > > record_nonwrapping_iv (loop, init, step, data->stmt, low, high, fals= e, upper); > > return true; > > @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop = *loop, gimple *stmt) > > if (flag_delete_null_pointer_checks && int_cst_value (low) =3D=3D 0) > > low =3D build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYP= E (type))); > > > > + record_nonwrapping_chrec (scev); > > record_nonwrapping_iv (loop, base, step, stmt, low, high, false, tru= e); > > } > > > > @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *lo= op, gimple *stmt) > > high =3D wide_int_to_tree (type, r.upper_bound ()); > > } > > > > + record_nonwrapping_chrec (scev); > > record_nonwrapping_iv (loop, base, step, stmt, low, high, false, tru= e); > > } > > > > @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree= step, > > if (loop_exits_before_overflow (base, step, at_stmt, loop)) > > return false; > > > > + /* Check the nonwrapping flag, which may be set by niter analysis (e= .g., the > > + above loop exits before overflow). */ > > + if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)= )) > > + return false; > > + > > /* At this point we still don't have a proof that the iv does not > > overflow: give up. */ > > return true; > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > > index dd584ab4a42..6261cd1be1d 100644 > > --- a/gcc/tree-vect-loop.cc > > +++ b/gcc/tree-vect-loop.cc > > @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_sh= ared *shared) > > analysis are done under the assumptions. */ > > loop_constraint_set (loop, LOOP_C_FINITE); > > } > > + else > > + /* Clear the existing niter information to make sure the nonwrappi= ng flag > > + will be calculated and set propriately. */ > > + free_numbers_of_iterations_estimates (loop); > > > > auto_vector_modes vector_modes; > > /* Autodetect first vector size we try. */ > > diff --git a/gcc/tree.h b/gcc/tree.h > > index 086b55f0375..59af8920f02 100644 > > --- a/gcc/tree.h > > +++ b/gcc/tree.h > > @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers > > #define COND_EXPR_ELSE(NODE) (TREE_OPERAND (COND_EXPR_CHECK (NODE), = 2)) > > > > /* Accessors for the chains of recurrences. */ > > -#define CHREC_LEFT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK= (NODE), 0) > > -#define CHREC_RIGHT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK= (NODE), 1) > > -#define CHREC_VARIABLE(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.= u.chrec_var > > +#define CHREC_LEFT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (= NODE), 0) > > +#define CHREC_RIGHT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (= NODE), 1) > > +#define CHREC_VARIABLE(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.u.= chrec_var > > +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping). */ > > +#define CHREC_NOWRAP(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.no= throw_flag > > > > /* LABEL_EXPR accessor. This gives access to the label associated with > > the given label expression. */ > > -- > > 2.34.1 > >