public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Hao Liu OS <hliu@os.amperecomputing.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: "GCC-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag
Date: Thu, 7 Dec 2023 08:54:23 +0000	[thread overview]
Message-ID: <SN6PR0102MB3487E160228E89466FD31EA7E18BA@SN6PR0102MB3487.prod.exchangelabs.com> (raw)
In-Reply-To: <CAFiYyc30QnqvrjGGfoA+fnSY=6egOMNpbq2KDyGhNgcDo5Uv_w@mail.gmail.com>

> Can you try to do some statistics on say SPEC CPU?  I'm usually
> building (with -j1) with -fopt-info-vec and diff build logs, you can then see
> how many more loops (and which) we vectorize additionally?

I tried this option with SPEC2017 intrate+fprate and count the "optimized: "
lines. Five more loops are vectorized (aarch64-linux-gnu: O3/Ofast for int/fp):

    | O3/Ofast        | before | after | additionally vectorized     |
    | --------------- | ------ | ----- | --------------------------- |
    | 502.gcc_r       | 1075   | 1076  | reload1.c:1934              |
    | 510.parest_r    | 9818   | 9819  | fe_dgp_monomial.cc:104      |
    | 523.xalancbmk_r | 4791   | 4824  | XMLReader.cpp:1650          |
    | 526.blender_r   | 4520   | 4522  | infback.c:441,inflate.c:983 |

All of them access arrays with unsigned offsets, which are previously thought
can be overflow. E.g., the case in 502.gcc:

    unsigned int i;
    unsigned int this_nregs = ...;

    for (j = 1; j < this_nregs; j++)
      {
        this_cost += spill_add_cost[regno + j];
        if ((TEST_HARD_REG_BIT (not_usable, regno + j))
          || TEST_HARD_REG_BIT (used_by_other_reload, regno + j))
          ok = 0;
      }

However, as they are not hot, the performance is not affected. I measured the build time, which is also not affected. With "-flto", more benchmarks (12 in total) will be affected (details are not analyzed):

    | O3/Ofast + -flto | before | after | diff |
    | ---------------- | ------ | ----- | ---- |
    | 502.gcc_r        | 979    | 980   | +1   |
    | 507.cactuBSSN_r  | 3454   | 3458  | +4   |
    | 508.namd_r       | 858    | 857   | -1   |
    | 510.parest_r     | 1575   | 1577  | +2   |
    | 511.povray_r     | 810    | 812   | +2   |
    | 521.wrf_r        | 8769   | 8763  | -6   |
    | 523.xalancbmk_r  | 3959   | 3979  | +20  |
    | 526.blender_r    | 4580   | 4575  | -5   |
    | 527.cam4_r       | 2371   | 2370  | -1   |
    | 538.imagick_r    | 462    | 461   | -1   |
    | 549.fotonik3d_r  | 436    | 437   | +1   |
    | 554.roms_r       | 852    | 851   | -1   |

I think using unsigned index to access array should be rare. Programmers
tend to use "for (int i; ...)" instead of unsigned values. But there may be
special requirements. This opportunity is found in a real application, which
has hot loops with such unsigned access pattern, and it can get huge
improvements.

Thanks,
Hao

________________________________________
From: Richard Biener <richard.guenther@gmail.com>
Sent: Wednesday, December 6, 2023 19:49
To: Hao Liu OS
Cc: GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag

On Wed, Dec 6, 2023 at 10:46 AM Hao Liu OS <hliu@os.amperecomputing.com> wrote:
>
> Hi,
>
> Update the patch to fix problems in the test case:
>  - add "-details" option to the dump command
>  - add dg-require and target filters to avoid potential failures on platforms that don't support vectorization.

Interesting simple trick - the downside is that this makes the
recursive dependence
of SCEV on niter analysis and niter analysis on SCEV even "worse".  Also you
set the flag on CHRECs that are not necessarily cached, so I'm not sure how
effective this will be ...

Can you try to do some statistics on say SPEC CPU?  I'm usually
building (with -j1) with -fopt-info-vec and diff build logs, you can then see
how many more loops (and which) we vectorize additionally?

Thanks,
Richard.

> Thanks,
> -Hao
>
> gcc/ChangeLog:
>
>         PR tree-optimization/112774
>         * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
>         printed with additional <nw> info.
>         * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
>         nonwrapping_chrec_p to set and check the new flag respectively.
>         * tree-scalar-evolution.h: Likewise.
>         * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
>         infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
>         scev_probably_wraps_p): call record_nonwrapping_chrec before
>         record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
>         return false from scev_probably_wraps_p.
>         * tree-vect-loop.cc (vect_analyze_loop): call
>         free_numbers_of_iterations_estimates explicitly.
>         * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
>         to represent the nonwrapping info.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/scev-16.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 18 ++++++++++++++++++
>  gcc/tree-pretty-print.cc                |  2 +-
>  gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
>  gcc/tree-scalar-evolution.h             |  2 ++
>  gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
>  gcc/tree-vect-loop.cc                   |  4 ++++
>  gcc/tree.h                              |  8 +++++---
>  7 files changed, 70 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> new file mode 100644
> index 00000000000..120f40c0b6c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
> +
> +int A[1024 * 2];
> +
> +int foo (unsigned offset, unsigned N)
> +{
> +  int sum = 0;
> +
> +  for (unsigned i = 0; i < N; i++)
> +    sum += A[i + offset];
> +
> +  return sum;
> +}
> +
> +/* Loop can be vectorized by referring "i + offset" is nonwrapping from array.  */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { ! { avr-*-* msp430-*-* pru-*-* } } } } } */
> diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> index 1fadd752d05..0dabb6d1580 100644
> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
>        dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
>        pp_string (pp, ", +, ");
>        dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> -      pp_string (pp, "}_");
> +      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
>        pp_scalar (pp, "%u", CHREC_VARIABLE (node));
>        is_stmt = false;
>        break;
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index f61277c32df..81630603c12 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
>    return res;
>  }
>
> +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> +
> +void record_nonwrapping_chrec (tree chrec)
> +{
> +  CHREC_NOWRAP(chrec) = 1;
> +
> +  if (dump_file && (dump_flags & TDF_SCEV))
> +    {
> +      fprintf (dump_file, "(record_nonwrapping_chrec: ");
> +      print_generic_expr (dump_file, chrec);
> +      fprintf (dump_file, ")\n");
> +    }
> +}
> +
> +/* Return true if CHREC's nonwrapping flag is set.  */
> +
> +bool nonwrapping_chrec_p (tree chrec)
> +{
> +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> +    return false;
> +
> +  return CHREC_NOWRAP(chrec);
> +}
> +
>  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
>
>  static tree
> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> index a64ed78fe63..f57fde12ee2 100644
> --- a/gcc/tree-scalar-evolution.h
> +++ b/gcc/tree-scalar-evolution.h
> @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
>                        bool);
>  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
>  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> +extern void record_nonwrapping_chrec (tree);
> +extern bool nonwrapping_chrec_p (tree);
>
>  /* Returns the basic block preceding LOOP, or the CFG entry block when
>     the loop is function's body.  */
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 2098bef9a97..d465e0ed7e1 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
>
>    /* If access is not executed on every iteration, we must ensure that overlow
>       may not make the access valid later.  */
> -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> -      && scev_probably_wraps_p (NULL_TREE,
> -                               initial_condition_in_loop_num (ev, loop->num),
> -                               step, data->stmt, loop, true))
> -    upper = false;
> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> +    {
> +      if (scev_probably_wraps_p (NULL_TREE,
> +                                initial_condition_in_loop_num (ev, loop->num),
> +                                step, data->stmt, loop, true))
> +       upper = false;
> +    }
> +  else
> +    record_nonwrapping_chrec (ev);
>
>    record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
> @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
>    if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
>      low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
>        high = wide_int_to_tree (type, r.upper_bound ());
>      }
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
>    if (loop_exits_before_overflow (base, step, at_stmt, loop))
>      return false;
>
> +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
> +     above loop exits before overflow).  */
> +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> +    return false;
> +
>    /* At this point we still don't have a proof that the iv does not
>       overflow: give up.  */
>    return true;
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3df020d2228..7f278e2d8f4 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>          analysis are done under the assumptions.  */
>        loop_constraint_set (loop, LOOP_C_FINITE);
>      }
> +  else
> +    /* Clear the existing niter information to make sure the nonwrapping flag
> +       will be calculated and set propriately.  */
> +    free_numbers_of_iterations_estimates (loop);
>
>    auto_vector_modes vector_modes;
>    /* Autodetect first vector size we try.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 086b55f0375..59af8920f02 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
>  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
>
>  /* Accessors for the chains of recurrences.  */
> -#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> -#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> -#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> +#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> +#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
> +#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
>
>  /* LABEL_EXPR accessor. This gives access to the label associated with
>     the given label expression.  */
> --
> 2.34.1
>
>
> ________________________________________
> From: Hao Liu OS <hliu@os.amperecomputing.com>
> Sent: Monday, December 4, 2023 18:05
> To: GCC-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/PR112774 - SCEV: extend the chrec tree with a nonwrapping flag
>
> Loop vecotorization can not optimize following case due to SCEV is not affine
> failure (i+offset may overflow):
>
>     int A[1024 * 2];
>
>     int foo (unsigned offset, unsigned N)
>     {
>       int sum = 0;
>       for (unsigned i = 0; i < N; i++)
>         sum += A[i + offset];
>       return sum;
>     }
>
> Actually, niter pass can find nonwrapping induction variables (i.e., i + offset
> can not overflow) by inferring from undefined behaviors like array access (see
> record_nonwrapping_iv). But this information is not shared to SCEV yet. This
> patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it
> to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop
> vectorization could succeed.
>
> The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from
> "{offset, +, 1}_1" -> "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two
> SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to
> set and check the flag respectively.
>
> However, an extra problem is caused by resetting SCEV cache (i.e., scev_reset or
> reset_scev_htab), which may not be synchronized with the calling of
> free_numbers_of_iterations_estimates, which set the loop->estimate_state to
> EST_NOT_COMPUTED and make sure the above inferring from array access is called.
> In other words, the nonwrapping flag could be cleared and lost by resetting SCEV
> cache if the loop->estimate_state is not reset.
> E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist,
> which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the
> estimate_state is still kept as EST_AVAILABLE and the flag will not be set in
> loop vectorization.
>
> This patch uses a simple fix by calling free_numbers_of_iterations_estimates in
> vect_analyze_loop, which will make sure the flag is always set propriately in
> loop vectorization. This fix is a bit ad-hoc (works for loop vectorization
> only), if there is more reasonable method, I will revert the simple fix and try
> that.
>
> This patch is bootstrapped and tested on aarch64-linux-gnu with no regressions.
> OK for trunk?
>
> ---
> The patch is as following:
>
> [PATCH] SCEV: extend the chrec tree with a nonwrapping flag
>  [PR112774]
>
> The flag is defined as CHREC_NOWRAP(tree), and will be dumped from
> "{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping).
> Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are
> added to set and check the flag respectively.
>
> As resetting the SCEV cache (i.e., the chrec trees) may not reset the
> loop->estimate_state, free_numbers_of_iterations_estimates is called
> explicitly in loop vectorization to make sure the flag can be
> calculated propriately by niter.
>
> gcc/ChangeLog:
>
>         PR tree-optimization/112774
>         * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be
>         printed with additional <nw> info.
>         * tree-scalar-evolution.cc: add record_nonwrapping_chrec and
>         nonwrapping_chrec_p to set and check the new flag respectively.
>         * tree-scalar-evolution.h: Likewise.
>         * tree-ssa-loop-niter.cc (idx_infer_loop_bounds,
>         infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness,
>         scev_probably_wraps_p): call record_nonwrapping_chrec before
>         record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and
>         return false from scev_probably_wraps_p.
>         * tree-vect-loop.cc (vect_analyze_loop): call
>         free_numbers_of_iterations_estimates explicitly.
>         * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used
>         to represent the nonwrapping info.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/scev-16.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +++++++++++++++++
>  gcc/tree-pretty-print.cc                |  2 +-
>  gcc/tree-scalar-evolution.cc            | 24 ++++++++++++++++++++++++
>  gcc/tree-scalar-evolution.h             |  2 ++
>  gcc/tree-ssa-loop-niter.cc              | 21 ++++++++++++++++-----
>  gcc/tree-vect-loop.cc                   |  4 ++++
>  gcc/tree.h                              |  8 +++++---
>  7 files changed, 69 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> new file mode 100644
> index 00000000000..96ea36e4c65
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-vect-scev" } */
> +
> +int A[1024 * 2];
> +
> +int foo (unsigned offset, unsigned N)
> +{
> +  int sum = 0;
> +
> +  for (unsigned i = 0; i < N; i++)
> +    sum += A[i + offset];
> +
> +  return sum;
> +}
> +
> +/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "missed:  failed: evolution of offset is not affine" "vect" } } */
> diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> index 1fadd752d05..0dabb6d1580 100644
> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
>        dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false);
>        pp_string (pp, ", +, ");
>        dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false);
> -      pp_string (pp, "}_");
> +      pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_");
>        pp_scalar (pp, "%u", CHREC_VARIABLE (node));
>        is_stmt = false;
>        break;
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index 065bcd0743d..a9112572e0c 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var)
>    return res;
>  }
>
> +/* If CHREC doesn't overflow, set the nonwrapping flag.  */
> +
> +void record_nonwrapping_chrec (tree chrec)
> +{
> +  CHREC_NOWRAP(chrec) = 1;
> +
> +  if (dump_file && (dump_flags & TDF_SCEV))
> +    {
> +      fprintf (dump_file, "(record_nonwrapping_chrec: ");
> +      print_generic_expr (dump_file, chrec);
> +      fprintf (dump_file, ")\n");
> +    }
> +}
> +
> +/* Return true if CHREC's nonwrapping flag is set.  */
> +
> +bool nonwrapping_chrec_p (tree chrec)
> +{
> +  if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC)
> +    return false;
> +
> +  return CHREC_NOWRAP(chrec);
> +}
> +
>  /* Analyzes and returns the scalar evolution of VAR address in LOOP.  */
>
>  static tree
> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> index a64ed78fe63..f57fde12ee2 100644
> --- a/gcc/tree-scalar-evolution.h
> +++ b/gcc/tree-scalar-evolution.h
> @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *,
>                        bool);
>  extern bool iv_can_overflow_p (class loop *, tree, tree, tree);
>  extern tree compute_overall_effect_of_inner_loop (class loop *, tree);
> +extern void record_nonwrapping_chrec (tree);
> +extern bool nonwrapping_chrec_p (tree);
>
>  /* Returns the basic block preceding LOOP, or the CFG entry block when
>     the loop is function's body.  */
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 2098bef9a97..d465e0ed7e1 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta)
>
>    /* If access is not executed on every iteration, we must ensure that overlow
>       may not make the access valid later.  */
> -  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> -      && scev_probably_wraps_p (NULL_TREE,
> -                               initial_condition_in_loop_num (ev, loop->num),
> -                               step, data->stmt, loop, true))
> -    upper = false;
> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)))
> +    {
> +      if (scev_probably_wraps_p (NULL_TREE,
> +                                initial_condition_in_loop_num (ev, loop->num),
> +                                step, data->stmt, loop, true))
> +       upper = false;
> +    }
> +  else
> +    record_nonwrapping_chrec (ev);
>
>    record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
>    return true;
> @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt)
>    if (flag_delete_null_pointer_checks && int_cst_value (low) == 0)
>      low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type)));
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt)
>        high = wide_int_to_tree (type, r.upper_bound ());
>      }
>
> +  record_nonwrapping_chrec (scev);
>    record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true);
>  }
>
> @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step,
>    if (loop_exits_before_overflow (base, step, at_stmt, loop))
>      return false;
>
> +  /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the
> +     above loop exits before overflow).  */
> +  if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var)))
> +    return false;
> +
>    /* At this point we still don't have a proof that the iv does not
>       overflow: give up.  */
>    return true;
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index dd584ab4a42..6261cd1be1d 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>          analysis are done under the assumptions.  */
>        loop_constraint_set (loop, LOOP_C_FINITE);
>      }
> +  else
> +    /* Clear the existing niter information to make sure the nonwrapping flag
> +       will be calculated and set propriately.  */
> +    free_numbers_of_iterations_estimates (loop);
>
>    auto_vector_modes vector_modes;
>    /* Autodetect first vector size we try.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 086b55f0375..59af8920f02 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers
>  #define COND_EXPR_ELSE(NODE)   (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2))
>
>  /* Accessors for the chains of recurrences.  */
> -#define CHREC_LEFT(NODE)          TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> -#define CHREC_RIGHT(NODE)         TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> -#define CHREC_VARIABLE(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +#define CHREC_LEFT(NODE)        TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0)
> +#define CHREC_RIGHT(NODE)       TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1)
> +#define CHREC_VARIABLE(NODE)    POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var
> +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping).  */
> +#define CHREC_NOWRAP(NODE)      POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag
>
>  /* LABEL_EXPR accessor. This gives access to the label associated with
>     the given label expression.  */
> --
> 2.34.1
>

  reply	other threads:[~2023-12-07  8:54 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-04 10:05 Hao Liu OS
2023-12-06  9:46 ` Hao Liu OS
2023-12-06 11:49   ` Richard Biener
2023-12-07  8:54     ` Hao Liu OS [this message]
2023-12-07 14:12       ` Richard Biener
2023-12-08  3:34         ` Hao Liu OS

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SN6PR0102MB3487E160228E89466FD31EA7E18BA@SN6PR0102MB3487.prod.exchangelabs.com \
    --to=hliu@os.amperecomputing.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).