From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-475353-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 14761 invoked by alias); 29 Jan 2015 06:48:38 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 14635 invoked by uid 48); 29 Jan 2015 06:48:30 -0000
From: "amker at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/62173] [5.0 regression] 64bit Arch can't ivopt while 32bit Arch can
Date: Thu, 29 Jan 2015 06:48:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: amker at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: jiwang at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 5.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-62173-4-RDtFxEuaVG@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-62173-4@http.gcc.gnu.org/bugzilla/>
References: <bug-62173-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2015-01/txt/msg03347.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173
--- Comment #30 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #17)
> I really wonder why IVOPTs calls convert_affine_scev with
> !use_overflow_semantics.
> 
> Note that for the original testcase 'i' may be negative or zero and thus 'd'
> may be zero.  We do a bad analysis here because IVOPTs follows complete
> peeling immediately...  but at least we have range information that looks
> useful:
> 
>   <bb 16>:
>   # RANGE [0, 10] NONZERO 15
>   # d_26 = PHI <i_6(D)(15), d_13(17)>
>   # RANGE [0, 9] NONZERO 15
>   d_13 = d_26 + -1;
>   _14 = A[d_26];
>   # RANGE [0, 255] NONZERO 255
>   _15 = (int) _14;
>   # USE = nonlocal
>   # CLB = nonlocal
>   foo (_15);
>   if (d_13 != 0)
>     goto <bb 17>;
>   else
>     goto <bb 3>;
> 
>   <bb 17>:
>   goto <bb 16>;
> 
> but unfortunately we expand the initial value of the IV for d all the way
> to i_6(D) so we don't see that i_6(D) is constrained by the range for d_26.
> 
> So when we are in idx_find_step before we replace *idx with iv->base
> we could check range-information on whether it wrapped.  Hmm, I think
> we can't really compute this.  But we can transfer range information
> (temporarily) from d_26 to iv->base i_6(D) and make use of that in
> scev_probably_wraps_p.  There we currently compute whether
> (unsigned) i_6(D) + 2147483648 (??) > 9 using fold_binary but with
> range information [0, 10] it would compute as false (huh, so what is it
> actually testing?!).  I think the computation of 'delta' should instead
> be adjusted to use range information - max for negative step and min
> for positive step.  Like the following:
> 
> Index: gcc/tree-ssa-loop-niter.c
> ===================================================================
> --- gcc/tree-ssa-loop-niter.c   (revision 220038)
> +++ gcc/tree-ssa-loop-niter.c   (working copy)
> @@ -3863,12 +3863,17 @@ scev_probably_wraps_p (tree base, tree s
>       bound of the type, and verify that the loop is exited before this
>       occurs.  */
>    unsigned_type = unsigned_type_for (type);
> -  base = fold_convert (unsigned_type, base);
> -
>    if (tree_int_cst_sign_bit (step))
>      {
>        tree extreme = fold_convert (unsigned_type,
>                                    lower_bound_in_type (type, type));
> +      wide_int min, max;
> +      if (TREE_CODE (base) == SSA_NAME
> +         && INTEGRAL_TYPE_P (TREE_TYPE (base))
> +         && get_range_info (base, &min, &max) == VR_RANGE)
> +       base = wide_int_to_tree (unsigned_type, max);
> +      else
> +       base = fold_convert (unsigned_type, base);
>        delta = fold_build2 (MINUS_EXPR, unsigned_type, base, extreme);
>        step_abs = fold_build1 (NEGATE_EXPR, unsigned_type,
>                               fold_convert (unsigned_type, step));
> @@ -3877,6 +3882,13 @@ scev_probably_wraps_p (tree base, tree s
>      {
>        tree extreme = fold_convert (unsigned_type,
>                                    upper_bound_in_type (type, type));
> +      wide_int min, max;
> +      if (TREE_CODE (base) == SSA_NAME
> +         && INTEGRAL_TYPE_P (TREE_TYPE (base))
> +         && get_range_info (base, &min, &max) == VR_RANGE)
> +       base = wide_int_to_tree (unsigned_type, min);
> +      else
> +       base = fold_convert (unsigned_type, base);
>        delta = fold_build2 (MINUS_EXPR, unsigned_type, extreme, base);
>        step_abs = fold_convert (unsigned_type, step);
>      }
> 
> doesn't really help this case unless i_6(D) gets range-information transfered
> temporarily as I said above, of course.

As you said, range information for i_6(D) actually is flow-sensitive
information, we can't simply propagate range_info(d) to i_6(D) generally.  Even
if we can, the code change is not natural since the related functions are
scattered across different functions (ivopt/chrec/niter).
So I take the other way around by passing the IV's ssa_name into
scev_probably_wraps_p along call sequence
"idx_find_step->convert_affine_scev->scev_probably_wraps".  Since the IV's
ssa_name is tagged with right range information, we can use it when proving
there is no overflow/wrap in src scev.

This mothed works and GCC now recognizes the address iv use in A[d].

BUT, the problem not only exists in address type iv's code, but also in compare
type iv's.  The IV dump file is now as below:


  <bb 8>:
  _4 = (sizetype) i_6(D);
  _3 = &A + _4;
  ivtmp.11_17 = (unsigned long) _3;
  _1 = (sizetype) i_6(D);
  _2 = (unsigned int) i_6(D);
  _22 = _2 + 4294967295;
  _21 = (sizetype) _22;
  _20 = _1 - _21;
  _29 = _20 + 18446744073709551615;
  _30 = &A + _29;
  _31 = (unsigned long) _30;

  <bb 9>:
  # ivtmp.11_18 = PHI <ivtmp.11_17(8), ivtmp.11_8(11)>
  _5 = (void *) ivtmp.11_18;
  _14 = MEM[base: _5, offset: 0B];
  foo (_14);
  ivtmp.11_8 = ivtmp.11_18 - 1;
  if (ivtmp.11_8 != _31)
    goto <bb 11>;
  else
    goto <bb 10>;

  <bb 10>:
  goto <bb 3>;

  <bb 11>:
  goto <bb 9>;

The loop preheader is bloated by computing "cand_value_at (loop, cand,
use->stmt, desc->niter, &bnd);" in function "may_eliminate_iv" and we have:

(gdb) call debug_generic_expr(cand->iv->base)
(unsigned long) ((char *) &A + (sizetype) i_6(D))
(gdb) call debug_generic_expr(cand->iv->step)
18446744073709551615
(gdb) call debug_generic_expr(desc->niter)
(unsigned int) (i_6(D) + -1)

GCC is not aware of RANGE_INFO(i_6(D)): [1:10], it doesn't know that below
condition holds:
  (unsigned long)(unsigned int)(i_6(D) + -1)  == (unsigned long)(i_6(D) + -1)


The problem is niter is computed in unsigned version type of control iv, which
is unsigned int because control biv (d) is of type int.  But the candidate is
of larger type "unsigned long" on 64 bits target.

I really think LLVM's front-end makes better decision by promoting the index of
array_ref A[d] to 64 bits signed type, while GCC keeps it in 32 bits.