From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-468761-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 6055 invoked by alias); 27 Nov 2014 12:16:01 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 5963 invoked by uid 55); 27 Nov 2014 12:15:56 -0000
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/62173] [5.0 regression] 64bit Arch can't ivopt while 32bit Arch can
Date: Thu, 27 Nov 2014 12:16:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: jiwang at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 5.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-62173-4-ALqeyNlyBF@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-62173-4@http.gcc.gnu.org/bugzilla/>
References: <bug-62173-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-11/txt/msg03233.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173
--- Comment #14 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 27 Nov 2014, rguenther at suse dot de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173
> 
> --- Comment #13 from rguenther at suse dot de <rguenther at suse dot de> ---
> On Thu, 27 Nov 2014, jiwang at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173
> > 
> > Jiong Wang <jiwang at gcc dot gnu.org> changed:
> > 
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                  CC|                            |rguenth at gcc dot gnu.org
> >          Depends on|                            |52563
> > 
> > --- Comment #12 from Jiong Wang <jiwang at gcc dot gnu.org> ---
> > The root cause of why it's not ivopted on AArch64 is the line
> > 
> >   must_check_src_overflow = TYPE_PRECISION (ct) < TYPE_PRECISION (type);
> > 
> > in convert_affine_scev.  The input type is sizetype, so DI for a 64bit
> > arch and SI for a 32bit arch, while ct is SI.  Thus must_check_src_overflow
> > is set to true on a 64bit arch, and the later scev_probably_wraps_p check
> > then fails.
> > 
> > And I found a similar issue reported back in 2012, as bug 52563.
> > 
> > I verified that this bug exists on other 64bit archs, like mips64, ppc64, x86-64.
> > 
> > Richard, on 52563, I see you were working on this; do you have any thoughts on
> > this?
> 
> See comment #5 of that bug.  For 4.8(?) I started on work to relax
> the type requirements of the offset parameter of POINTER_PLUS_EXPR
> by abstracting stuff but I didn't get to continue on that work.
> 
> Basically that we force the offset to 'sizetype' has both correctness
> issues (for targets where sizetype precision doesn't match Pmode
> precision) and optimization issues as we lose for example sign
> information and overflow knowledge in the computation of the offset.
> The last thing is also because we have transforms in fold which
> push typecasts of expressions down to operands - thus
> from (sizetype) (4 * i) we get 4 * (sizetype)i which may now be
> an unsigned multiplication with wrapping overflow.  Note that
> it is the frontends who start the conversion thing and apply some
> "tricks" for code-gen (see pointer_int_sum in c-common.c).
> It's also not clear whether if you write p[i] with i of type int
> the multiplication by sizeof (*p) invokes undefined behavior if
> it wraps (that is, the C standard does not define the type the
> multiplication is performed in but just defines things in terms
> of array elements).
> 
> Ideally we'd use a widening multiplication here but optimizers
> have little knowledge of that so it probably would cause quite
> some regressions.  We could also keep the multiplication signed
> (but using ssizetype), but then fold will come along and
> undo that trick IIRC.
> 
> That said, both the POINTER_PLUS_EXPR constraints on the offset
> type _and_ the C language issue with int * sizeof (element)
> overflowing for 64bit pointer sizes prevent us from optimizing this.
> 
> It's a very tricky area ;)
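
To make the quoted point about pushing the cast down to the operands
concrete, here is a minimal sketch.  It is not GCC code: the two helpers
are hypothetical, uint64_t/int64_t stand in for sizetype/ssizetype on a
64bit target, and __builtin_mul_overflow (a GCC/Clang builtin) is used
only to report whether the multiplication fits in its type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the folded form 4 * (sizetype) i, done in an
   unsigned 64-bit type.  Returns true if the unsigned multiply wrapped.  */
bool
unsigned_offset_wraps (int i, uint64_t *offset)
{
  /* Converting a negative int to uint64_t yields a huge value.  */
  return __builtin_mul_overflow ((uint64_t) i, (uint64_t) 4, offset);
}

/* Hypothetical helper: the same offset kept in a signed 64-bit type
   (the ssizetype idea).  Returns true if the signed multiply overflowed.  */
bool
signed_offset_overflows (int i, int64_t *offset)
{
  return __builtin_mul_overflow ((int64_t) i, (int64_t) 4, offset);
}
```

For i = -1 the unsigned form wraps while the signed form does not, even
though both produce the same bit pattern.  That is the sense in which the
unsigned multiplication loses the no-overflow knowledge.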

A piece of code related to pointer_int_sum is get_inner_reference.
If you do

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 218121)
+++ gcc/expr.c  (working copy)
@@ -6852,9 +6852,10 @@ get_inner_reference (tree exp, HOST_WIDE
                                   index, low_bound);

            offset = size_binop (PLUS_EXPR, offset,
+                                fold_convert (sizetype,
                                 size_binop (MULT_EXPR,
-                                            fold_convert (sizetype, index),
-                                            unit_size));
+                                            fold_convert (ssizetype, index),
+                                            fold_convert (ssizetype, unit_size))));
          }
          break;


that is, compute the index * size multiplication in a signed type and
only then convert the result back to unsigned, then it improves, for
example, the testcase in PR52563 comment #4 in the following way:

 .L5:
-       movq    %rdx, %rcx
-       movq    %rax, %r8
-       movl    $100, (%rax)
-       addq    %rdi, %rdx
-       addq    %rsi, %rax
-       cmpq    $999, %rcx
+       movl    $100, a(,%rax,4)
+       addq    %rdi, %rax
+       movq    %rdx, %rsi
+       addq    %rcx, %rdx
+       cmpq    $999, %rax
        jle     .L5

I'm not sure whether that's really faster in the end, but IVOPTs
does its job in that case.

Now you need to argue that doing that is safe - the change does
two things.

 1) we interpret 'index' as signed
 2) we say the multiplication does not overflow

That would break, for example,

 unsigned long i = 0x8000000000000000UL;
 int a[1];
 a[i] = 0;

as i * 4 wraps to 0 in the unsigned multiplication, so the computed
address is that of a[0] and thus the access is valid.

Of course accessing the element 0x8000000000000000UL of an array
of size 1 invokes undefined behavior.  Now you get to prove
that there is no case that breaks where it wouldn't be undefined
from the start ...
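
To make the two readings of the a[i] example concrete, here is a minimal
sketch.  It is not GCC code: offset_unsigned and offset_signed_overflows
are hypothetical helpers, uint64_t/int64_t stand in for
sizetype/ssizetype on a 64bit target, and __builtin_mul_overflow (a
GCC/Clang builtin) is used only to detect the overflow without invoking
undefined behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the offset as computed before the patch,
   an unsigned multiplication that silently wraps modulo 2^64.  */
uint64_t
offset_unsigned (uint64_t index, uint64_t unit_size)
{
  return index * unit_size;
}

/* Hypothetical helper: the offset as the patch would compute it.
   The index is reinterpreted as signed (GCC defines this conversion
   as modulo 2^64) and the signed multiply is checked.
   Returns true if the signed multiplication overflowed.  */
bool
offset_signed_overflows (uint64_t index, uint64_t unit_size, int64_t *offset)
{
  return __builtin_mul_overflow ((int64_t) index, (int64_t) unit_size,
                                 offset);
}
```

With index 0x8000000000000000 and a unit size of 4, the first helper
returns 0 (the access lands on a[0]), while the second reports overflow.
That is the case the signed multiplication would declare impossible.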

I'd be happy to approve such a change - maybe you can check if
it helps you and whether it passes testing and enough benchmarks?
(there is still the related code in pointer_int_sum which should
be adjusted accordingly)