From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/62173] [5.0 regression] 64bit Arch can't ivopt while 32bit Arch can
Date: Thu, 27 Nov 2014 12:16:00 -0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: jiwang at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 5.0
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173

--- Comment #14 from rguenther at suse dot de ---
On Thu, 27 Nov 2014, rguenther at suse dot de wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173
>
> --- Comment #13 from rguenther at suse dot de ---
> On Thu, 27 Nov 2014, jiwang at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62173
> >
> > Jiong Wang changed:
> >
> >            What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                  CC|                            |rguenth at gcc dot gnu.org
> >          Depends on|                            |52563
> >
> > --- Comment #12 from Jiong Wang ---
> > the root cause why it's not ivopted on AArch64 is because
> >
> >   must_check_src_overflow = TYPE_PRECISION (ct) < TYPE_PRECISION
(type);
> >
> > code above in convert_affine_scev, the input type is sizetype, so DI for 64bit
> > arch, SI for 32bit arch, ct is SI, thus, must_check_src_overflow set to true
> > for 64bit arch, then failed later scev_probably_wraps_p check.
> >
> > And I found similar issue reported back in 2012, at bug 52563.
> >
> > I verified this bug exist on other 64 archs, like mips64, ppc64, x86-64
> >
> > Richard, on 52563, I see you was working on this, do you have any thoughts on
> > this?
>
> See comment #5 of that bug.  For 4.8(?) I started on work to relax
> the type requirements of the offset parameter of POINTER_PLUS_EXPR
> by abstracting stuff but I didn't get to continue on that work.
>
> Basically that we force the offset to 'sizetype' has both correctness
> issues (for targets where sizetype precision doesn't match Pmode
> precision) and optimization issues as we lose for example sign
> information and overflow knowledge in the computation of the offset.
> The last thing is also because we have transforms in fold which
> push typecasts of expressions down to operands - thus
> from (sizetype) (4 * i) we get 4 * (sizetype)i which may now be
> an unsigned multiplication with wrapping overflow.  Note that
> it is the frontends who start the conversion thing and apply some
> "tricks" for code-gen (see pointer_int_sum in c-common.c).
> It's also not clear whether if you write p[i] with i of type int
> the multiplication by sizeof (*p) invokes undefined behavior if
> it wraps (that is, the C standard does not define the type the
> multiplication is performed in but just defines things in terms
> of array elements).
>
> Ideally we'd use a widening multiplication here but optimizers
> have little knowledge of that so it probably would cause quite
> some regressions.  We could also keep the multiplication signed
> (but using ssizetype), but then fold will come along and
> undo that trick IIRC.
>
> That said, both the POINTER_PLUS_EXPR constraints on the offset
> type _and_ the C language issue with int * sizeof (element)
> overflowing for 64bit pointer sizes prevent us from optimizing this.
>
> It's a very tricky area ;)

A piece of code related to pointer_int_sum is get_inner_reference.
If you do

Index: gcc/expr.c
===================================================================
--- gcc/expr.c  (revision 218121)
+++ gcc/expr.c  (working copy)
@@ -6852,9 +6852,10 @@ get_inner_reference (tree exp, HOST_WIDE
                                   index, low_bound);
 
            offset = size_binop (PLUS_EXPR, offset,
+                                fold_convert (sizetype,
                                 size_binop (MULT_EXPR,
-                                            fold_convert (sizetype, index),
-                                            unit_size));
+                                            fold_convert (ssizetype, index),
+                                            fold_convert (ssizetype, unit_size))));
          }
          break;

and thus compute the index * size multiplication in a signed type and
only then convert the result back to unsigned, it improves, for
example, the testcase in PR52563 comment #4 in the following way:

 .L5:
-       movq    %rdx, %rcx
-       movq    %rax, %r8
-       movl    $100, (%rax)
-       addq    %rdi, %rdx
-       addq    %rsi, %rax
-       cmpq    $999, %rcx
+       movl    $100, a(,%rax,4)
+       addq    %rdi, %rax
+       movq    %rdx, %rsi
+       addq    %rcx, %rdx
+       cmpq    $999, %rax
        jle     .L5

Not sure if that's really faster in the end, but IVOPTs does
its job in that case.

Now you need to argue that doing that is safe - the change does
two things:

 1) we interpret 'index' as signed
 2) we say the multiplication does not overflow

That would break, for example,

  unsigned long i = 0x8000000000000000UL;
  int a[1];
  a[i] = 0;

as with the current unsigned computation i * 4 overflows to 0 and
thus the access is valid.  Of course, accessing element
0x8000000000000000UL of an array of size 1 invokes undefined behavior.
Now you get to prove that there is no case that breaks where it
wouldn't be undefined from the start ...

I'd be happy to approve such a change - maybe you can check if it
helps you and whether it passes testing and enough benchmarks?
(There is still the related code in pointer_int_sum, which should be
adjusted accordingly.)
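
To make the a[i] example concrete, here is a small standalone C sketch
(not GCC code) of the two ways of computing the byte offset.  The
function names offset_unsigned and offset_signed are made up for
illustration, and it assumes a 64-bit sizetype/ssizetype modelled by
uint64_t/int64_t.  The signed variant only reports whether the product
is representable; it does not perform an overflowing multiplication
itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Current behaviour (model): the index is converted to an unsigned
   64-bit type first, so the multiplication wraps modulo 2^64.  */
uint64_t offset_unsigned(uint64_t index, uint64_t unit_size)
{
    return index * unit_size;
}

/* Hypothetical model of the patched computation: the multiplication
   is done in a signed 64-bit type.  Returns false when the exact
   product does not fit, i.e. the case the compiler would be allowed
   to assume never happens.  Requires unit_size > 0.  */
bool offset_signed(int64_t index, int64_t unit_size, int64_t *out)
{
    assert(unit_size > 0);
    if (index > INT64_MAX / unit_size || index < INT64_MIN / unit_size)
        return false;            /* signed overflow */
    *out = index * unit_size;    /* safe: checked above */
    return true;
}
```

With index 0x8000000000000000 and unit size 4, offset_unsigned
returns 0 (the wrapped, "valid" access), while offset_signed, given
the same bit pattern read as INT64_MIN, reports an overflow.  For
small negative indexes such as -1 both variants agree once the signed
result is converted back to unsigned.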