From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 35609 invoked by alias); 13 Jun 2017 10:25:48 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 33872 invoked by uid 89); 13 Jun 2017 10:25:47 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-6.9 required=5.0 tests=BAYES_00,GIT_PATCH_2,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy= X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 13 Jun 2017 10:25:40 +0000 Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 343C8AD78; Tue, 13 Jun 2017 10:25:43 +0000 (UTC) Date: Tue, 13 Jun 2017 10:25:00 -0000 From: Richard Biener To: Richard Sandiford cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix PR66313 In-Reply-To: <877f0gfbwz.fsf@linaro.org> Message-ID: References: <877f0gfbwz.fsf@linaro.org> User-Agent: Alpine 2.20 (LSU 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-SW-Source: 2017-06/txt/msg00889.txt.bz2 On Tue, 13 Jun 2017, Richard Sandiford wrote: > Richard Biener writes: > > So I've come back to PR66313 and found a solution to the tailrecursion > > missed optimization when fixing the factoring folding to use an unsigned > > type when we're not sure of overflow. > > > > The folding part is identical to my last try from 2015, the tailrecursion > > part makes us handle intermittent stmts that were introduced by foldings > > that "clobber" our quest walking the single-use chain of stmts between > > the call and the return (and failing at all stmts that are not part > > of said chain). A simple solution is to move the stmts that are not > > part of the chain and that we can move before the call. That handles > > the leaf conversions that now appear for tree-ssa/tailrecursion-6.c > > > > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. > > > > Richard. > > > > 2017-05-31 Richard Biener > > > > PR middle-end/66313 > > * fold-const.c (fold_plusminus_mult_expr): If the factored > > factor may be zero use a wrapping type for the inner operation. > > * tree-tailcall.c (independent_of_stmt_p): Pass in to_move bitmap > > and handle moved defs. > > (process_assignment): Properly guard the unary op case. Return a > > tri-state indicating that moving the stmt before the call may allow > > to continue. Pass through to_move. > > (find_tail_calls): Handle moving unrelated defs before > > the call. > > > > * c-c++-common/ubsan/pr66313.c: New testcase. > > * gcc.dg/tree-ssa/loop-15.c: Adjust. > > > > Index: gcc/fold-const.c > > =================================================================== > > *** gcc/fold-const.c.orig 2015-10-29 12:32:33.302782318 +0100 > > --- gcc/fold-const.c 2015-10-29 14:08:39.936497739 +0100 > > *************** fold_plusminus_mult_expr (location_t loc > > *** 6916,6925 **** > > } > > same = NULL_TREE; > > > > ! if (operand_equal_p (arg01, arg11, 0)) > > ! same = arg01, alt0 = arg00, alt1 = arg10; > > ! else if (operand_equal_p (arg00, arg10, 0)) > > same = arg00, alt0 = arg01, alt1 = arg11; > > else if (operand_equal_p (arg00, arg11, 0)) > > same = arg00, alt0 = arg01, alt1 = arg10; > > else if (operand_equal_p (arg01, arg10, 0)) > > --- 6916,6926 ---- > > } > > same = NULL_TREE; > > > > ! /* Prefer factoring a common non-constant. */ > > ! if (operand_equal_p (arg00, arg10, 0)) > > same = arg00, alt0 = arg01, alt1 = arg11; > > + else if (operand_equal_p (arg01, arg11, 0)) > > + same = arg01, alt0 = arg00, alt1 = arg10; > > else if (operand_equal_p (arg00, arg11, 0)) > > same = arg00, alt0 = arg01, alt1 = arg10; > > else if (operand_equal_p (arg01, arg10, 0)) > > *************** fold_plusminus_mult_expr (location_t loc > > *** 6974,6987 **** > > } > > } > > > > ! if (same) > > return fold_build2_loc (loc, MULT_EXPR, type, > > fold_build2_loc (loc, code, type, > > fold_convert_loc (loc, type, alt0), > > fold_convert_loc (loc, type, alt1)), > > fold_convert_loc (loc, type, same)); > > > > ! return NULL_TREE; > > } > > > > /* Subroutine of native_encode_expr. Encode the INTEGER_CST > > --- 6975,7010 ---- > > } > > } > > > > ! if (!same) > > ! return NULL_TREE; > > ! > > ! if (! INTEGRAL_TYPE_P (type) > > ! || TYPE_OVERFLOW_WRAPS (type) > > ! /* We are neither factoring zero nor minus one. */ > > ! || TREE_CODE (same) == INTEGER_CST) > > return fold_build2_loc (loc, MULT_EXPR, type, > > fold_build2_loc (loc, code, type, > > fold_convert_loc (loc, type, alt0), > > fold_convert_loc (loc, type, alt1)), > > fold_convert_loc (loc, type, same)); > > > > ! /* Same may be zero and thus the operation 'code' may overflow. Likewise > > ! same may be minus one and thus the multiplication may overflow. Perform > > ! the operations in an unsigned type. */ > > ! tree utype = unsigned_type_for (type); > > ! tree tem = fold_build2_loc (loc, code, utype, > > ! fold_convert_loc (loc, utype, alt0), > > ! fold_convert_loc (loc, utype, alt1)); > > ! /* If the sum evaluated to a constant that is not -INF the multiplication > > ! cannot overflow. */ > > ! if (TREE_CODE (tem) == INTEGER_CST > > ! && ! wi::eq_p (tem, wi::min_value (TYPE_PRECISION (utype), SIGNED))) > > ! return fold_build2_loc (loc, MULT_EXPR, type, > > ! fold_convert (type, tem), same); > > ! > > ! return fold_convert_loc (loc, type, > > ! fold_build2_loc (loc, MULT_EXPR, utype, tem, > > ! fold_convert_loc (loc, utype, same))); > > } > > > > /* Subroutine of native_encode_expr. Encode the INTEGER_CST > > Sorry if you already know, but this part means that we can no longer > vectorise: > > int > f (int *x, int b1, int b2, int b3) > { > int foo = 0; > for (int i1 = 0; i1 < b1; ++i1) > for (int i2 = 0; i2 < b2; ++i2) > for (int i3 = 0; i3 < b3; ++i3) > foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)]; > return foo; > } > > We now convert all the arithmetic in the [...] to unsigned int > and reassociate it so that the "- 1" is applied last. We then assume > that the overflow in this subtraction is well-defined and that the > &x[...] might not be linear for the inner loop. Can you file a PR? # i3_44 = PHI i3.2_7 = (unsigned int) i3_44; _9 = i3.2_7 + _34; _10 = (int) _9; _11 = (long unsigned int) _10; _12 = _11 * 4; _13 = x_30(D) + _12; _14 = *_13; ... i3_32 = i3_44 + 1; so we have *(x_30(D) + 4 * ((sizetype)(int)((unsigned){ 0, + , 1} + X)) It might mean that we need to avoid this re-association. Not sure how to detect worthwhile vs. not worthwhile cases during folding. At least I see no easy way to recover from it in SCEV analysis unless the number of iterations is constrained more than in your example. Ideas welcome ;) Thanks, Richard.