From: "Bin.Cheng"
Date: Tue, 13 Jun 2017 11:56:00 -0000
References: <877f0gfbwz.fsf@linaro.org> <8737b4f7wj.fsf@linaro.org>
Subject: Re: [PATCH] Fix PR66313
To:
Richard Biener
Cc: Richard Sandiford, gcc-patches List

On Tue, Jun 13, 2017 at 12:48 PM, Richard Biener wrote:
> On Tue, 13 Jun 2017, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Tue, 13 Jun 2017, Richard Sandiford wrote:
>> >> Richard Biener writes:
>> >> > So I've come back to PR66313 and found a solution to the tailrecursion
>> >> > missed optimization when fixing the factoring folding to use an unsigned
>> >> > type when we're not sure of overflow.
>> >> >
>> >> > The folding part is identical to my last try from 2015, the tailrecursion
>> >> > part makes us handle intermittent stmts that were introduced by foldings
>> >> > that "clobber" our quest walking the single-use chain of stmts between
>> >> > the call and the return (and failing at all stmts that are not part
>> >> > of said chain).  A simple solution is to move the stmts that are not
>> >> > part of the chain and that we can move before the call.  That handles
>> >> > the leaf conversions that now appear for tree-ssa/tailrecursion-6.c
>> >> >
>> >> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> >> >
>> >> > Richard.
>> >> >
>> >> > 2017-05-31  Richard Biener
>> >> >
>> >> > 	PR middle-end/66313
>> >> > 	* fold-const.c (fold_plusminus_mult_expr): If the factored
>> >> > 	factor may be zero use a wrapping type for the inner operation.
>> >> > 	* tree-tailcall.c (independent_of_stmt_p): Pass in to_move bitmap
>> >> > 	and handle moved defs.
>> >> > 	(process_assignment): Properly guard the unary op case.  Return a
>> >> > 	tri-state indicating that moving the stmt before the call may allow
>> >> > 	to continue.  Pass through to_move.
>> >> > 	(find_tail_calls): Handle moving unrelated defs before
>> >> > 	the call.
>> >> >
>> >> > 	* c-c++-common/ubsan/pr66313.c: New testcase.
>> >> > 	* gcc.dg/tree-ssa/loop-15.c: Adjust.
>> >> >
>> >> > Index: gcc/fold-const.c
>> >> > ===================================================================
>> >> > *** gcc/fold-const.c.orig	2015-10-29 12:32:33.302782318 +0100
>> >> > --- gcc/fold-const.c	2015-10-29 14:08:39.936497739 +0100
>> >> > *************** fold_plusminus_mult_expr (location_t loc
>> >> > *** 6916,6925 ****
>> >> >       }
>> >> >     same = NULL_TREE;
>> >> >
>> >> > !   if (operand_equal_p (arg01, arg11, 0))
>> >> > !     same = arg01, alt0 = arg00, alt1 = arg10;
>> >> > !   else if (operand_equal_p (arg00, arg10, 0))
>> >> >       same = arg00, alt0 = arg01, alt1 = arg11;
>> >> >     else if (operand_equal_p (arg00, arg11, 0))
>> >> >       same = arg00, alt0 = arg01, alt1 = arg10;
>> >> >     else if (operand_equal_p (arg01, arg10, 0))
>> >> > --- 6916,6926 ----
>> >> >       }
>> >> >     same = NULL_TREE;
>> >> >
>> >> > !   /* Prefer factoring a common non-constant.  */
>> >> > !   if (operand_equal_p (arg00, arg10, 0))
>> >> >       same = arg00, alt0 = arg01, alt1 = arg11;
>> >> > +   else if (operand_equal_p (arg01, arg11, 0))
>> >> > +     same = arg01, alt0 = arg00, alt1 = arg10;
>> >> >     else if (operand_equal_p (arg00, arg11, 0))
>> >> >       same = arg00, alt0 = arg01, alt1 = arg10;
>> >> >     else if (operand_equal_p (arg01, arg10, 0))
>> >> > *************** fold_plusminus_mult_expr (location_t loc
>> >> > *** 6974,6987 ****
>> >> >       }
>> >> >       }
>> >> >
>> >> > !   if (same)
>> >> >       return fold_build2_loc (loc, MULT_EXPR, type,
>> >> > 			    fold_build2_loc (loc, code, type,
>> >> > 					     fold_convert_loc (loc, type, alt0),
>> >> > 					     fold_convert_loc (loc, type, alt1)),
>> >> > 			    fold_convert_loc (loc, type, same));
>> >> >
>> >> > !   return NULL_TREE;
>> >> >   }
>> >> >
>> >> >   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
>> >> > --- 6975,7010 ----
>> >> >       }
>> >> >       }
>> >> >
>> >> > !   if (!same)
>> >> > !     return NULL_TREE;
>> >> > !
>> >> > !   if (! INTEGRAL_TYPE_P (type)
>> >> > !       || TYPE_OVERFLOW_WRAPS (type)
>> >> > !       /* We are neither factoring zero nor minus one.  */
>> >> > !       || TREE_CODE (same) == INTEGER_CST)
>> >> >       return fold_build2_loc (loc, MULT_EXPR, type,
>> >> > 			      fold_build2_loc (loc, code, type,
>> >> > 					       fold_convert_loc (loc, type, alt0),
>> >> > 					       fold_convert_loc (loc, type, alt1)),
>> >> > 			      fold_convert_loc (loc, type, same));
>> >> >
>> >> > !   /* Same may be zero and thus the operation 'code' may overflow.  Likewise
>> >> > !      same may be minus one and thus the multiplication may overflow.  Perform
>> >> > !      the operations in an unsigned type.  */
>> >> > !   tree utype = unsigned_type_for (type);
>> >> > !   tree tem = fold_build2_loc (loc, code, utype,
>> >> > ! 			      fold_convert_loc (loc, utype, alt0),
>> >> > ! 			      fold_convert_loc (loc, utype, alt1));
>> >> > !   /* If the sum evaluated to a constant that is not -INF the multiplication
>> >> > !      cannot overflow.  */
>> >> > !   if (TREE_CODE (tem) == INTEGER_CST
>> >> > !       && ! wi::eq_p (tem, wi::min_value (TYPE_PRECISION (utype), SIGNED)))
>> >> > !     return fold_build2_loc (loc, MULT_EXPR, type,
>> >> > ! 			    fold_convert (type, tem), same);
>> >> > !
>> >> > !   return fold_convert_loc (loc, type,
>> >> > ! 			   fold_build2_loc (loc, MULT_EXPR, utype, tem,
>> >> > ! 					    fold_convert_loc (loc, utype, same)));
>> >> >   }
>> >> >
>> >> >   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
>> >>
>> >> Sorry if you already know, but this part means that we can no longer
>> >> vectorise:
>> >>
>> >> int
>> >> f (int *x, int b1, int b2, int b3)
>> >> {
>> >>   int foo = 0;
>> >>   for (int i1 = 0; i1 < b1; ++i1)
>> >>     for (int i2 = 0; i2 < b2; ++i2)
>> >>       for (int i3 = 0; i3 < b3; ++i3)
>> >> 	foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
>> >>   return foo;
>> >> }
>> >>
>> >> We now convert all the arithmetic in the [...] to unsigned int
>> >> and reassociate it so that the "- 1" is applied last.  We then assume
>> >> that the overflow in this subtraction is well-defined and that the
>> >> &x[...] might not be linear for the inner loop.
>> >
>> > Can you file a PR?
>>
>> OK, filed as PR81082.
>>
> # i3_44 = PHI
> i3.2_7 = (unsigned int) i3_44;
> _9 = i3.2_7 + _34;
> _10 = (int) _9;
> _11 = (long unsigned int) _10;
> _12 = _11 * 4;
> _13 = x_30(D) + _12;
> _14 = *_13;
> ...
> i3_32 = i3_44 + 1;
>
> so we have *(x_30(D) + 4 * ((sizetype)(int)((unsigned){ 0, + , 1} + X))
>
> It might mean that we need to avoid this re-association.  Not sure how
> to detect worthwhile vs. not worthwhile cases during folding.  At least
> I see no easy way to recover from it in SCEV analysis unless the
> number of iterations is constrained more than in your example.
>>
>> Yeah, in the example that this was reduced from, not reassociating
>> is far more preferable to changing the types.  But like you say,
>> I've no idea how we'd know that at this stage.
>
> In the past I played with not obfuscating things during FE lowering
> so early, namely not expose the * 4 but keep the original types of
> array / pointer indexes.  On the original mem-ref branch I had
> a IDX_EXPR that allowed to chain those (a widen-multiply-accumulate
> op).
>
> That said, the context where this association is not interesting is
> address context and the multiplication to the byte offset.
>
> Of course in your case there's also b3 factored out which looks
> like a generally interesting transform (but eventually harmful in address
> context again).
>
> From the FE we get
>
> *(x + (sizetype) ((long unsigned int) (((i1 * b2) * b3 + i2 * b3) + (i3 +
> -1)) * 4))
>
> and SCEV uses the fact that the mults/adds have undefined behavior
> on overflow.  The same missed optimization occurs if you make all those
> vars unsigned (with or without the folding in question):

But with unsigned types it's not a missed optimization, because the
computation could overflow and result in a non-SCEV address.  In this
case the loop needs to be versioned under overflow/non-overflow
conditions to be parallelized/vectorized, right?  Or split for cases
where we can easily infer the boundary condition of overflow.
Thanks,
bin

> #define U unsigned
> int
> f (int *x, U int b1, U int b2, U int b3)
> {
>   int foo = 0;
>   for (U int i1 = 0; i1 < b1; ++i1)
>     for (U int i2 = 0; i2 < b2; ++i2)
>       for (U int i3 = 0; i3 < b3; ++i3)
> 	foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
>   return foo;
> }
>
> It works again for unsigned long as there can be no valid object
> where the computation could wrap around.
>
> Richard.
>
> --
> Richard Biener
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nuernberg)