From: "Bin.Cheng"
Date: Tue, 13 Jun 2017 11:56:00 -0000
References: <877f0gfbwz.fsf@linaro.org> <8737b4f7wj.fsf@linaro.org>
Subject: Re: [PATCH] Fix PR66313
To:
Richard Biener
Cc: Richard Sandiford, gcc-patches List

On Tue, Jun 13, 2017 at 12:48 PM, Richard Biener wrote:
> On Tue, 13 Jun 2017, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Tue, 13 Jun 2017, Richard Sandiford wrote:
>> >> Richard Biener writes:
>> >> > So I've come back to PR66313 and found a solution to the tailrecursion
>> >> > missed optimization when fixing the factoring folding to use an unsigned
>> >> > type when we're not sure of overflow.
>> >> >
>> >> > The folding part is identical to my last try from 2015, the tailrecursion
>> >> > part makes us handle intermittent stmts that were introduced by foldings
>> >> > that "clobber" our quest walking the single-use chain of stmts between
>> >> > the call and the return (and failing at all stmts that are not part
>> >> > of said chain).  A simple solution is to move the stmts that are not
>> >> > part of the chain and that we can move before the call.  That handles
>> >> > the leaf conversions that now appear for tree-ssa/tailrecursion-6.c
>> >> >
>> >> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> >> >
>> >> > Richard.
>> >> >
>> >> > 2017-05-31  Richard Biener
>> >> >
>> >> > 	PR middle-end/66313
>> >> > 	* fold-const.c (fold_plusminus_mult_expr): If the factored
>> >> > 	factor may be zero use a wrapping type for the inner operation.
>> >> > 	* tree-tailcall.c (independent_of_stmt_p): Pass in to_move bitmap
>> >> > 	and handle moved defs.
>> >> > 	(process_assignment): Properly guard the unary op case.  Return a
>> >> > 	tri-state indicating that moving the stmt before the call may allow
>> >> > 	to continue.  Pass through to_move.
>> >> > 	(find_tail_calls): Handle moving unrelated defs before
>> >> > 	the call.
>> >> >
>> >> > 	* c-c++-common/ubsan/pr66313.c: New testcase.
>> >> > 	* gcc.dg/tree-ssa/loop-15.c: Adjust.
>> >> >
>> >> > Index: gcc/fold-const.c
>> >> > ===================================================================
>> >> > *** gcc/fold-const.c.orig	2015-10-29 12:32:33.302782318 +0100
>> >> > --- gcc/fold-const.c	2015-10-29 14:08:39.936497739 +0100
>> >> > *************** fold_plusminus_mult_expr (location_t loc
>> >> > *** 6916,6925 ****
>> >> >       }
>> >> >     same = NULL_TREE;
>> >> >
>> >> > !   if (operand_equal_p (arg01, arg11, 0))
>> >> > !     same = arg01, alt0 = arg00, alt1 = arg10;
>> >> > !   else if (operand_equal_p (arg00, arg10, 0))
>> >> >       same = arg00, alt0 = arg01, alt1 = arg11;
>> >> >     else if (operand_equal_p (arg00, arg11, 0))
>> >> >       same = arg00, alt0 = arg01, alt1 = arg10;
>> >> >     else if (operand_equal_p (arg01, arg10, 0))
>> >> > --- 6916,6926 ----
>> >> >       }
>> >> >     same = NULL_TREE;
>> >> >
>> >> > !   /* Prefer factoring a common non-constant.  */
>> >> > !   if (operand_equal_p (arg00, arg10, 0))
>> >> >       same = arg00, alt0 = arg01, alt1 = arg11;
>> >> > +   else if (operand_equal_p (arg01, arg11, 0))
>> >> > +     same = arg01, alt0 = arg00, alt1 = arg10;
>> >> >     else if (operand_equal_p (arg00, arg11, 0))
>> >> >       same = arg00, alt0 = arg01, alt1 = arg10;
>> >> >     else if (operand_equal_p (arg01, arg10, 0))
>> >> > *************** fold_plusminus_mult_expr (location_t loc
>> >> > *** 6974,6987 ****
>> >> >       }
>> >> >       }
>> >> >
>> >> > !   if (same)
>> >> >       return fold_build2_loc (loc, MULT_EXPR, type,
>> >> > 			    fold_build2_loc (loc, code, type,
>> >> > 					     fold_convert_loc (loc, type, alt0),
>> >> > 					     fold_convert_loc (loc, type, alt1)),
>> >> > 			    fold_convert_loc (loc, type, same));
>> >> >
>> >> > !   return NULL_TREE;
>> >> >   }
>> >> >
>> >> >   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
>> >> > --- 6975,7010 ----
>> >> >       }
>> >> >       }
>> >> >
>> >> > !   if (!same)
>> >> > !     return NULL_TREE;
>> >> > !
>> >> > !   if (! INTEGRAL_TYPE_P (type)
>> >> > !       || TYPE_OVERFLOW_WRAPS (type)
>> >> > !       /* We are neither factoring zero nor minus one.  */
>> >> > !       || TREE_CODE (same) == INTEGER_CST)
>> >> >       return fold_build2_loc (loc, MULT_EXPR, type,
>> >> > 			      fold_build2_loc (loc, code, type,
>> >> > 					       fold_convert_loc (loc, type, alt0),
>> >> > 					       fold_convert_loc (loc, type, alt1)),
>> >> > 			      fold_convert_loc (loc, type, same));
>> >> >
>> >> > !   /* Same may be zero and thus the operation 'code' may overflow.  Likewise
>> >> > !      same may be minus one and thus the multiplication may overflow.  Perform
>> >> > !      the operations in an unsigned type.  */
>> >> > !   tree utype = unsigned_type_for (type);
>> >> > !   tree tem = fold_build2_loc (loc, code, utype,
>> >> > ! 			      fold_convert_loc (loc, utype, alt0),
>> >> > ! 			      fold_convert_loc (loc, utype, alt1));
>> >> > !   /* If the sum evaluated to a constant that is not -INF the multiplication
>> >> > !      cannot overflow.  */
>> >> > !   if (TREE_CODE (tem) == INTEGER_CST
>> >> > !       && ! wi::eq_p (tem, wi::min_value (TYPE_PRECISION (utype), SIGNED)))
>> >> > !     return fold_build2_loc (loc, MULT_EXPR, type,
>> >> > ! 			    fold_convert (type, tem), same);
>> >> > !
>> >> > !   return fold_convert_loc (loc, type,
>> >> > ! 			   fold_build2_loc (loc, MULT_EXPR, utype, tem,
>> >> > ! 					    fold_convert_loc (loc, utype, same)));
>> >> >   }
>> >> >
>> >> >   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
>> >>
>> >> Sorry if you already know, but this part means that we can no longer
>> >> vectorise:
>> >>
>> >> int
>> >> f (int *x, int b1, int b2, int b3)
>> >> {
>> >>   int foo = 0;
>> >>   for (int i1 = 0; i1 < b1; ++i1)
>> >>     for (int i2 = 0; i2 < b2; ++i2)
>> >>       for (int i3 = 0; i3 < b3; ++i3)
>> >> 	foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
>> >>   return foo;
>> >> }
>> >>
>> >> We now convert all the arithmetic in the [...] to unsigned int
>> >> and reassociate it so that the "- 1" is applied last.  We then assume
>> >> that the overflow in this subtraction is well-defined and that the
>> >> &x[...] might not be linear for the inner loop.
>> >
>> > Can you file a PR?
>>
>> OK, filed as PR81082.
>>
> # i3_44 = PHI
> i3.2_7 = (unsigned int) i3_44;
> _9 = i3.2_7 + _34;
> _10 = (int) _9;
> _11 = (long unsigned int) _10;
> _12 = _11 * 4;
> _13 = x_30(D) + _12;
> _14 = *_13;
> ...
> i3_32 = i3_44 + 1;
>
> so we have *(x_30(D) + 4 * ((sizetype)(int)((unsigned){ 0, + , 1} + X))
>
> It might mean that we need to avoid this re-association.  Not sure how
> to detect worthwhile vs. not worthwhile cases during folding.  At least
> I see no easy way to recover from it in SCEV analysis unless the
> number of iterations is constrained more than in your example.
>>
>> Yeah, in the example that this was reduced from, not reassociating
>> is far more preferable to changing the types.  But like you say,
>> I've no idea how we'd know that at this stage.
>
> In the past I played with not obfuscating things during FE lowering
> so early, namely not expose the * 4 but keep the original types of
> array / pointer indexes.  On the original mem-ref branch I had
> a IDX_EXPR that allowed to chain those (a widen-multiply-accumulate
> op).
>
> That said, the context where this association is not interesting is
> address context and the multiplication to the byte offset.
>
> Of course in your case there's also b3 factored out which looks
> like a generally interesting transform (but eventually harmful in address
> context again).
>
> From the FE we get
>
> *(x + (sizetype) ((long unsigned int) (((i1 * b2) * b3 + i2 * b3) + (i3 +
> -1)) * 4))
>
> and SCEV uses the fact that the mults/adds have undefined behavior
> on overflow.  The same missed optimization occurs if you make all those
> vars unsigned (with or without the folding in question):

But with unsigned types it's not a missed optimization, because the
computation could overflow and result in a non-SCEV address.  In this
case the loop needs to be versioned under overflow/non-overflow
conditions to be parallelized/vectorized, right?  Or split for cases
where we can easily infer the boundary condition of overflow.
Thanks,
bin

> #define U unsigned
> int
> f (int *x, U int b1, U int b2, U int b3)
> {
>   int foo = 0;
>   for (U int i1 = 0; i1 < b1; ++i1)
>     for (U int i2 = 0; i2 < b2; ++i2)
>       for (U int i3 = 0; i3 < b3; ++i3)
> 	foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
>   return foo;
> }
>
> It works again for unsigned long as there can be no valid object
> where the computation could wrap around.
>
> Richard.
>
> --
> Richard Biener
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nuernberg)