From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-455794-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 35609 invoked by alias); 13 Jun 2017 10:25:48 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 33872 invoked by uid 89); 13 Jun 2017 10:25:47 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-6.9 required=5.0 tests=BAYES_00,GIT_PATCH_2,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=
X-HELO: mx1.suse.de
Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 13 Jun 2017 10:25:40 +0000
Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254])	by mx1.suse.de (Postfix) with ESMTP id 343C8AD78;	Tue, 13 Jun 2017 10:25:43 +0000 (UTC)
Date: Tue, 13 Jun 2017 10:25:00 -0000
From: Richard Biener <rguenther@suse.de>
To: Richard Sandiford <richard.sandiford@linaro.org>
cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Fix PR66313
In-Reply-To: <877f0gfbwz.fsf@linaro.org>
Message-ID: <alpine.LSU.2.20.1706131218550.22867@zhemvz.fhfr.qr>
References: <alpine.LSU.2.20.1705311603540.20726@zhemvz.fhfr.qr> <877f0gfbwz.fsf@linaro.org>
User-Agent: Alpine 2.20 (LSU 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-SW-Source: 2017-06/txt/msg00889.txt.bz2

On Tue, 13 Jun 2017, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > So I've come back to PR66313 and found a solution to the tailrecursion
> > missed optimization when fixing the factoring folding to use an unsigned
> > type when we're not sure of overflow.
> >
> > The folding part is identical to my last try from 2015, the tailrecursion
> > part makes us handle intermittent stmts that were introduced by foldings
> > that "clobber" our quest walking the single-use chain of stmts between
> > the call and the return (and failing at all stmts that are not part
> > of said chain).  A simple solution is to move the stmts that are not
> > part of the chain and that we can move before the call.  That handles
> > the leaf conversions that now appear for tree-ssa/tailrecursion-6.c
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > Richard.
> >
> > 2017-05-31  Richard Biener  <rguenther@suse.de>
> >
> > 	PR middle-end/66313
> > 	* fold-const.c (fold_plusminus_mult_expr): If the factored
> > 	factor may be zero use a wrapping type for the inner operation.
> > 	* tree-tailcall.c (independent_of_stmt_p): Pass in to_move bitmap
> > 	and handle moved defs.
> > 	(process_assignment): Properly guard the unary op case.  Return a
> > 	tri-state indicating that moving the stmt before the call may allow
> > 	to continue.  Pass through to_move.
> > 	(find_tail_calls): Handle moving unrelated defs before
> > 	the call.
> >
> > 	* c-c++-common/ubsan/pr66313.c: New testcase.
> > 	* gcc.dg/tree-ssa/loop-15.c: Adjust.
> >
> > Index: gcc/fold-const.c
> > ===================================================================
> > *** gcc/fold-const.c.orig	2015-10-29 12:32:33.302782318 +0100
> > --- gcc/fold-const.c	2015-10-29 14:08:39.936497739 +0100
> > *************** fold_plusminus_mult_expr (location_t loc
> > *** 6916,6925 ****
> >       }
> >     same = NULL_TREE;
> >   
> > !   if (operand_equal_p (arg01, arg11, 0))
> > !     same = arg01, alt0 = arg00, alt1 = arg10;
> > !   else if (operand_equal_p (arg00, arg10, 0))
> >       same = arg00, alt0 = arg01, alt1 = arg11;
> >     else if (operand_equal_p (arg00, arg11, 0))
> >       same = arg00, alt0 = arg01, alt1 = arg10;
> >     else if (operand_equal_p (arg01, arg10, 0))
> > --- 6916,6926 ----
> >       }
> >     same = NULL_TREE;
> >   
> > !   /* Prefer factoring a common non-constant.  */
> > !   if (operand_equal_p (arg00, arg10, 0))
> >       same = arg00, alt0 = arg01, alt1 = arg11;
> > +   else if (operand_equal_p (arg01, arg11, 0))
> > +     same = arg01, alt0 = arg00, alt1 = arg10;
> >     else if (operand_equal_p (arg00, arg11, 0))
> >       same = arg00, alt0 = arg01, alt1 = arg10;
> >     else if (operand_equal_p (arg01, arg10, 0))
> > *************** fold_plusminus_mult_expr (location_t loc
> > *** 6974,6987 ****
> >   	}
> >       }
> >   
> > !   if (same)
> >       return fold_build2_loc (loc, MULT_EXPR, type,
> >   			fold_build2_loc (loc, code, type,
> >   				     fold_convert_loc (loc, type, alt0),
> >   				     fold_convert_loc (loc, type, alt1)),
> >   			fold_convert_loc (loc, type, same));
> >   
> > !   return NULL_TREE;
> >   }
> >   
> >   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
> > --- 6975,7010 ----
> >   	}
> >       }
> >   
> > !   if (!same)
> > !     return NULL_TREE;
> > ! 
> > !   if (! INTEGRAL_TYPE_P (type)
> > !       || TYPE_OVERFLOW_WRAPS (type)
> > !       /* We are neither factoring zero nor minus one.  */
> > !       || TREE_CODE (same) == INTEGER_CST)
> >       return fold_build2_loc (loc, MULT_EXPR, type,
> >   			fold_build2_loc (loc, code, type,
> >   				     fold_convert_loc (loc, type, alt0),
> >   				     fold_convert_loc (loc, type, alt1)),
> >   			fold_convert_loc (loc, type, same));
> >   
> > !   /* Same may be zero and thus the operation 'code' may overflow.  Likewise
> > !      same may be minus one and thus the multiplication may overflow.  Perform
> > !      the operations in an unsigned type.  */
> > !   tree utype = unsigned_type_for (type);
> > !   tree tem = fold_build2_loc (loc, code, utype,
> > ! 			      fold_convert_loc (loc, utype, alt0),
> > ! 			      fold_convert_loc (loc, utype, alt1));
> > !   /* If the sum evaluated to a constant that is not -INF the multiplication
> > !      cannot overflow.  */
> > !   if (TREE_CODE (tem) == INTEGER_CST
> > !       && ! wi::eq_p (tem, wi::min_value (TYPE_PRECISION (utype), SIGNED)))
> > !     return fold_build2_loc (loc, MULT_EXPR, type,
> > ! 			    fold_convert (type, tem), same);
> > ! 
> > !   return fold_convert_loc (loc, type,
> > ! 			   fold_build2_loc (loc, MULT_EXPR, utype, tem,
> > ! 					    fold_convert_loc (loc, utype, same)));
> >   }
> >   
> >   /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
> 
> Sorry if you already know, but this part means that we can no longer
> vectorise:
> 
> int
> f (int *x, int b1, int b2, int b3)
> {
>   int foo = 0;
>   for (int i1 = 0; i1 < b1; ++i1)
>     for (int i2 = 0; i2 < b2; ++i2)
>       for (int i3 = 0; i3 < b3; ++i3)
>         foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
>   return foo;
> }
> 
> We now convert all the arithmetic in the [...] to unsigned int
> and reassociate it so that the "- 1" is applied last.  We then assume
> that the overflow in this subtraction is well-defined and that the
> &x[...] might not be linear for the inner loop.

Can you file a PR?

  # i3_44 = PHI <i3_32(6), 0(4)>
  i3.2_7 = (unsigned int) i3_44;
  _9 = i3.2_7 + _34;
  _10 = (int) _9;
  _11 = (long unsigned int) _10;
  _12 = _11 * 4;
  _13 = x_30(D) + _12;
  _14 = *_13;
...
   i3_32 = i3_44 + 1;

so we have *(x_30(D) + 4 * ((sizetype)(int)((unsigned){ 0, + , 1} + X))

It might mean that we need to avoid this re-association.  Not sure how
to detect worthwhile vs. not worthwhile cases during folding.  At least
I see no easy way to recover from it in SCEV analysis unless the
number of iterations is constrained more than in your example.

Ideas welcome ;)

Thanks,
Richard.