From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 13696 invoked by alias); 12 Apr 2012 14:43:39 -0000 Received: (qmail 13678 invoked by uid 22791); 12 Apr 2012 14:43:35 -0000 X-SWARE-Spam-Status: No, hits=-4.7 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,KHOP_RCVD_TRUST,KHOP_THREADED,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE,TW_FN,TW_HF,TW_NL,TW_TM X-Spam-Check-By: sourceware.org Received: from mail-iy0-f175.google.com (HELO mail-iy0-f175.google.com) (209.85.210.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 12 Apr 2012 14:43:14 +0000 Received: by iaag37 with SMTP id g37so3047731iaa.20 for ; Thu, 12 Apr 2012 07:43:13 -0700 (PDT) MIME-Version: 1.0 Received: by 10.42.117.129 with SMTP id t1mr2396787icq.0.1334241793441; Thu, 12 Apr 2012 07:43:13 -0700 (PDT) Received: by 10.42.228.200 with HTTP; Thu, 12 Apr 2012 07:43:13 -0700 (PDT) In-Reply-To: <1333633759.19102.46.camel@gnopaine> References: <1331066954.17488.7.camel@gnopaine> <1333484715.19102.22.camel@gnopaine> <1333542912.19102.27.camel@gnopaine> <1333566953.4145.2.camel@oc2474580526.ibm.com> <1333633759.19102.46.camel@gnopaine> Date: Thu, 12 Apr 2012 14:43:00 -0000 Message-ID: Subject: Re: [PATCH] Fix PR18589 From: Richard Guenther To: "William J. Schmidt" Cc: gcc-patches@gcc.gnu.org, bergner@vnet.ibm.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2012-04/txt/msg00769.txt.bz2 On Thu, Apr 5, 2012 at 3:49 PM, William J. Schmidt wrote: > On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: >> On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt >> wrote: >> > >> > Unfortunately this seems to be necessary if I name the two passes >> > "reassoc1" and "reassoc2". =A0If I try to name both of them "reassoc" I >> > get failures in other tests like gfortran.dg/reassoc_4, where >> > -fdump-tree-reassoc1 doesn't work. =A0Unless I'm missing something >> > obvious, I think I need to keep that change. >> >> Hm, naming them "reassoc1" and "reassoc2" is a hack. =A0Naming both >> "reassoc" will not trigger re-naming them to reassoc1 and reassoc2 >> I think. =A0How ugly. =A0Especially that -fdump-tree-reassoc will no lon= ger >> work. =A0Maybe instead of using two pass structs resort to using >> the existing hack with using first_pass_instance and TODO_mark_first_ins= tance. > > OK, that seems to be the best among evils. =A0Using the > first_pass_instance hack, the patch is transformed as below. > Regstrapped on powerpc64-linux, no additional failures. =A0OK for trunk? Ok. Thanks, Richard. > Thanks, > Bill > > > gcc: > > 2012-04-05 =A0Bill Schmidt =A0 > > =A0 =A0 =A0 =A0PR tree-optimization/18589 > =A0 =A0 =A0 =A0* tree-ssa-reassoc.c (reassociate_stats): Add two fields. > =A0 =A0 =A0 =A0(operand_entry): Add count field. > =A0 =A0 =A0 =A0(add_repeat_to_ops_vec): New function. > =A0 =A0 =A0 =A0(completely_remove_stmt): Likewise. > =A0 =A0 =A0 =A0(remove_def_if_absorbed_call): Likewise. > =A0 =A0 =A0 =A0(remove_visited_stmt_chain): Remove feeding builtin pow/po= wi calls. > =A0 =A0 =A0 =A0(acceptable_pow_call): New function. > =A0 =A0 =A0 =A0(linearize_expr_tree): Look for builtin pow/powi calls and= add operand > =A0 =A0 =A0 =A0entries with repeat counts when found. > =A0 =A0 =A0 =A0(repeat_factor_d): New struct and associated typedefs. > =A0 =A0 =A0 =A0(repeat_factor_vec): New static vector variable. > =A0 =A0 =A0 =A0(compare_repeat_factors): New function. > =A0 =A0 =A0 =A0(get_reassoc_pow_ssa_name): Likewise. > =A0 =A0 =A0 =A0(attempt_builtin_powi): Likewise. > =A0 =A0 =A0 =A0(reassociate_bb): Call attempt_builtin_powi. > =A0 =A0 =A0 =A0(fini_reassoc): Two new calls to statistics_counter_event. > > gcc/testsuite: > > 2012-04-05 =A0Bill Schmidt =A0 > > =A0 =A0 =A0 =A0PR tree-optimization/18589 > =A0 =A0 =A0 =A0* gcc.dg/tree-ssa/pr18589-1.c: New test. > =A0 =A0 =A0 =A0* gcc.dg/tree-ssa/pr18589-2.c: Likewise. > =A0 =A0 =A0 =A0* gcc.dg/tree-ssa/pr18589-3.c: Likewise. > =A0 =A0 =A0 =A0* gcc.dg/tree-ssa/pr18589-4.c: Likewise. > =A0 =A0 =A0 =A0* gcc.dg/tree-ssa/pr18589-5.c: Likewise. > =A0 =A0 =A0 =A0* gcc.dg/tree-ssa/pr18589-6.c: Likewise. > =A0 =A0 =A0 =A0* gcc.dg/tree-ssa/pr18589-7.c: Likewise. > =A0 =A0 =A0 =A0* gcc.dg/tree-ssa/pr18589-8.c: Likewise. > =A0 =A0 =A0 =A0* gcc.dg/tree-ssa/pr18589-9.c: Likewise. > =A0 =A0 =A0 =A0* gcc.dg/tree-ssa/pr18589-10.c: Likewise. > > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c =A0 (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c =A0 (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y, double z, double u) > +{ > + =A0return x * x * y * y * y * z * z * z * z * u; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c =A0 (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c =A0 (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y, double z, double u) > +{ > + =A0return x * x * x * y * y * y * z * z * z * z * u * u * u * u; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c =A0 (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c =A0 (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y) > +{ > + =A0return __builtin_pow (x, 3.0) * __builtin_pow (y, 4.0); > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 4 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c =A0 (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c =A0 (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +float baz (float x, float y) > +{ > + =A0return x * x * x * x * y * y * y * y; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c =A0 (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c =A0 (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +long double baz (long double x, long double y) > +{ > + =A0return x * x * x * x * y * y * y * y; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c =A0 (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c =A0 (revision 0) > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y, double z) > +{ > + =A0return (__builtin_pow (x, 3.0) * __builtin_pow (y, 2.0) > + =A0 =A0 =A0 =A0 * __builtin_pow (z, 5.0)); > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c =A0(revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c =A0(revision 0) > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y, double z) > +{ > + =A0return (__builtin_pow (x, 4.0) * __builtin_pow (y, 2.0) > + =A0 =A0 =A0 =A0 * __builtin_pow (z, 4.0)); > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c =A0 (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c =A0 (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y) > +{ > + =A0return x * x * x * x * y * y * y * y; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c =A0 (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c =A0 (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y) > +{ > + =A0return x * y * y * x * y * x * x * y; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c =A0 (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c =A0 (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y, double z) > +{ > + =A0return x * x * y * y * y * z * z * z * z; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/tree-ssa-reassoc.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/tree-ssa-reassoc.c =A0 =A0 =A0(revision 186108) > +++ gcc/tree-ssa-reassoc.c =A0 =A0 =A0(working copy) > @@ -61,6 +61,10 @@ along with GCC; see the file COPYING3. =A0If not see > =A0 =A0 3. Optimization of the operand lists, eliminating things like a + > =A0 =A0 -a, a & a, etc. > > + =A0 =A03a. Combine repeated factors with the same occurrence counts > + =A0 =A0into a __builtin_powi call that will later be optimized into > + =A0 =A0an optimal number of multiplies. > + > =A0 =A0 4. Rewrite the expression trees we linearized and optimized so > =A0 =A0 they are in proper rank order. > > @@ -169,6 +173,8 @@ static struct > =A0 int constants_eliminated; > =A0 int ops_eliminated; > =A0 int rewritten; > + =A0int pows_encountered; > + =A0int pows_created; > =A0} reassociate_stats; > > =A0/* Operator, rank pair. =A0*/ > @@ -177,6 +183,7 @@ typedef struct operand_entry > =A0 unsigned int rank; > =A0 int id; > =A0 tree op; > + =A0unsigned int count; > =A0} *operand_entry_t; > > =A0static alloc_pool operand_entry_pool; > @@ -515,9 +522,28 @@ add_to_ops_vec (VEC(operand_entry_t, heap) **ops, > =A0 oe->op =3D op; > =A0 oe->rank =3D get_rank (op); > =A0 oe->id =3D next_operand_entry_id++; > + =A0oe->count =3D 1; > =A0 VEC_safe_push (operand_entry_t, heap, *ops, oe); > =A0} > > +/* Add an operand entry to *OPS for the tree operand OP with repeat > + =A0 count REPEAT. =A0*/ > + > +static void > +add_repeat_to_ops_vec (VEC(operand_entry_t, heap) **ops, tree op, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0HOST_WIDE_INT repeat) > +{ > + =A0operand_entry_t oe =3D (operand_entry_t) pool_alloc (operand_entry_p= ool); > + > + =A0oe->op =3D op; > + =A0oe->rank =3D get_rank (op); > + =A0oe->id =3D next_operand_entry_id++; > + =A0oe->count =3D repeat; > + =A0VEC_safe_push (operand_entry_t, heap, *ops, oe); > + > + =A0reassociate_stats.pows_encountered++; > +} > + > =A0/* Return true if STMT is reassociable operation containing a binary > =A0 =A0operation with tree code CODE, and is inside LOOP. =A0*/ > > @@ -2049,6 +2075,32 @@ is_phi_for_stmt (gimple stmt, tree operand) > =A0 return false; > =A0} > > +/* Remove STMT, unlink its virtual defs, and release its SSA defs. =A0*/ > + > +static inline void > +completely_remove_stmt (gimple stmt) > +{ > + =A0gimple_stmt_iterator gsi =3D gsi_for_stmt (stmt); > + =A0gsi_remove (&gsi, true); > + =A0unlink_stmt_vdef (stmt); > + =A0release_defs (stmt); > +} > + > +/* If OP is defined by a builtin call that has been absorbed by > + =A0 reassociation, remove its defining statement completely. =A0*/ > + > +static inline void > +remove_def_if_absorbed_call (tree op) > +{ > + =A0gimple stmt; > + > + =A0if (TREE_CODE (op) =3D=3D SSA_NAME > + =A0 =A0 =A0&& has_zero_uses (op) > + =A0 =A0 =A0&& is_gimple_call ((stmt =3D SSA_NAME_DEF_STMT (op))) > + =A0 =A0 =A0&& gimple_visited_p (stmt)) > + =A0 =A0completely_remove_stmt (stmt); > +} > + > =A0/* Remove def stmt of VAR if VAR has zero uses and recurse > =A0 =A0on rhs1 operand if so. =A0*/ > > @@ -2057,19 +2109,31 @@ remove_visited_stmt_chain (tree var) > =A0{ > =A0 gimple stmt; > =A0 gimple_stmt_iterator gsi; > + =A0tree var2; > > =A0 while (1) > =A0 =A0 { > =A0 =A0 =A0 if (TREE_CODE (var) !=3D SSA_NAME || !has_zero_uses (var)) > =A0 =A0 =A0 =A0return; > =A0 =A0 =A0 stmt =3D SSA_NAME_DEF_STMT (var); > - =A0 =A0 =A0if (!is_gimple_assign (stmt) > - =A0 =A0 =A0 =A0 || !gimple_visited_p (stmt)) > + =A0 =A0 =A0if (is_gimple_assign (stmt) && gimple_visited_p (stmt)) > + =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 var =3D gimple_assign_rhs1 (stmt); > + =A0 =A0 =A0 =A0 var2 =3D gimple_assign_rhs2 (stmt); > + =A0 =A0 =A0 =A0 gsi =3D gsi_for_stmt (stmt); > + =A0 =A0 =A0 =A0 gsi_remove (&gsi, true); > + =A0 =A0 =A0 =A0 release_defs (stmt); > + =A0 =A0 =A0 =A0 /* A multiply whose operands are both fed by builtin po= w/powi > + =A0 =A0 =A0 =A0 =A0 =A0calls must check whether to remove rhs2 as well.= =A0*/ > + =A0 =A0 =A0 =A0 remove_def_if_absorbed_call (var2); > + =A0 =A0 =A0 } > + =A0 =A0 =A0else if (is_gimple_call (stmt) && gimple_visited_p (stmt)) > + =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 completely_remove_stmt (stmt); > + =A0 =A0 =A0 =A0 return; > + =A0 =A0 =A0 } > + =A0 =A0 =A0else > =A0 =A0 =A0 =A0return; > - =A0 =A0 =A0var =3D gimple_assign_rhs1 (stmt); > - =A0 =A0 =A0gsi =3D gsi_for_stmt (stmt); > - =A0 =A0 =A0gsi_remove (&gsi, true); > - =A0 =A0 =A0release_defs (stmt); > =A0 =A0 } > =A0} > > @@ -2564,6 +2628,75 @@ break_up_subtract (gimple stmt, gimple_stmt_iterat > =A0 update_stmt (stmt); > =A0} > > +/* Determine whether STMT is a builtin call that raises an SSA name > + =A0 to an integer power and has only one use. =A0If so, and this is ear= ly > + =A0 reassociation and unsafe math optimizations are permitted, place > + =A0 the SSA name in *BASE and the exponent in *EXPONENT, and return TRU= E. > + =A0 If any of these conditions does not hold, return FALSE. =A0*/ > + > +static bool > +acceptable_pow_call (gimple stmt, tree *base, HOST_WIDE_INT *exponent) > +{ > + =A0tree fndecl, arg1; > + =A0REAL_VALUE_TYPE c, cint; > + > + =A0if (!first_pass_instance > + =A0 =A0 =A0|| !flag_unsafe_math_optimizations > + =A0 =A0 =A0|| !is_gimple_call (stmt) > + =A0 =A0 =A0|| !has_single_use (gimple_call_lhs (stmt))) > + =A0 =A0return false; > + > + =A0fndecl =3D gimple_call_fndecl (stmt); > + > + =A0if (!fndecl > + =A0 =A0 =A0|| DECL_BUILT_IN_CLASS (fndecl) !=3D BUILT_IN_NORMAL) > + =A0 =A0return false; > + > + =A0switch (DECL_FUNCTION_CODE (fndecl)) > + =A0 =A0{ > + =A0 =A0CASE_FLT_FN (BUILT_IN_POW): > + =A0 =A0 =A0*base =3D gimple_call_arg (stmt, 0); > + =A0 =A0 =A0arg1 =3D gimple_call_arg (stmt, 1); > + > + =A0 =A0 =A0if (TREE_CODE (arg1) !=3D REAL_CST) > + =A0 =A0 =A0 return false; > + > + =A0 =A0 =A0c =3D TREE_REAL_CST (arg1); > + > + =A0 =A0 =A0if (REAL_EXP (&c) > HOST_BITS_PER_WIDE_INT) > + =A0 =A0 =A0 return false; > + > + =A0 =A0 =A0*exponent =3D real_to_integer (&c); > + =A0 =A0 =A0real_from_integer (&cint, VOIDmode, *exponent, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0*exponent < 0 ? -1 : 0, = 0); > + =A0 =A0 =A0if (!real_identical (&c, &cint)) > + =A0 =A0 =A0 return false; > + > + =A0 =A0 =A0break; > + > + =A0 =A0CASE_FLT_FN (BUILT_IN_POWI): > + =A0 =A0 =A0*base =3D gimple_call_arg (stmt, 0); > + =A0 =A0 =A0arg1 =3D gimple_call_arg (stmt, 1); > + > + =A0 =A0 =A0if (!host_integerp (arg1, 0)) > + =A0 =A0 =A0 return false; > + > + =A0 =A0 =A0*exponent =3D TREE_INT_CST_LOW (arg1); > + =A0 =A0 =A0break; > + > + =A0 =A0default: > + =A0 =A0 =A0return false; > + =A0 =A0} > + > + =A0/* Expanding negative exponents is generally unproductive, so we don= 't > + =A0 =A0 complicate matters with those. =A0Exponents of zero and one sho= uld > + =A0 =A0 have been handled by expression folding. =A0*/ > + =A0if (*exponent < 2 || TREE_CODE (*base) !=3D SSA_NAME) > + =A0 =A0return false; > + > + =A0return true; > +} > + > =A0/* Recursively linearize a binary expression that is the RHS of STMT. > =A0 =A0Place the operands of the expression tree in the vector named OPS.= =A0*/ > > @@ -2573,11 +2706,13 @@ linearize_expr_tree (VEC(operand_entry_t, heap) ** > =A0{ > =A0 tree binlhs =3D gimple_assign_rhs1 (stmt); > =A0 tree binrhs =3D gimple_assign_rhs2 (stmt); > - =A0gimple binlhsdef, binrhsdef; > + =A0gimple binlhsdef =3D NULL, binrhsdef =3D NULL; > =A0 bool binlhsisreassoc =3D false; > =A0 bool binrhsisreassoc =3D false; > =A0 enum tree_code rhscode =3D gimple_assign_rhs_code (stmt); > =A0 struct loop *loop =3D loop_containing_stmt (stmt); > + =A0tree base =3D NULL_TREE; > + =A0HOST_WIDE_INT exponent =3D 0; > > =A0 if (set_visited) > =A0 =A0 gimple_set_visited (stmt, true); > @@ -2615,8 +2750,26 @@ linearize_expr_tree (VEC(operand_entry_t, heap) ** > > =A0 =A0 =A0 if (!binrhsisreassoc) > =A0 =A0 =A0 =A0{ > - =A0 =A0 =A0 =A0 add_to_ops_vec (ops, binrhs); > - =A0 =A0 =A0 =A0 add_to_ops_vec (ops, binlhs); > + =A0 =A0 =A0 =A0 if (rhscode =3D=3D MULT_EXPR > + =A0 =A0 =A0 =A0 =A0 =A0 && TREE_CODE (binrhs) =3D=3D SSA_NAME > + =A0 =A0 =A0 =A0 =A0 =A0 && acceptable_pow_call (binrhsdef, &base, &expo= nent)) > + =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 add_repeat_to_ops_vec (ops, base, exponent); > + =A0 =A0 =A0 =A0 =A0 =A0 gimple_set_visited (binrhsdef, true); > + =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 else > + =A0 =A0 =A0 =A0 =A0 add_to_ops_vec (ops, binrhs); > + > + =A0 =A0 =A0 =A0 if (rhscode =3D=3D MULT_EXPR > + =A0 =A0 =A0 =A0 =A0 =A0 && TREE_CODE (binlhs) =3D=3D SSA_NAME > + =A0 =A0 =A0 =A0 =A0 =A0 && acceptable_pow_call (binlhsdef, &base, &expo= nent)) > + =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 add_repeat_to_ops_vec (ops, base, exponent); > + =A0 =A0 =A0 =A0 =A0 =A0 gimple_set_visited (binlhsdef, true); > + =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 else > + =A0 =A0 =A0 =A0 =A0 add_to_ops_vec (ops, binlhs); > + > =A0 =A0 =A0 =A0 =A0return; > =A0 =A0 =A0 =A0} > > @@ -2655,7 +2808,16 @@ linearize_expr_tree (VEC(operand_entry_t, heap) ** > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0rhscode, loop)); > =A0 linearize_expr_tree (ops, SSA_NAME_DEF_STMT (binlhs), > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 is_associative, set_visited); > - =A0add_to_ops_vec (ops, binrhs); > + > + =A0if (rhscode =3D=3D MULT_EXPR > + =A0 =A0 =A0&& TREE_CODE (binrhs) =3D=3D SSA_NAME > + =A0 =A0 =A0&& acceptable_pow_call (SSA_NAME_DEF_STMT (binrhs), &base, &= exponent)) > + =A0 =A0{ > + =A0 =A0 =A0add_repeat_to_ops_vec (ops, base, exponent); > + =A0 =A0 =A0gimple_set_visited (SSA_NAME_DEF_STMT (binrhs), true); > + =A0 =A0} > + =A0else > + =A0 =A0add_to_ops_vec (ops, binrhs); > =A0} > > =A0/* Repropagate the negates back into subtracts, since no other pass > @@ -2815,6 +2977,347 @@ break_up_subtract_bb (basic_block bb) > =A0 =A0 break_up_subtract_bb (son); > =A0} > > +/* Used for repeated factor analysis. =A0*/ > +struct repeat_factor_d > +{ > + =A0/* An SSA name that occurs in a multiply chain. =A0*/ > + =A0tree factor; > + > + =A0/* Cached rank of the factor. =A0*/ > + =A0unsigned rank; > + > + =A0/* Number of occurrences of the factor in the chain. =A0*/ > + =A0HOST_WIDE_INT count; > + > + =A0/* An SSA name representing the product of this factor and > + =A0 =A0 all factors appearing later in the repeated factor vector. =A0*/ > + =A0tree repr; > +}; > + > +typedef struct repeat_factor_d repeat_factor, *repeat_factor_t; > +typedef const struct repeat_factor_d *const_repeat_factor_t; > + > +DEF_VEC_O (repeat_factor); > +DEF_VEC_ALLOC_O (repeat_factor, heap); > + > +static VEC (repeat_factor, heap) *repeat_factor_vec; > + > +/* Used for sorting the repeat factor vector. =A0Sort primarily by > + =A0 ascending occurrence count, secondarily by descending rank. =A0*/ > + > +static int > +compare_repeat_factors (const void *x1, const void *x2) > +{ > + =A0const_repeat_factor_t rf1 =3D (const_repeat_factor_t) x1; > + =A0const_repeat_factor_t rf2 =3D (const_repeat_factor_t) x2; > + > + =A0if (rf1->count !=3D rf2->count) > + =A0 =A0return rf1->count - rf2->count; > + > + =A0return rf2->rank - rf1->rank; > +} > + > +/* Get a new SSA name for register variable *TARGET of type TYPE. > + =A0 If *TARGET is null or incompatible with TYPE, create the variable > + =A0 first. =A0*/ > + > +static tree > +get_reassoc_pow_ssa_name (tree *target, tree type) > +{ > + =A0if (!*target || !types_compatible_p (type, TREE_TYPE (*target))) > + =A0 =A0{ > + =A0 =A0 =A0*target =3D create_tmp_reg (type, "reassocpow"); > + =A0 =A0 =A0add_referenced_var (*target); > + =A0 =A0} > + > + =A0return make_ssa_name (*target, NULL); > +} > + > +/* Look for repeated operands in OPS in the multiply tree rooted at > + =A0 STMT. =A0Replace them with an optimal sequence of multiplies and po= wi > + =A0 builtin calls, and remove the used operands from OPS. =A0Push new > + =A0 SSA names onto OPS that represent the introduced multiplies and > + =A0 builtin calls. =A0*/ > + > +static void > +attempt_builtin_powi (gimple stmt, VEC(operand_entry_t, heap) **ops, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 tree *target) > +{ > + =A0unsigned i, j, vec_len; > + =A0int ii; > + =A0operand_entry_t oe; > + =A0repeat_factor_t rf1, rf2; > + =A0repeat_factor rfnew; > + =A0tree target_ssa, iter_result; > + =A0tree type =3D TREE_TYPE (gimple_get_lhs (stmt)); > + =A0tree powi_fndecl =3D mathfn_built_in (type, BUILT_IN_POWI); > + =A0gimple_stmt_iterator gsi =3D gsi_for_stmt (stmt); > + =A0gimple mul_stmt, pow_stmt; > + > + =A0/* Nothing to do if BUILT_IN_POWI doesn't exist for this type and > + =A0 =A0 target. =A0*/ > + =A0if (!powi_fndecl) > + =A0 =A0return; > + > + =A0/* Allocate the repeated factor vector. =A0*/ > + =A0repeat_factor_vec =3D VEC_alloc (repeat_factor, heap, 10); > + > + =A0/* Scan the OPS vector for all SSA names in the product and build > + =A0 =A0 up a vector of occurrence counts for each factor. =A0*/ > + =A0FOR_EACH_VEC_ELT (operand_entry_t, *ops, i, oe) > + =A0 =A0{ > + =A0 =A0 =A0if (TREE_CODE (oe->op) =3D=3D SSA_NAME) > + =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, = rf1) > + =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 if (rf1->factor =3D=3D oe->op) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rf1->count +=3D oe->count; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 } > + > + =A0 =A0 =A0 =A0 if (j >=3D VEC_length (repeat_factor, repeat_factor_vec= )) > + =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 rfnew.factor =3D oe->op; > + =A0 =A0 =A0 =A0 =A0 =A0 rfnew.rank =3D oe->rank; > + =A0 =A0 =A0 =A0 =A0 =A0 rfnew.count =3D oe->count; > + =A0 =A0 =A0 =A0 =A0 =A0 rfnew.repr =3D NULL_TREE; > + =A0 =A0 =A0 =A0 =A0 =A0 VEC_safe_push (repeat_factor, heap, repeat_fact= or_vec, &rfnew); > + =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 } > + =A0 =A0} > + > + =A0/* Sort the repeated factor vector by (a) increasing occurrence coun= t, > + =A0 =A0 and (b) decreasing rank. =A0*/ > + =A0VEC_qsort (repeat_factor, repeat_factor_vec, compare_repeat_factors); > + > + =A0/* It is generally best to combine as many base factors as possible > + =A0 =A0 into a product before applying __builtin_powi to the result. > + =A0 =A0 However, the sort order chosen for the repeated factor vector > + =A0 =A0 allows us to cache partial results for the product of the base > + =A0 =A0 factors for subsequent use. =A0When we already have a cached pa= rtial > + =A0 =A0 result from a previous iteration, it is best to make use of it > + =A0 =A0 before looking for another __builtin_pow opportunity. > + > + =A0 =A0 As an example, consider x * x * y * y * y * z * z * z * z. > + =A0 =A0 We want to first compose the product x * y * z, raise it to the > + =A0 =A0 second power, then multiply this by y * z, and finally multiply > + =A0 =A0 by z. =A0This can be done in 5 multiplies provided we cache y *= z > + =A0 =A0 for use in both expressions: > + > + =A0 =A0 =A0 =A0t1 =3D y * z > + =A0 =A0 =A0 t2 =3D t1 * x > + =A0 =A0 =A0 t3 =3D t2 * t2 > + =A0 =A0 =A0 t4 =3D t1 * t3 > + =A0 =A0 =A0 result =3D t4 * z > + > + =A0 =A0 If we instead ignored the cached y * z and first multiplied by > + =A0 =A0 the __builtin_pow opportunity z * z, we would get the inferior: > + > + =A0 =A0 =A0 =A0t1 =3D y * z > + =A0 =A0 =A0 t2 =3D t1 * x > + =A0 =A0 =A0 t3 =3D t2 * t2 > + =A0 =A0 =A0 t4 =3D z * z > + =A0 =A0 =A0 t5 =3D t3 * t4 > + =A0 =A0 =A0 =A0result =3D t5 * y =A0*/ > + > + =A0vec_len =3D VEC_length (repeat_factor, repeat_factor_vec); > + > + =A0/* Repeatedly look for opportunities to create a builtin_powi call. = =A0*/ > + =A0while (true) > + =A0 =A0{ > + =A0 =A0 =A0HOST_WIDE_INT power; > + > + =A0 =A0 =A0/* First look for the largest cached product of factors from > + =A0 =A0 =A0 =A0preceding iterations. =A0If found, create a builtin_powi= for > + =A0 =A0 =A0 =A0it if the minimum occurrence count for its factors is at > + =A0 =A0 =A0 =A0least 2, or just use this cached product as our next > + =A0 =A0 =A0 =A0multiplicand if the minimum occurrence count is 1. =A0*/ > + =A0 =A0 =A0FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, rf1) > + =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 if (rf1->repr && rf1->count > 0) > + =A0 =A0 =A0 =A0 =A0 break; > + =A0 =A0 =A0 } > + > + =A0 =A0 =A0if (j < vec_len) > + =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 power =3D rf1->count; > + > + =A0 =A0 =A0 =A0 if (power =3D=3D 1) > + =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 iter_result =3D rf1->repr; > + > + =A0 =A0 =A0 =A0 =A0 =A0 if (dump_file && (dump_flags & TDF_DETAILS)) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 unsigned elt; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 repeat_factor_t rf; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 fputs ("Multiplying by cached product "= , dump_file); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 for (elt =3D j; elt < vec_len; elt++) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rf =3D VEC_index (repeat_factor= , repeat_factor_vec, elt); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 print_generic_expr (dump_file, = rf->factor, 0); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (elt < vec_len - 1) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 fputs (" * ", dump_file); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 fputs ("\n", dump_file); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 else > + =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 iter_result =3D get_reassoc_pow_ssa_name (targe= t, type); > + =A0 =A0 =A0 =A0 =A0 =A0 pow_stmt =3D gimple_build_call (powi_fndecl, 2,= rf1->repr, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 build_int_cst (integer_type_node, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0power)); > + =A0 =A0 =A0 =A0 =A0 =A0 gimple_call_set_lhs (pow_stmt, iter_result); > + =A0 =A0 =A0 =A0 =A0 =A0 gimple_set_location (pow_stmt, gimple_location = (stmt)); > + =A0 =A0 =A0 =A0 =A0 =A0 gsi_insert_before (&gsi, pow_stmt, GSI_SAME_STM= T); > + > + =A0 =A0 =A0 =A0 =A0 =A0 if (dump_file && (dump_flags & TDF_DETAILS)) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 unsigned elt; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 repeat_factor_t rf; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 fputs ("Building __builtin_pow call for= cached product (", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0dump_file); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 for (elt =3D j; elt < vec_len; elt++) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rf =3D VEC_index (repeat_factor= , repeat_factor_vec, elt); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 print_generic_expr (dump_file, = rf->factor, 0); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (elt < vec_len - 1) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 fputs (" * ", dump_file); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 fprintf (dump_file, ")^%ld\n", power); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 } > + =A0 =A0 =A0else > + =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 /* Otherwise, find the first factor in the repeated fac= tor > + =A0 =A0 =A0 =A0 =A0 =A0vector whose occurrence count is at least 2. =A0= If no such > + =A0 =A0 =A0 =A0 =A0 =A0factor exists, there are no builtin_powi opportu= nities > + =A0 =A0 =A0 =A0 =A0 =A0remaining. =A0*/ > + =A0 =A0 =A0 =A0 FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, = rf1) > + =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 if (rf1->count >=3D 2) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; > + =A0 =A0 =A0 =A0 =A0 } > + > + =A0 =A0 =A0 =A0 if (j >=3D vec_len) > + =A0 =A0 =A0 =A0 =A0 break; > + > + =A0 =A0 =A0 =A0 power =3D rf1->count; > + > + =A0 =A0 =A0 =A0 if (dump_file && (dump_flags & TDF_DETAILS)) > + =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 unsigned elt; > + =A0 =A0 =A0 =A0 =A0 =A0 repeat_factor_t rf; > + =A0 =A0 =A0 =A0 =A0 =A0 fputs ("Building __builtin_pow call for (", dum= p_file); > + =A0 =A0 =A0 =A0 =A0 =A0 for (elt =3D j; elt < vec_len; elt++) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rf =3D VEC_index (repeat_factor, repeat= _factor_vec, elt); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 print_generic_expr (dump_file, rf->fact= or, 0); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (elt < vec_len - 1) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 fputs (" * ", dump_file); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 =A0 fprintf (dump_file, ")^%ld\n", power); > + =A0 =A0 =A0 =A0 =A0 } > + > + =A0 =A0 =A0 =A0 reassociate_stats.pows_created++; > + > + =A0 =A0 =A0 =A0 /* Visit each element of the vector in reverse order (s= o that > + =A0 =A0 =A0 =A0 =A0 =A0high-occurrence elements are visited first, and = within the > + =A0 =A0 =A0 =A0 =A0 =A0same occurrence count, lower-ranked elements are= visited > + =A0 =A0 =A0 =A0 =A0 =A0first). =A0Form a linear product of all elements= in this order > + =A0 =A0 =A0 =A0 =A0 =A0whose occurrencce count is at least that of elem= ent J. > + =A0 =A0 =A0 =A0 =A0 =A0Record the SSA name representing the product of = each element > + =A0 =A0 =A0 =A0 =A0 =A0with all subsequent elements in the vector. =A0*/ > + =A0 =A0 =A0 =A0 if (j =3D=3D vec_len - 1) > + =A0 =A0 =A0 =A0 =A0 rf1->repr =3D rf1->factor; > + =A0 =A0 =A0 =A0 else > + =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 for (ii =3D vec_len - 2; ii >=3D (int)j; ii--) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 tree op1, op2; > + > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rf1 =3D VEC_index (repeat_factor, repea= t_factor_vec, ii); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rf2 =3D VEC_index (repeat_factor, repea= t_factor_vec, ii + 1); > + > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* Init the last factor's representativ= e to be itself. =A0*/ > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (!rf2->repr) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rf2->repr =3D rf2->factor; > + > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 op1 =3D rf1->factor; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 op2 =3D rf2->repr; > + > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 target_ssa =3D get_reassoc_pow_ssa_name= (target, type); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 mul_stmt =3D gimple_build_assign_with_o= ps (MULT_EXPR, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0target_ssa, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0op1, op2); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 gimple_set_location (mul_stmt, gimple_l= ocation (stmt)); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 gsi_insert_before (&gsi, mul_stmt, GSI_= SAME_STMT); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rf1->repr =3D target_ssa; > + > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* Don't reprocess the multiply we just= introduced. =A0*/ > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 gimple_set_visited (mul_stmt, true); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 } > + > + =A0 =A0 =A0 =A0 /* Form a call to __builtin_powi for the maximum product > + =A0 =A0 =A0 =A0 =A0 =A0just formed, raised to the power obtained earlie= r. =A0*/ > + =A0 =A0 =A0 =A0 rf1 =3D VEC_index (repeat_factor, repeat_factor_vec, j); > + =A0 =A0 =A0 =A0 iter_result =3D get_reassoc_pow_ssa_name (target, type); > + =A0 =A0 =A0 =A0 pow_stmt =3D gimple_build_call (powi_fndecl, 2, rf1->re= pr, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 build_int_cst (integer_type_node, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0power)); > + =A0 =A0 =A0 =A0 gimple_call_set_lhs (pow_stmt, iter_result); > + =A0 =A0 =A0 =A0 gimple_set_location (pow_stmt, gimple_location (stmt)); > + =A0 =A0 =A0 =A0 gsi_insert_before (&gsi, pow_stmt, GSI_SAME_STMT); > + =A0 =A0 =A0 } > + > + =A0 =A0 =A0/* Append the result of this iteration to the ops vector. = =A0*/ > + =A0 =A0 =A0add_to_ops_vec (ops, iter_result); > + > + =A0 =A0 =A0/* Decrement the occurrence count of each element in the pro= duct > + =A0 =A0 =A0 =A0by the count found above, and remove this many copies of= each > + =A0 =A0 =A0 =A0factor from OPS. =A0*/ > + =A0 =A0 =A0for (i =3D j; i < vec_len; i++) > + =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 unsigned k =3D power; > + =A0 =A0 =A0 =A0 unsigned n; > + > + =A0 =A0 =A0 =A0 rf1 =3D VEC_index (repeat_factor, repeat_factor_vec, i); > + =A0 =A0 =A0 =A0 rf1->count -=3D power; > + > + =A0 =A0 =A0 =A0 FOR_EACH_VEC_ELT_REVERSE (operand_entry_t, *ops, n, oe) > + =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 if (oe->op =3D=3D rf1->factor) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (oe->count <=3D k) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 VEC_ordered_remove (operand_ent= ry_t, *ops, n); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 k -=3D oe->count; > + > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (k =3D=3D 0) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 else > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 oe->count -=3D k; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 =A0 =A0 } > + =A0 =A0 =A0 } > + =A0 =A0} > + > + =A0/* At this point all elements in the repeated factor vector have a > + =A0 =A0 remaining occurrence count of 0 or 1, and those with a count of= 1 > + =A0 =A0 don't have cached representatives. =A0Re-sort the ops vector and > + =A0 =A0 clean up. =A0*/ > + =A0VEC_qsort (operand_entry_t, *ops, sort_by_operand_rank); > + =A0VEC_free (repeat_factor, heap, repeat_factor_vec); > +} > + > =A0/* Reassociate expressions in basic block BB and its post-dominator as > =A0 =A0children. =A0*/ > > @@ -2823,6 +3326,7 @@ reassociate_bb (basic_block bb) > =A0{ > =A0 gimple_stmt_iterator gsi; > =A0 basic_block son; > + =A0tree target =3D NULL_TREE; > > =A0 for (gsi =3D gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi)) > =A0 =A0 { > @@ -2904,6 +3408,11 @@ reassociate_bb (basic_block bb) > =A0 =A0 =A0 =A0 =A0 =A0 =A0if (rhs_code =3D=3D BIT_IOR_EXPR || rhs_code = =3D=3D BIT_AND_EXPR) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0optimize_range_tests (rhs_code, &ops); > > + =A0 =A0 =A0 =A0 =A0 =A0 if (first_pass_instance > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 && rhs_code =3D=3D MULT_EXPR > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 && flag_unsafe_math_optimizations) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 attempt_builtin_powi (stmt, &ops, &target); > + > =A0 =A0 =A0 =A0 =A0 =A0 =A0if (VEC_length (operand_entry_t, ops) =3D=3D 1) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0{ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (dump_file && (dump_flags & TDF_DET= AILS)) > @@ -3054,6 +3563,10 @@ fini_reassoc (void) > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0reassociate_stats.= ops_eliminated); > =A0 statistics_counter_event (cfun, "Statements rewritten", > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0reassociate_stats.= rewritten); > + =A0statistics_counter_event (cfun, "Built-in pow[i] calls encountered", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 reassociate_stats.p= ows_encountered); > + =A0statistics_counter_event (cfun, "Built-in powi calls created", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 reassociate_stats.p= ows_created); > > =A0 pointer_map_destroy (operand_rank); > =A0 free_alloc_pool (operand_entry_pool); > >