From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 490 invoked by alias); 27 May 2011 10:24:22 -0000 Received: (qmail 470 invoked by uid 22791); 27 May 2011 10:24:20 -0000 X-SWARE-Spam-Status: No, hits=-2.2 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RFC_ABUSE_POST,TW_FN,TW_HF,TW_TM X-Spam-Check-By: sourceware.org Received: from mail-wy0-f175.google.com (HELO mail-wy0-f175.google.com) (74.125.82.175) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 27 May 2011 10:24:05 +0000 Received: by wye20 with SMTP id 20so1327764wye.20 for ; Fri, 27 May 2011 03:24:03 -0700 (PDT) MIME-Version: 1.0 Received: by 10.227.180.130 with SMTP id bu2mr1933369wbb.71.1306491843632; Fri, 27 May 2011 03:24:03 -0700 (PDT) Received: by 10.227.37.152 with HTTP; Fri, 27 May 2011 03:24:03 -0700 (PDT) In-Reply-To: <1306432259.4821.73.camel@L3G5336.ibm.com> References: <1306345412.4821.56.camel@L3G5336.ibm.com> <1306432259.4821.73.camel@L3G5336.ibm.com> Date: Fri, 27 May 2011 13:10:00 -0000 Message-ID: Subject: Re: [PATCH] More pow(x,c) expansions in cse_sincos pass (PR46728, patch 3) From: Richard Guenther To: "William J. Schmidt" Cc: gcc-patches@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-05/txt/msg02148.txt.bz2 On Thu, May 26, 2011 at 7:50 PM, William J. Schmidt wrote: > > On Thu, 2011-05-26 at 11:22 +0200, Richard Guenther wrote: > > > > >> There are several extra pre-conditions in the original code in >> builtins.c as well as commens for reasonings (yes, there seem >> to be duplicates of several of the transforms there ...). =A0Please >> try to preserve them. =A0I noticed especially >> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* Optimize pow (x, 0.25) into sqrt (sqrt (x= )). =A0Assume on most >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0machines that a builtin sqrt instruct= ion is smaller than a >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0call to pow with 0.25, so do this opt= imization even if >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0-Os. =A0*/ >> >> and >> >> =A0 =A0 =A0 /* Check whether we can do cbrt insstead of pow (x, 1./3.) a= nd >> =A0 =A0 =A0 =A0 =A0cbrt/sqrts instead of pow (x, 1./6.). =A0*/ >> =A0 =A0 =A0 if (cbrtfn && ! op >> =A0 =A0 =A0 =A0 =A0 && (tree_expr_nonnegative_p (arg0) || !HONOR_NANS (m= ode))) >> >> where you omit the non-negative check and the check for NANs. >> Note that -funsafe-math-optimizations does not mean you can >> ignore nans or signed zero issues. =A0pow(x,1/3) -> cbrt(x) does not >> need a -funsafe-math-optimization check either. > > OK. =A0Good catch on the non-negative/NaN check -- that just slipped > through the cracks. =A0I've fixed this and beefed up the comments as you > suggest. > >> >> It would be nice if you could re-arrange this function so that before >> each individual replacement all preconditions are checked (even if >> that means duplicating some checks and code) - that way >> reviewing and later maintainance should be much easier. >> > > Done -- you're right, the result is much cleaner. =A0Hopefully this > version is easier to follow. > > Thanks, > Bill > > > 2011-05-25 =A0Bill Schmidt =A0 > > =A0 =A0 =A0 =A0PR tree-optimization/46728 > =A0 =A0 =A0 =A0* tree-ssa-math-opts.c (powi_as_mults_1): Add gimple_set_l= ocation. > =A0 =A0 =A0 =A0(powi_as_mults): Add gimple_set_location. > =A0 =A0 =A0 =A0(build_and_insert_call): New. > =A0 =A0 =A0 =A0(gimple_expand_builtin_pow): Add handling for pow(x,y) whe= n y is > =A0 =A0 =A0 =A00.5, 0.25, 0.75, 1./3., or 1./6. > > > Index: gcc/tree-ssa-math-opts.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- gcc/tree-ssa-math-opts.c =A0 =A0(revision 174199) > +++ gcc/tree-ssa-math-opts.c =A0 =A0(working copy) > @@ -965,6 +965,7 @@ powi_as_mults_1 (gimple_stmt_iterator *gsi, locati > =A0 =A0 } > > =A0 mult_stmt =3D gimple_build_assign_with_ops (MULT_EXPR, ssa_target, op= 0, op1); > + =A0gimple_set_location (mult_stmt, loc); > =A0 gsi_insert_before (gsi, mult_stmt, GSI_SAME_STMT); > > =A0 return ssa_target; > @@ -999,6 +1000,7 @@ powi_as_mults (gimple_stmt_iterator *gsi, location > =A0 div_stmt =3D gimple_build_assign_with_ops (RDIV_EXPR, target, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 build_real (type, dconst1), > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 result); > + =A0gimple_set_location (div_stmt, loc); > =A0 gsi_insert_before (gsi, div_stmt, GSI_SAME_STMT); > > =A0 return target; > @@ -1024,6 +1026,34 @@ gimple_expand_builtin_powi (gimple_stmt_iterator * > =A0 return NULL_TREE; > =A0} > > +/* Build a gimple call statement that calls FN with argument ARG. > + =A0 Set the lhs of the call statement to a fresh SSA name for > + =A0 variable VAR. =A0If VAR is NULL, first allocate it. =A0Insert the > + =A0 statement prior to GSI's current position, and return the fresh > + =A0 SSA name. =A0*/ > + > +static tree > +build_and_insert_call (gimple_stmt_iterator *gsi, tree fn, tree arg, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0tree *var, location_t loc) > +{ > + =A0gimple call_stmt; > + =A0tree ssa_target; > + > + =A0if (!*var) > + =A0 =A0{ > + =A0 =A0 =A0*var =3D create_tmp_var (TREE_TYPE (arg), "powroot"); > + =A0 =A0 =A0add_referenced_var (*var); > + =A0 =A0} > + > + =A0call_stmt =3D gimple_build_call (fn, 1, arg); > + =A0ssa_target =3D make_ssa_name (*var, NULL); > + =A0gimple_set_lhs (call_stmt, ssa_target); > + =A0gimple_set_location (call_stmt, loc); > + =A0gsi_insert_before (gsi, call_stmt, GSI_SAME_STMT); > + > + =A0return ssa_target; > +} > + > =A0/* ARG0 and ARG1 are the two arguments to a pow builtin call in GSI > =A0 =A0with location info LOC. =A0If possible, create an equivalent and > =A0 =A0less expensive sequence of statements prior to GSI, and return an > @@ -1033,16 +1063,21 @@ static tree > =A0gimple_expand_builtin_pow (gimple_stmt_iterator *gsi, location_t loc, > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 tree arg0, tree arg1) > =A0{ > - =A0REAL_VALUE_TYPE c, cint; > + =A0REAL_VALUE_TYPE c, cint, dconst1_4, dconst3_4, dconst1_3, dconst1_6; > =A0 HOST_WIDE_INT n; > + =A0tree type, sqrtfn, cbrtfn, sqrt_arg0, sqrt_sqrt, ssa_target; > + =A0tree target =3D NULL_TREE; > + =A0enum machine_mode mode; > + =A0bool hw_sqrt_exists; > + =A0gimple mult_stmt; > > =A0 /* If the exponent isn't a constant, there's nothing of interest > =A0 =A0 =A0to be done. =A0*/ > =A0 if (TREE_CODE (arg1) !=3D REAL_CST) > =A0 =A0 return NULL_TREE; > > - =A0/* If the exponent is equivalent to an integer, expand it into > - =A0 =A0 multiplies when profitable. =A0*/ > + =A0/* If the exponent is equivalent to an integer, expand to an optimal > + =A0 =A0 multiplication sequence when profitable. =A0*/ > =A0 c =3D TREE_REAL_CST (arg1); > =A0 n =3D real_to_integer (&c); > =A0 real_from_integer (&cint, VOIDmode, n, n < 0 ? -1 : 0, 0); > @@ -1054,6 +1089,97 @@ gimple_expand_builtin_pow (gimple_stmt_iterator *g > =A0 =A0 =A0 =A0 =A0 =A0 =A0&& powi_cost (n) <=3D POWI_MAX_MULTS))) > =A0 =A0 return gimple_expand_builtin_powi (gsi, loc, arg0, n); > > + =A0/* Attempt various optimizations using sqrt and cbrt. =A0*/ > + =A0type =3D TREE_TYPE (arg0); > + =A0mode =3D TYPE_MODE (type); > + =A0sqrtfn =3D mathfn_built_in (type, BUILT_IN_SQRT); > + > + =A0/* Optimize pow(x,0.5) =3D sqrt(x). =A0This replacement is always sa= fe > + =A0 =A0 unless signed zeros must be maintained. =A0pow(-0,0.5) =3D +0, = while > + =A0 =A0 sqrt(-0) =3D -0. =A0*/ > + =A0if (sqrtfn > + =A0 =A0 =A0&& REAL_VALUES_EQUAL (c, dconsthalf) > + =A0 =A0 =A0&& (flag_unsafe_math_optimizations > + =A0 =A0 =A0 =A0 || !HONOR_SIGNED_ZEROS (mode))) Drop flag_unsafe_math_optimizations - the replacement isn't safe for -funsafe-math-optimizations -fsigned-zeros. > + =A0 =A0return build_and_insert_call (gsi, sqrtfn, arg0, &target, loc); > + > + =A0/* Optimize pow(x,0.25) =3D sqrt(sqrt(x)). =A0Assume on most machine= s that > + =A0 =A0 a builtin sqrt instruction is smaller than a call to pow with 0= .25, > + =A0 =A0 so do this optimization even if -Os. =A0Don't do this optimizat= ion > + =A0 =A0 if we don't have a hardware sqrt insn. =A0*/ > + =A0dconst1_4 =3D dconst1; > + =A0SET_REAL_EXP (&dconst1_4, REAL_EXP (&dconst1_4) - 2); > + =A0hw_sqrt_exists =3D optab_handler(sqrt_optab, mode) !=3D CODE_FOR_not= hing; > + > + =A0if (flag_unsafe_math_optimizations > + =A0 =A0 =A0&& sqrtfn > + =A0 =A0 =A0&& REAL_VALUES_EQUAL (c, dconst1_4) > + =A0 =A0 =A0&& hw_sqrt_exists) > + =A0 =A0{ > + =A0 =A0 =A0/* sqrt(x) =A0*/ > + =A0 =A0 =A0sqrt_arg0 =3D build_and_insert_call (gsi, sqrtfn, arg0, &tar= get, loc); > + > + =A0 =A0 =A0/* sqrt(sqrt(x)) =A0*/ > + =A0 =A0 =A0return build_and_insert_call (gsi, sqrtfn, sqrt_arg0, &targe= t, loc); > + =A0 =A0} > + > + =A0/* Optimize pow(x,0.75) =3D sqrt(x) * sqrt(sqrt(x)) unless we are > + =A0 =A0 optimizing for space. =A0Don't do this optimization if we don't= have > + =A0 =A0 a hardware sqrt insn. =A0*/ > + =A0real_from_integer (&dconst3_4, VOIDmode, 3, 0, 0); > + =A0SET_REAL_EXP (&dconst3_4, REAL_EXP (&dconst3_4) - 2); > + > + =A0if (flag_unsafe_math_optimizations > + =A0 =A0 =A0&& sqrtfn > + =A0 =A0 =A0&& optimize_function_for_speed_p (cfun) > + =A0 =A0 =A0&& REAL_VALUES_EQUAL (c, dconst3_4) > + =A0 =A0 =A0&& hw_sqrt_exists) > + =A0 =A0{ > + =A0 =A0 =A0/* sqrt(x) =A0*/ > + =A0 =A0 =A0sqrt_arg0 =3D build_and_insert_call (gsi, sqrtfn, arg0, &tar= get, loc); > + > + =A0 =A0 =A0/* sqrt(sqrt(x)) =A0*/ > + =A0 =A0 =A0sqrt_sqrt =3D build_and_insert_call (gsi, sqrtfn, sqrt_arg0,= &target, loc); > + > + =A0 =A0 =A0/* sqrt(x) * sqrt(sqrt(x)) =A0*/ > + =A0 =A0 =A0ssa_target =3D make_ssa_name (target, NULL); > + =A0 =A0 =A0mult_stmt =3D gimple_build_assign_with_ops (MULT_EXPR, ssa_t= arget, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 sqrt_arg0, sqrt_sqrt); > + =A0 =A0 =A0gimple_set_location (mult_stmt, loc); > + =A0 =A0 =A0gsi_insert_before (gsi, mult_stmt, GSI_SAME_STMT); > + > + =A0 =A0 =A0return ssa_target; > + =A0 =A0} > + > + =A0/* Optimize pow(x,1./3.) =3D cbrt(x). =A0This is always safe. =A0*/ > + =A0cbrtfn =3D mathfn_built_in (type, BUILT_IN_CBRT); > + =A0dconst1_3 =3D real_value_truncate (mode, dconst_third ()); > + > + =A0if (cbrtfn > + =A0 =A0 =A0&& (tree_expr_nonnegative_p (arg0) || !HONOR_NANS (mode)) > + =A0 =A0 =A0&& REAL_VALUES_EQUAL (c, dconst1_3)) > + =A0 =A0return build_and_insert_call (gsi, cbrtfn, arg0, &target, loc); As Joseph said this needs to be conditional on flag_unsafe_math_optimization because of the inexactness of 1./3.. Please add rationale as explained by Joseph for the HONOR_NANS check. Now, the tree_expr_nonnegative_p check will always return false because arg0 is a gimple value. We would need to implement either some propagation engine to derive properties for floating point entities or mimic what tree_expr_nonnegative_p does by looking at defining statements. A very simple variant could be like bool gimple_val_nonnegative_p (tree val) { if (tree_expr_nonnegative_p (val)) return true; if (TREE_CODE (val) =3D=3D SSA_NAME) { gimple def_stmt =3D SSA_NAME_DEF_STMT (val); if (is_gimple_assign (def_stmt)) return gimple_assign_rhs_code (def_stmt) =3D=3D ABS_EXPR; else if (is_gimple_call (def_stmt)) { tree fndecl =3D gimple_call_fndecl (def_stmt); if (fndecl && DECL_BUILT_IN_CLASS (fndecl) =3D=3D BUILT_IN_NORMAL) { switch (DECL_FUNCTION_CODE (fndecl)) { CASE_FLT_FN (BUILT_IN_FABS): return true; default:; } } } } return false; } suitable for gimple-fold.c. Alternatively you can drop the call and add a FIXME comment. > + > + =A0/* Optimize pow(x,1./6.) =3D cbrt(sqrt(x)). =A0Don't do this optimiz= ation > + =A0 =A0 if we don't have a hardware sqrt insn. =A0*/ > + =A0dconst1_6 =3D dconst1_3; > + =A0SET_REAL_EXP (&dconst1_6, REAL_EXP (&dconst1_6) - 1); > + > + =A0if (flag_unsafe_math_optimizations > + =A0 =A0 =A0&& sqrtfn > + =A0 =A0 =A0&& cbrtfn > + =A0 =A0 =A0&& (tree_expr_nonnegative_p (arg0) || !HONOR_NANS (mode)) Likewise. Ok with these changes. Thanks, Richard. > + =A0 =A0 =A0&& optimize_function_for_speed_p (cfun) > + =A0 =A0 =A0&& hw_sqrt_exists > + =A0 =A0 =A0&& REAL_VALUES_EQUAL (c, dconst1_6)) > + =A0 =A0{ > + =A0 =A0 =A0/* sqrt(x) =A0*/ > + =A0 =A0 =A0sqrt_arg0 =3D build_and_insert_call (gsi, sqrtfn, arg0, &tar= get, loc); > + > + =A0 =A0 =A0/* cbrt(sqrt(x)) =A0*/ > + =A0 =A0 =A0return build_and_insert_call (gsi, cbrtfn, sqrt_arg0, &targe= t, loc); > + =A0 =A0} > + > =A0 return NULL_TREE; > =A0}