From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 104492 invoked by alias); 24 Oct 2016 07:28:19 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 104454 invoked by uid 89); 24 Oct 2016 07:28:17 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-3.3 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 spammy=relationship X-HELO: mx2.suse.de Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 24 Oct 2016 07:28:06 +0000 Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 7C0C0AAC7; Mon, 24 Oct 2016 07:28:03 +0000 (UTC) Date: Mon, 24 Oct 2016 07:28:00 -0000 From: Richard Biener To: Jeff Law cc: Prathamesh Kulkarni , gcc Patches , Kugan , Jim Wilson Subject: Re: RFC [1/3] divmod transform v2 In-Reply-To: <68bac5a0-bd0f-e2fe-5ffd-9a9618659a3b@redhat.com> Message-ID: References: <531bedc4-db23-4a4d-319d-c424ec48c251@redhat.com> <68bac5a0-bd0f-e2fe-5ffd-9a9618659a3b@redhat.com> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-SW-Source: 2016-10/txt/msg01858.txt.bz2 On Fri, 21 Oct 2016, Jeff Law wrote: > On 10/21/2016 04:34 AM, Prathamesh Kulkarni wrote: > > On 20 October 2016 at 15:02, Richard Biener wrote: > > > On Wed, 19 Oct 2016, Jeff Law wrote: > > > > > > > On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote: > > > > > Hi, > > > > > After approval from Bernd Schmidt, I committed the patch to remove > > > > > optab functions for > > > > > sdivmod_optab and udivmod_optab in optabs.def, which removes the block > > > > > for divmod patch. > > > > > > > > > > This patch is mostly the same as previous one, except it drops > > > > > targeting __udivmoddi4() because > > > > > it gave undefined reference link error for calling __udivmoddi4() on > > > > > aarch64-linux-gnu. > > > > > It appears aarch64 has hardware insn for DImode div, so __udivmoddi4() > > > > > isn't needed for the target > > > > > (it was a bug in my patch that called __udivmoddi4() even though > > > > > aarch64 supported hardware div). > > > > > > > > > > However this makes me wonder if it's guaranteed that __udivmoddi4() > > > > > will be available for a target if it doesn't have hardware div and > > > > > divmod insn and doesn't have target-specific libfunc for > > > > > DImode divmod ? To be conservative, the attached patch doesn't > > > > > generate call to __udivmoddi4. > > > > > > > > > > Passes bootstrap+test on x86_64-unknown-linux. > > > > > Cross-tested on arm*-*-*, aarch64*-*-*. > > > > > Verified that there are no regressions with SPEC2006 on > > > > > x86_64-unknown-linux-gnu. > > > > > OK to commit ? > > > > > > > > > > Thanks, > > > > > Prathamesh > > > > > > > > > > > > > > > divmod-v2-3-main.txt > > > > > > > > > > > > > > > 2016-10-15 Prathamesh Kulkarni > > > > > Kugan Vivekanandarajah > > > > > Jim Wilson > > > > > > > > > > * target.def: New hook expand_divmod_libfunc. > > > > > * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC > > > > > * doc/tm.texi: Regenerate. > > > > > * internal-fn.def: Add new entry for DIVMOD ifn. > > > > > * internal-fn.c (expand_DIVMOD): New. > > > > > * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h, > > > > > targhooks.h. > > > > > (widen_mul_stats): Add new field divmod_calls_inserted. > > > > > (target_supports_divmod_p): New. > > > > > (divmod_candidate_p): Likewise. > > > > > (convert_to_divmod): Likewise. > > > > > (pass_optimize_widening_mul::execute): Call > > > > > calculate_dominance_info(), renumber_gimple_stmt_uids() at > > > > > beginning of function. Call convert_to_divmod() > > > > > and record stats for divmod. > > > > Starting with some high level design comments. If these conflict with > > > > comments from others, let me know and we'll work through the issues. > > > > > > > > I don't really like introducing code conditional on the target > > > > capabilities > > > > this early in the gimple optimization pipeline. > > > > > > It's basically done right before RTL expansion > > > (pass_optimize_widening_mul). > > > > > > > Would it be possible to always do the transformation to divmod in the > > > > gimple > > > > optimizers, regardless of the target capabilities. Then in the > > > > gimple->RTL > > > > expanders make a final decision about divmod insn, libcall, or using > > > > div/mod > > > > insns? > > > > > > The issue is that it hoists one or both the division or the modulo and > > > if we don't do the transform we'd want to undo that code motion. > > > > > > > That would move all the target dependencies out of the gimple optimizers > > > > and > > > > into the gimple->rtl expansion phase, which is the preferred place to > > > > start > > > > introducing this kind of target dependency. > > > > > > > > With that background, I'm going to focus more on the identification of > > > > divmod > > > > opportunities than the expansion bits. > > > > > > > > > > > > > > > > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > > > > > index a4a8e49..866c368 100644 > > > > > --- a/gcc/doc/tm.texi > > > > > +++ b/gcc/doc/tm.texi > > > > > @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 > > > > > targets, > > > > > please refer to > > > > > the hook implementation for how different fusion types are supported. > > > > > @end deftypefn > > > > > > > > > > +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx > > > > > @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, > > > > > rtx > > > > > *@var{quot}, rtx *@var{rem}) > > > > > +Define this hook for enabling divmod transform if the port does not > > > > > have > > > > > +hardware divmod insn but defines target-specific divmod libfuncs. > > > > > +@end deftypefn > > > > > + > > > > > @node Sections > > > > > @section Dividing the Output into Sections (Texts, Data, @dots{}) > > > > > @c the above section title is WAY too long. maybe cut the part > > > > > between > > > > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > > > > > index 265f1be..c4c387b 100644 > > > > > --- a/gcc/doc/tm.texi.in > > > > > +++ b/gcc/doc/tm.texi.in > > > > > @@ -4890,6 +4890,8 @@ them: try the first ones in this list first. > > > > > > > > > > @hook TARGET_SCHED_FUSION_PRIORITY > > > > > > > > > > +@hook TARGET_EXPAND_DIVMOD_LIBFUNC > > > > > + > > > > > @node Sections > > > > > @section Dividing the Output into Sections (Texts, Data, @dots{}) > > > > > @c the above section title is WAY too long. maybe cut the part > > > > > between > > > > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c > > > > > index 0b32d5f..42c6973 100644 > > > > > --- a/gcc/internal-fn.c > > > > > +++ b/gcc/internal-fn.c > > > > > @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, > > > > > gcall > > > > > *call) > > > > > expand_ifn_atomic_compare_exchange (call); > > > > > } > > > > > > > > > > +/* Expand DIVMOD() using: > > > > In general, we do not use () when referring to objects in comments. > > > > > > > > > + a) optab handler for udivmod/sdivmod if it is available. > > > > > + b) If optab_handler doesn't exist, generate call to > > > > > + target-specific divmod libfunc. */ > > > > > + > > > > > +static void > > > > > +expand_DIVMOD (internal_fn, gcall *call_stmt) > > > > In general, don't use upper case for function names. Lower case please. > > > > But > > > > I believe we should be pushing all the expansion stuff into the > > > > gimple->rtl > > > > expansion code, so I haven't looked at this closely yet on the > > > > expectation > > > > it'll likely move. > > > > > > > > > > > > > > > > > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > > > > > index 0cea1a8..4f7bdb2 100644 > > > > > --- a/gcc/tree-ssa-math-opts.c > > > > > +++ b/gcc/tree-ssa-math-opts.c > > > > > @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator > > > > > *gsi, > > > > > gimple *stmt, > > > > > return true; > > > > > } > > > > > > > > > > +/* Return true if target has support for divmod. */ > > > > > + > > > > > +static bool > > > > > +target_supports_divmod_p (optab divmod_optab, optab div_optab, > > > > > machine_mode > > > > > mode) > > > > So this is the kind of code I expect to avoid in the gimple optimizers. > > > > Instead the decision about whether to use a divmod insn, divmod libfunc > > > > or div > > > > + synthesized mod, or separate div and mod insns should happen at > > > > gimple->rtl > > > > expansion time. > > > > > > > > > > > > > + > > > > > +/* This function looks for: > > > > > + t1 = a TRUNC_DIV_EXPR b; > > > > > + t2 = a TRUNC_MOD_EXPR b; > > > > > + and transforms it to the following sequence: > > > > > + complex_tmp = DIVMOD (a, b); > > > > > + t1 = REALPART_EXPR(a); > > > > > + t2 = IMAGPART_EXPR(b); > > > > > + For conditions enabling the transform see divmod_candidate_p(). > > > > > + > > > > > + The pass works in two phases: > > > > > + 1) Walk through all immediate uses of stmt's operand and find a > > > > > + TRUNC_DIV_EXPR with matching operands and if such a stmt is > > > > > found add > > > > > + it to stmts vector. > > > > > + 2) Insert DIVMOD call before first div/mod stmt in top_bb (basic > > > > > block > > > > > that > > > > > + dominates other div/mod stmts with same operands) and update > > > > > entries > > > > > in > > > > > + stmts vector to use return value of DIMOVD (REALEXPR_PART for > > > > > div, > > > > > + IMAGPART_EXPR for mod). */ > > > > > + > > > > > +static bool > > > > > +convert_to_divmod (gassign *stmt) > > > > > +{ > > > > > + if (!divmod_candidate_p (stmt)) > > > > > + return false; > > > > > + > > > > > + tree op1 = gimple_assign_rhs1 (stmt); > > > > > + tree op2 = gimple_assign_rhs2 (stmt); > > > > > + > > > > > + imm_use_iterator use_iter; > > > > > + gimple *use_stmt; > > > > > + auto_vec stmts; > > > > > + > > > > > + gimple *top_stmt = stmt; > > > > > + basic_block top_bb = gimple_bb (stmt); > > > > > + > > > > > + /* Try to set top_stmt to "topmost" stmt > > > > > + with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as > > > > > stmt. > > > > > */ > > > > > + > > > > > + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1) > > > > > + { > > > > > + if (is_gimple_assign (use_stmt) > > > > > + && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR > > > > > + || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR) > > > > > + && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0) > > > > > + && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0)) > > > > So why check for TRUNC_MOD_EXPR here? ISTM that stmt is always going to > > > > be > > > > TRUNC_MOD_EXPR and thus you're only interested in looking for a matching > > > > TRUNC_DIV_EXPR statement. > > > > > > > > The only way I could see TRUNC_MOD_EXPR being useful here would be if > > > > there is > > > > a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some > > > > other > > > > optimizer hasn't done its job. Have you seen this to be useful in > > > > practice? > > > > > > Note that I've reviewed this parts already and we restructured things > > > in this way, requiring to look for TRUNC_MOD_EXPR to properly handle > > > finding a dominating trunc _or_ div and interatively doing so > > > correctly if we have more than one pair. > > > > > > > > + { > > > > > + if (stmt_can_throw_internal (use_stmt)) > > > > > + continue; > > > > > + > > > > > + basic_block bb = gimple_bb (use_stmt); > > > > > + > > > > > + if (bb == top_bb) > > > > > + { > > > > > + if (gimple_uid (use_stmt) < gimple_uid (top_stmt)) > > > > > + top_stmt = use_stmt; > > > > > + } > > > > > + else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb)) > > > > > + { > > > > > + top_bb = bb; > > > > > + top_stmt = use_stmt; > > > > > + } > > > > Do you want to break out of the immediate use loop once you've found a > > > > TRUNC_DIV_EXPR statement with matching operands? > > > > > > > > I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the > > > > TRUNC_MOD_EXPR, but that would mean that we've got redundant > > > > TRUNC_DIV_EXPR > > > > statements in the IL, which presumably doesn't happen if the rest of the > > > > optimizers are doing their job. > > > > > > > > > > > > > > > > > + } > > > > > + } > > > > > + > > > > > + if (top_stmt == stmt && stmt_can_throw_internal (top_stmt)) > > > > > + return false; > > > > Don't you just want > > > > if (stmt_can_throw_internal (stmt)) > > > > return false; > > > > > > > > Before the loop over the immediate uses, thus avoiding the loop if we've > > > > got a > > > > TRUNC_MOD_EXPR that we can't optimize? > > > > > > > > > > > > > > > > > + > > > > > + tree top_op1 = gimple_assign_rhs1 (top_stmt); > > > > > + tree top_op2 = gimple_assign_rhs2 (top_stmt); > > > > > + > > > > > + stmts.safe_push (top_stmt); > > > > > + bool div_seen = (gimple_assign_rhs_code (top_stmt) == > > > > > TRUNC_DIV_EXPR); > > > > > + > > > > > + /* Ensure that gimple_bb (use_stmt) is dominated by top_bb. */ > > > > ?!? It's not clear why you'd need to do another immediate use loop > > > > here. > > > > > > > > ISTM at this point you've found two statements, one a TRUNC_DIV the > > > > other a > > > > TRUNC_MOD with the same operands. Given that, ISTM you just need to > > > > check the > > > > dominance relationship of the blocks containing those statements. > > > > > > > > It really feels like you're going out of your way to handle cases where > > > > we > > > > have redundant expressions in the IL. Maybe I'm wrong. BUt if I'm > > > > right, I'd > > > > rather see that time spent optimizing away those redundant expressions > > > > in > > > > FRE/PRE/DOM. > > > > > > > > > + > > > > > + /* Create libcall to internal fn DIVMOD: > > > > > + divmod_tmp = DIVMOD (op1, op2). */ > > > > > + > > > > > + gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, > > > > > op2); > > > > > + tree res = make_temp_ssa_name ( > > > > > + build_complex_type (TREE_TYPE (op1)), > > > > > + call_stmt, "divmod_tmp"); > > > > Please review the the GCC formatting guidelines. This looks wrong. > > > > > > > > tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)), > > > > call_stmt, "divmod_tmp"); > > > > > > > > Seems like the right formatting to me. > > > > > > > > > + > > > > > + /* Update all statements in stmts. > > > > > + if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = > > > > > REALPART_EXPR > > > > > + if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = > > > > > IMAGPART_EXPR. */ > > > > I'd just emit a copy from RES to the appropriate lhs operand just after > > > > the > > > > divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR > > > > statements. > > > > > > That sounds like a good idea. > > Um sorry, not sure if I understood this part. > > For eg: > > t1 = x / y; > > t2 = x % y; > > > > do we want to transform it to: > > complex_tmp = DIVMOD (x, y); > > div_tmp = REALPART_EXPR > > mod_tmp = IMAGPART_EXPR > > complex_tmp = DIVMOD (x, y) > t1 = REALPART_EXPR (complex_tmp) > t2 = IMAGPART_EXPR (compex_tmp) > > Then remove the > t1 = x/y > t2 = x%y > > statements OTOH implementation-wise that's more complicated. Richard.