From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 63442 invoked by alias); 21 May 2019 03:48:52 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 63404 invoked by uid 89); 21 May 2019 03:48:50 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-17.6 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.1 spammy=this! X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.156.1) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 21 May 2019 03:48:44 +0000 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x4L3l9m8110598 for ; Mon, 20 May 2019 23:48:42 -0400 Received: from e06smtp07.uk.ibm.com (e06smtp07.uk.ibm.com [195.75.94.103]) by mx0a-001b2d01.pphosted.com with ESMTP id 2sm87fapbv-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 20 May 2019 23:48:41 -0400 Received: from localhost by e06smtp07.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 21 May 2019 04:48:39 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197) by e06smtp07.uk.ibm.com (192.168.101.137) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 21 May 2019 04:48:36 +0100 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x4L3mZ7k60817568 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 21 May 2019 03:48:36 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D357011C054; Tue, 21 May 2019 03:48:35 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5210C11C050; Tue, 21 May 2019 03:48:33 +0000 (GMT) Received: from kewenlins-mbp.cn.ibm.com (unknown [9.200.147.25]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Tue, 21 May 2019 03:48:33 +0000 (GMT) Subject: Re: [PATCH v3 2/3] Add predict_doloop_p target hook To: Richard Biener Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, wschmidt@linux.ibm.com, bin.cheng@linux.alibaba.com, jakub@redhat.com References: <1558064130-111037-1-git-send-email-linkw@linux.ibm.com> From: "Kewen.Lin" Date: Tue, 21 May 2019 03:48:00 -0000 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit x-cbid: 19052103-0028-0000-0000-0000036FCDC3 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19052103-0029-0000-0000-0000242F75AC Message-Id: X-IsSubscribed: yes X-SW-Source: 2019-05/txt/msg01343.txt.bz2 on 2019/5/20 下午5:28, Richard Biener wrote: > On Thu, 16 May 2019, linkw@linux.ibm.com wrote: > >> From: Kewen Lin >> >> Hi, >> >> Previous version link: >> https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00654.html >> >> Comparing with the previous version, I moved the generic >> parts of rs6000 target hook to IVOPTs. But I still kept >> the target hook as previous which checks some target >> specific criteria like innermost, max iteration counts >> etc, and checks for invalid stmt in loop. The reason >> I decided not to move this part to generic is they are >> not generic enough. >> >> 1) For the part of target specific criteria, if we want >> to put it in generic, we can call the hook >> targetm.can_use_doloop_p, which requires us to >> prepare those four parameters, but each target only needs >> one or two parameters, it means we will evaluate some >> things which aren't required for that target. So I'd like >> to leave this part to target hook. >> 2) For the other part of target invalid stmt check, as the >> hook invalid_within_doloop grep data shows, no all targets >> need to check whether invalid instructions exist in doloop. >> If we scan all stmts as generic, it can waste time for those >> targets which don't need to check. Besides, the scope of >> the current check on SWITCH in rs6000 hook is wide, later >> if we want it more exact, we may need to check more stmts >> instead of single. To let target hook scan the BBs/stmts >> by itself is also more flexible. >> >> Bootstrapped and regression testing ongoing on powerpc64le. >> >> Any more comments? >> >> >> - >> +/* Check whether there are some instructions preventing doloop transformation >> + inside loop body, mainly for instructions which are possible to kill CTR. >> + >> + Return true if some invalid insn exits, otherwise return false. */ >> + >> +static bool >> +invalid_insn_for_doloop_p (struct loop *loop) >> +{ >> + basic_block *body = get_loop_body (loop); > > You leak memory here. I still think all of this function should > be in generic code for now. > Thanks a lot for catching this! I didn't notice I should free it by myself. >> + gimple_stmt_iterator gsi; >> + >> + for (unsigned i = 0; i < loop->num_nodes; i++) >> + for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi)) >> + { >> + gimple *stmt = gsi_stmt (gsi); >> + if (is_gimple_call (stmt) && !gimple_call_internal_p (stmt) >> + && !is_inexpensive_builtin (gimple_call_fndecl (stmt))) >> + { >> + if (dump_file && (dump_flags & TDF_DETAILS)) >> + fprintf (dump_file, >> + "predict doloop failure due to finding call.\n"); >> + return true; >> + } >> + if (computed_goto_p (stmt)) >> + { >> + if (dump_file && (dump_flags & TDF_DETAILS)) >> + fprintf (dump_file, "predict doloop failure due to" >> + "finding computed jump.\n"); >> + return true; >> + } >> + >> + /* TODO: Now this hook is expected to be called in ivopts, which is >> + before switchlower1/switchlower2. Checking for SWITCH at this point >> + will eliminate some good candidates. But since there are only a few >> + cases having _a_ switch statement in the innermost loop, it can be a >> + low priority enhancement. */ >> + if (gimple_code (stmt) == GIMPLE_SWITCH) >> + { >> + if (dump_file && (dump_flags & TDF_DETAILS)) >> + fprintf (dump_file, >> + "predict doloop failure due to finding switch.\n"); >> + return true; >> + } >> + } >> + >> + return false; >> +} >> + >> +/* Predict whether the given loop in gimple will be transformed in the RTL >> + doloop_optimize pass. This is for rs6000 target specific. */ >> + >> +static bool >> +rs6000_predict_doloop_p (struct loop *loop) >> +{ >> + gcc_assert (loop); >> + >> + /* On rs6000, targetm.can_use_doloop_p is actually >> + can_use_doloop_if_innermost. Just ensure it's innermost. */ > > So the point is that ppc only has a single counter register, right? > > So the better way would be to expose that via a target hook somehow. > Or simply restrict IVOPTs processing to innermost loops for now. > It's to answer whether the target supports doloop or not. I guess to check innermost is stricter and simpler. There is another target hook invalid_within_doloop to check counter register clobber. >> + if (loop->inner != NULL) >> + { >> + if (dump_file && (dump_flags & TDF_DETAILS)) >> + fprintf (dump_file, "predict doloop failure due to" >> + "no innermost.\n"); >> + return false; >> + } >> + >> + /* Similar to doloop_optimize targetm.invalid_within_doloop, check for >> + CALL, tablejump, computed_jump. */ >> + if (invalid_insn_for_doloop_p (loop)) >> + return false; >> + >> + return true; >> +} >> + >> struct gcc_target targetm = TARGET_INITIALIZER; >> >> #include "gt-rs6000.h" >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi >> index 8c8978b..e595b8b 100644 >> --- a/gcc/doc/tm.texi >> +++ b/gcc/doc/tm.texi >> @@ -11603,6 +11603,14 @@ function version at run-time for a given set of function versions. >> body must be generated. >> @end deftypefn >> >> +@deftypefn {Target Hook} bool TARGET_PREDICT_DOLOOP_P (struct loop *@var{loop}) >> +Return true if we can predict it is possible to use low-overhead loops >> +for a particular loop. The parameter @var{loop} is a pointer to the loop >> +which is going to be checked. This target hook is required only when the >> +target supports low-overhead loops, and will help some earlier middle-end >> +passes to make some decisions. >> +@end deftypefn >> + >> @deftypefn {Target Hook} bool TARGET_CAN_USE_DOLOOP_P (const widest_int @var{&iterations}, const widest_int @var{&iterations_max}, unsigned int @var{loop_depth}, bool @var{entered_at_top}) >> Return true if it is possible to use low-overhead loops (@code{doloop_end} >> and @code{doloop_begin}) for a particular loop. @var{iterations} gives the >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in >> index fe1194e..e047734 100644 >> --- a/gcc/doc/tm.texi.in >> +++ b/gcc/doc/tm.texi.in >> @@ -7942,6 +7942,8 @@ to by @var{ce_info}. >> >> @hook TARGET_GENERATE_VERSION_DISPATCHER_BODY >> >> +@hook TARGET_PREDICT_DOLOOP_P >> + >> @hook TARGET_CAN_USE_DOLOOP_P >> >> @hook TARGET_INVALID_WITHIN_DOLOOP >> diff --git a/gcc/target.def b/gcc/target.def >> index 66cee07..0a80de3 100644 >> --- a/gcc/target.def >> +++ b/gcc/target.def >> @@ -4227,6 +4227,15 @@ DEFHOOK >> rtx, (machine_mode mode, rtx result, rtx val, rtx failval), >> default_speculation_safe_value) >> >> +DEFHOOK >> +(predict_doloop_p, >> + "Return true if we can predict it is possible to use low-overhead loops\n\ >> +for a particular loop. The parameter @var{loop} is a pointer to the loop\n\ >> +which is going to be checked. This target hook is required only when the\n\ >> +target supports low-overhead loops, and will help some earlier middle-end\n\ >> +passes to make some decisions.", >> + bool, (struct loop *loop), >> + default_predict_doloop_p) >> >> DEFHOOK >> (can_use_doloop_p, >> diff --git a/gcc/targhooks.c b/gcc/targhooks.c >> index 318f7e9..56ed4cc 100644 >> --- a/gcc/targhooks.c >> +++ b/gcc/targhooks.c >> @@ -643,6 +643,19 @@ default_has_ifunc_p (void) >> return HAVE_GNU_INDIRECT_FUNCTION; >> } >> >> +/* True if we can predict this loop is possible to be transformed to a >> + low-overhead loop, otherwise returns false. >> + >> + By default, false is returned, as this hook's applicability should be >> + verified for each target. Target maintainers should re-define the hook >> + if the target can take advantage of it. */ >> + >> +bool >> +default_predict_doloop_p (struct loop *loop ATTRIBUTE_UNUSED) >> +{ >> + return false; >> +} >> + >> /* NULL if INSN insn is valid within a low-overhead loop, otherwise returns >> an error message. >> >> diff --git a/gcc/targhooks.h b/gcc/targhooks.h >> index 5943627..70bfb17 100644 >> --- a/gcc/targhooks.h >> +++ b/gcc/targhooks.h >> @@ -85,6 +85,7 @@ extern bool default_fixed_point_supported_p (void); >> >> extern bool default_has_ifunc_p (void); >> >> +extern bool default_predict_doloop_p (struct loop *); >> extern const char * default_invalid_within_doloop (const rtx_insn *); >> >> extern tree default_builtin_vectorized_function (unsigned int, tree, tree); >> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c >> index a44b4cb..ed7f2a5 100644 >> --- a/gcc/tree-ssa-loop-ivopts.c >> +++ b/gcc/tree-ssa-loop-ivopts.c >> @@ -3776,6 +3776,111 @@ prepare_decl_rtl (tree *expr_p, int *ws, void *data) >> return NULL_TREE; >> } >> >> +/* Check whether number of iteration computation is too costly for doloop >> + transformation. It expands the gimple sequence to equivalent RTL insn >> + sequence, then evaluate the cost. >> + >> + Return true if it's costly, otherwise return false. */ >> + >> +static bool >> +costly_iter_for_doloop_p (struct loop *loop, tree niters) >> +{ > > I already said this but I expect IVOPTs to already cost initial > value computation of new IVs. I also doubt it's worth > going into RTL details here, estimate_num_insns should be > enough, like via using expression_expensive_p from SCEV analysis > which is already used by IVOPTs. > > You try to be perfect here but end up adding more > magic numbers (PARAM_MAX_ITERATIONS_COMPUTATION_COST). > Yes, I noticed there is something used for cost computation. But it can not be used for this niter cost directly. I tried to make similar thing like what we have in doloop_optimize. The magic number is also from doloop_optimize. I'm fine to use those existing ways, I don't have data for whether it's not important to be exact. Thanks Kewen >> + tree type = TREE_TYPE (niters); >> + unsigned cost = 0; >> + bool speed = optimize_loop_for_speed_p (loop); >> + int regno = LAST_VIRTUAL_REGISTER + 1; >> + walk_tree (&niters, prepare_decl_rtl, ®no, NULL); >> + start_sequence (); >> + expand_expr (niters, NULL_RTX, TYPE_MODE (type), EXPAND_NORMAL); >> + rtx_insn *seq = get_insns (); >> + end_sequence (); >> + >> + for (; seq; seq = NEXT_INSN (seq)) >> + { >> + if (!INSN_P (seq)) >> + continue; >> + rtx body = PATTERN (seq); >> + if (GET_CODE (body) == SET) >> + { >> + rtx set_val = XEXP (body, 1); >> + enum rtx_code code = GET_CODE (set_val); >> + enum rtx_class cls = GET_RTX_CLASS (code); >> + /* For now, we only consider these two RTX classes, to match what we >> + get in doloop_optimize, excluding operations like zero/sign extend. */ >> + if (cls == RTX_BIN_ARITH || cls == RTX_COMM_ARITH) >> + cost += set_src_cost (set_val, GET_MODE (set_val), speed); >> + } >> + } >> + unsigned max_cost >> + = COSTS_N_INSNS (PARAM_VALUE (PARAM_MAX_ITERATIONS_COMPUTATION_COST)); >> + if (cost > max_cost) >> + return true; >> + >> + return false; >> +} >> + >> +/* Predict whether the given loop will be transformed in the RTL >> + doloop_optimize pass. Attempt to duplicate as many doloop_optimize checks >> + as possible. This is only for target independent checks, see >> + targetm.predict_doloop_p for the target dependent ones. >> + >> + Some RTL specific checks seems unable to be checked in gimple, if any new >> + checks or easy checks _are_ missing here, please add them. */ >> + >> +static bool >> +generic_predict_doloop_p (struct ivopts_data *data) >> +{ >> + struct loop *loop = data->current_loop; >> + >> + /* Call target hook for target dependent checks. */ >> + if (!targetm.predict_doloop_p (loop)) >> + { >> + if (dump_file && (dump_flags & TDF_DETAILS)) >> + fprintf (dump_file, "predict doloop failure due to" >> + "target specific checks.\n"); >> + return false; >> + } >> + >> + /* Similar to doloop_optimize, check iteration description to know it's >> + suitable or not. */ >> + edge exit = loop_latch_edge (loop); >> + struct tree_niter_desc *niter_desc = niter_for_exit (data, exit); >> + if (niter_desc == NULL) >> + { >> + if (dump_file && (dump_flags & TDF_DETAILS)) >> + fprintf (dump_file, "predict doloop failure due to" >> + "unexpected niters.\n"); >> + return false; >> + } >> + >> + /* Similar to doloop_optimize, check whether iteration count too small >> + and not profitable. */ >> + HOST_WIDE_INT est_niter = get_estimated_loop_iterations_int (loop); >> + if (est_niter == -1) >> + est_niter = get_likely_max_loop_iterations_int (loop); >> + if (est_niter >= 0 && est_niter < 3) >> + { >> + if (dump_file && (dump_flags & TDF_DETAILS)) >> + fprintf (dump_file, >> + "predict doloop failure due to" >> + "too few iterations (%u).\n", >> + (unsigned int) est_niter); >> + return false; >> + } >> + >> + /* Similar to doloop_optimize, check whether number of iterations too costly >> + to compute. */ >> + if (costly_iter_for_doloop_p (loop, niter_desc->niter)) >> + { >> + if (dump_file && (dump_flags & TDF_DETAILS)) >> + fprintf (dump_file, "predict doloop failure due to" >> + "costly niter computation.\n"); >> + return false; >> + } >> + >> + return true; >> +} >> + >> /* Determines cost of the computation of EXPR. */ >> >> static unsigned >> >