From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 39753 invoked by alias); 20 Jun 2019 12:17:03 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 39735 invoked by uid 89); 20 Jun 2019 12:17:02 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-19.2 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,MIME_CHARSET_FARAWAY,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.1 spammy=niters X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 20 Jun 2019 12:17:00 +0000 Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x5KCEZ9s060664 for ; Thu, 20 Jun 2019 08:16:57 -0400 Received: from e06smtp03.uk.ibm.com (e06smtp03.uk.ibm.com [195.75.94.99]) by mx0a-001b2d01.pphosted.com with ESMTP id 2t89pn8x7n-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 20 Jun 2019 08:16:57 -0400 Received: from localhost by e06smtp03.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 20 Jun 2019 13:16:55 +0100 Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195) by e06smtp03.uk.ibm.com (192.168.101.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 20 Jun 2019 13:16:53 +0100 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x5KCGqqr59637810 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 20 Jun 2019 12:16:52 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6418B11C054; Thu, 20 Jun 2019 12:16:52 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A5ACE11C05E; Thu, 20 Jun 2019 12:16:46 +0000 (GMT) Received: from 192.168.10.101 (unknown [9.197.246.56]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 20 Jun 2019 12:16:46 +0000 (GMT) Subject: Re: [PATCH v3 3/3] PR80791 Consider doloop cmp use in ivopts From: "Kewen.Lin" To: Segher Boessenkool Cc: gcc-patches@gcc.gnu.org, wschmidt@linux.ibm.com, bin.cheng@linux.alibaba.com, rguenther@suse.de, jakub@redhat.com, Jeff Law , Kugan Vivekanandarajah References: <1557803406-123657-1-git-send-email-linkw@linux.ibm.com> <2d897dc2-a01c-5005-6973-aad0c5930aa8@linux.ibm.com> <20190620090859.GU7313@gate.crashing.org> <32eb9bfd-c996-821d-730c-7c83002cf345@linux.ibm.com> Date: Thu, 20 Jun 2019 12:17:00 -0000 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.7.1 MIME-Version: 1.0 In-Reply-To: <32eb9bfd-c996-821d-730c-7c83002cf345@linux.ibm.com> Content-Type: multipart/mixed; boundary="------------5C77551EC20836528E9612B7" x-cbid: 19062012-0012-0000-0000-0000032ADDBC X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19062012-0013-0000-0000-000021640391 Message-Id: <0e839d7c-7848-a7fa-2a4d-12d8616e031c@linux.ibm.com> X-IsSubscribed: yes X-SW-Source: 2019-06/txt/msg01225.txt.bz2 This is a multi-part message in MIME format. --------------5C77551EC20836528E9612B7 Content-Type: text/plain; charset=gbk Content-Transfer-Encoding: 8bit Content-length: 1044 Hi, Sorry, the previous patch is incomplete. New one attached. Sorry for inconvenience. on 2019/6/20 ÏÂÎç8:08, Kewen.Lin wrote: > Hi Segher, > >> On Wed, Jun 19, 2019 at 07:47:34PM +0800, Kewen.Lin wrote: >>> +/* Return true if count register for branch is supported. */ >>> + >>> +static bool >>> +rs6000_have_count_reg_decr_p () >>> +{ >>> + return flag_branch_on_count_reg; >>> +} >> >> rs6000 unconditionally supports these instructions, not just when that >> flag is set. If you need to look at the flag, the *caller* of this new >> hook should, not every implementation of the hook. So just "return true" >> here? > > Good point! Updated it as hookpod. > >>> +/* For doloop use, if the algothrim selects some candidate which invalid for >> >> "algorithm", "which is invalid". > >>> + some cost like zero rather than original inifite cost. The point is to >> >> "infinite" >> > > Thanks for catching! I should run spelling check next time. :) > > New version attached with comments addressed. > > > Thanks, > Kewen > --------------5C77551EC20836528E9612B7 Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0"; name="ivopts.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="ivopts.diff" Content-length: 10673 diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 6667cd0..e98aba9 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1912,6 +1912,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_PREDICT_DOLOOP_P #define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p +#undef TARGET_HAVE_COUNT_REG_DECR_P +#define TARGET_HAVE_COUNT_REG_DECR_P true + #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index c2aa4d0..5477294 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11618,6 +11618,14 @@ loops, and will help ivopts to make some decisions. The default version of this hook returns false. @end deftypefn +@deftypevr {Target Hook} bool TARGET_HAVE_COUNT_REG_DECR_P +Return true if the target supports hardware count register for decrement +and branch. This count register can't be used as general register since +moving to/from a general register from/to it is very expensive. +For the targets with this support, ivopts can take doloop use as zero cost. +The default value is false. +@end deftypevr + @deftypefn {Target Hook} bool TARGET_CAN_USE_DOLOOP_P (const widest_int @var{&iterations}, const widest_int @var{&iterations_max}, unsigned int @var{loop_depth}, bool @var{entered_at_top}) Return true if it is possible to use low-overhead loops (@code{doloop_end} and @code{doloop_begin}) for a particular loop. @var{iterations} gives the diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index b4d57b8..5f43b27 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -7946,6 +7946,8 @@ to by @var{ce_info}. @hook TARGET_PREDICT_DOLOOP_P +@hook TARGET_HAVE_COUNT_REG_DECR_P + @hook TARGET_CAN_USE_DOLOOP_P @hook TARGET_INVALID_WITHIN_DOLOOP diff --git a/gcc/target.def b/gcc/target.def index 71b6972..8a64e5b 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -4246,6 +4246,16 @@ The default version of this hook returns false.", bool, (struct loop *loop), default_predict_doloop_p) +DEFHOOKPOD +(have_count_reg_decr_p, + "Return true if the target supports hardware count register for decrement\n\ +and branch. This count register can't be used as general register since\n\ +moving to/from a general register from/to it is very expensive.\n\ +For the targets with this support, ivopts can take doloop use as zero cost.\n\ +The default value is false.", + bool, false) + + DEFHOOK (can_use_doloop_p, "Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c index 7d5859b..71d7f67 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c @@ -17,6 +17,7 @@ f1 (char *p, uintptr_t i, uintptr_t n) while (i < n); } -/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" } } */ -/* { dg-final { scan-tree-dump-times "PHI vuses; }; @@ -612,6 +614,9 @@ struct ivopts_data /* Whether the loop body can only be exited via single exit. */ bool loop_single_exit_p; + + /* Whether the loop has doloop comparison use. */ + bool doloop_use_p; }; /* An assignment of iv candidates to uses. */ @@ -1528,6 +1533,7 @@ record_group (struct ivopts_data *data, enum use_type type) group->type = type; group->related_cands = BITMAP_ALLOC (NULL); group->vuses.create (1); + group->bind_cand = NULL; data->vgroups.safe_push (group); return group; @@ -3724,7 +3730,7 @@ prepare_decl_rtl (tree *expr_p, int *ws, void *data) Some RTL specific checks seems unable to be checked in gimple, if any new checks or easy checks _are_ missing here, please add them. */ -static bool ATTRIBUTE_UNUSED +static bool generic_predict_doloop_p (struct ivopts_data *data) { struct loop *loop = data->current_loop; @@ -5291,6 +5297,21 @@ determine_group_iv_cost_cond (struct ivopts_data *data, return !cost.infinite_cost_p (); } +/* Set no cost for pair between doloop iv use GROUP and iv cand CAND. */ + +static bool +adjust_group_iv_cost_for_doloop (struct ivopts_data *data, + struct iv_group *group, struct iv_cand *cand) +{ + struct cost_pair *cp = get_group_iv_cost (data, group, cand); + if (!cp) + set_group_iv_cost (data, group, cand, no_cost, NULL, NULL_TREE, ERROR_MARK, + NULL); + else + cp->cost = no_cost; + return true; +} + /* Determines cost of computing uses in GROUP with CAND. Returns false if USE cannot be represented with CAND. */ @@ -5308,7 +5329,12 @@ determine_group_iv_cost (struct ivopts_data *data, return determine_group_iv_cost_address (data, group, cand); case USE_COMPARE: - return determine_group_iv_cost_cond (data, group, cand); + { + bool finite_cost_p = determine_group_iv_cost_cond (data, group, cand); + if (data->doloop_use_p && group->bind_cand) + finite_cost_p = adjust_group_iv_cost_for_doloop (data, group, cand); + return finite_cost_p; + } default: gcc_unreachable (); @@ -6723,6 +6749,29 @@ find_optimal_iv_set_1 (struct ivopts_data *data, bool originalp) return set; } +/* For doloop use, if the algorithm selects some candidate which is invalid for + later rewrite, fix it up with bind_cand. */ + +static void +fixup_doloop_groups (struct ivopts_data *data, struct iv_ca *set) +{ + for (unsigned i = 0; i < data->vgroups.length (); i++) + { + struct iv_group *group = data->vgroups[i]; + if (group->bind_cand) + { + struct cost_pair *cp = iv_ca_cand_for_group (set, group); + gcc_assert (cp); + if (cp->cand != group->bind_cand && cp->value == NULL_TREE) + { + struct cost_pair *bind_cp + = get_group_iv_cost (data, group, group->bind_cand); + iv_ca_set_cp (data, set, group, bind_cp); + } + } + } +} + static struct iv_ca * find_optimal_iv_set (struct ivopts_data *data) { @@ -6760,6 +6809,9 @@ find_optimal_iv_set (struct ivopts_data *data) else if (origset) iv_ca_free (&origset); + if (data->doloop_use_p) + fixup_doloop_groups (data, set); + for (i = 0; i < data->vgroups.length (); i++) { struct iv_group *group = data->vgroups[i]; @@ -7568,6 +7620,72 @@ determine_scaling_factor (struct ivopts_data *data, basic_block *body) } } +/* Find doloop comparison use and set its related bind_cand. We adjust the + doloop use group cost against various IV cands, it's possible to assign + some cost like zero rather than original infinite cost. The point is to + give more chances to consider other IV cands instead of BIV. The cost + originally given on doloop use can affect optimal decision because it can + become dead and get eliminated but considered too much here. + + So it's possible that doloop use is assigned one invalid IV cand to rewrite. + In this case, we need bind_cand to fix up. Even if the bind_cand doesn't + exist in final iv_ca set, it won't affect optimal decision since it gets + eliminated along with doloop use. */ + +static bool +find_doloop_use_and_its_bind (struct ivopts_data *data) +{ + struct loop *loop = data->current_loop; + + for (unsigned i = 0; i < data->vgroups.length (); i++) + { + struct iv_group *group = data->vgroups[i]; + if (group->type == USE_COMPARE) + { + gcc_assert (group->vuses.length () == 1); + struct iv_use *use = group->vuses[0]; + gimple *stmt = use->stmt; + if (gimple_code (stmt) == GIMPLE_COND) + { + basic_block bb = gimple_bb (stmt); + edge true_edge, false_edge; + extract_true_false_edges_from_block (bb, &true_edge, &false_edge); + /* This comparison is used for loop latch. Require latch is empty + for now. */ + if ((loop->latch == true_edge->dest + || loop->latch == false_edge->dest) + && empty_block_p (loop->latch)) + { + for (unsigned j = 0; j < data->vcands.length (); j++) + { + if (bitmap_bit_p (group->related_cands, j)) + { + struct iv_cand *cand = data->vcands[j]; + tree op = use->iv->ssa_name; + if (op == cand->var_before || op == cand->var_after) + { + group->bind_cand = cand; + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Doloop cmp iv use: "); + print_gimple_stmt (dump_file, stmt, + TDF_DETAILS); + dump_cand (dump_file, cand); + } + break; + } + } + } + if (group->bind_cand) + return true; + } + } + } + } + + return false; +} + /* Optimizes the LOOP. Returns true if anything changed. */ static bool @@ -7580,6 +7698,7 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop, basic_block *body; gcc_assert (!data->niters); + data->doloop_use_p = false; data->current_loop = loop; data->loop_loc = find_loop_location (loop).get_location_t (); data->speed = optimize_loop_for_speed_p (loop); @@ -7625,6 +7744,19 @@ tree_ssa_iv_optimize_loop (struct ivopts_data *data, struct loop *loop, /* Finds candidates for the induction variables (item 2). */ find_iv_candidates (data); + if (flag_branch_on_count_reg && targetm.have_count_reg_decr_p + && generic_predict_doloop_p (data)) + { + data->doloop_use_p = find_doloop_use_and_its_bind (data); + if (data->doloop_use_p && dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, + "Predict loop %d can perform doloop optimization later.\n", + loop->num); + flow_loop_dump (loop, dump_file, NULL, 1); + } + } + /* Calculates the costs (item 3, part 1). */ determine_iv_costs (data); determine_group_iv_costs (data); --------------5C77551EC20836528E9612B7--