From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 47020 invoked by alias); 24 Aug 2019 14:50:55 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 47012 invoked by uid 89); 24 Aug 2019 14:50:55 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-21.5 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.1 spammy=deserve, H*i:sk:CAHFci2, H*i:sk:cL7Cc02, can_autoinc X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.156.1) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 24 Aug 2019 14:50:53 +0000 Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x7OElOfO105270 for ; Sat, 24 Aug 2019 10:50:51 -0400 Received: from e06smtp03.uk.ibm.com (e06smtp03.uk.ibm.com [195.75.94.99]) by mx0a-001b2d01.pphosted.com with ESMTP id 2uk0j8gubh-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Sat, 24 Aug 2019 10:50:51 -0400 Received: from localhost by e06smtp03.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Sat, 24 Aug 2019 15:50:49 +0100 Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195) by e06smtp03.uk.ibm.com (192.168.101.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Sat, 24 Aug 2019 15:50:48 +0100 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x7OEol2x36045052 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sat, 24 Aug 2019 14:50:47 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EC462A4057; Sat, 24 Aug 2019 14:50:46 +0000 (GMT) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8981DA4055; Sat, 24 Aug 2019 14:50:44 +0000 (GMT) Received: from KewenLins-MacBook-Pro.local (unknown [9.197.239.96]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Sat, 24 Aug 2019 14:50:44 +0000 (GMT) Subject: Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts To: "Bin.Cheng" Cc: gcc-patches List , Segher Boessenkool , Bill Schmidt , Richard Guenther References: <1557803406-123657-1-git-send-email-linkw@linux.ibm.com> <2d897dc2-a01c-5005-6973-aad0c5930aa8@linux.ibm.com> <9d622cb7-2c1f-91bf-a61e-0239aa2ea8bf@linux.ibm.com> <6d0b4b11-1c33-fbd5-604d-293c01b1c285@linux.ibm.com> <43a3b3c1-9c41-82fa-a5db-1a7a1a5ceae1@linux.ibm.com> From: "Kewen.Lin" Date: Sat, 24 Aug 2019 22:43:00 -0000 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit x-cbid: 19082414-0012-0000-0000-000003428830 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19082414-0013-0000-0000-0000217CB923 Message-Id: X-IsSubscribed: yes X-SW-Source: 2019-08/txt/msg01710.txt.bz2 Hi Bin, on 2019/8/23 下午5:43, Bin.Cheng wrote: > On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin wrote: >> >> Hi Bin >> >> on 2019/8/23 上午10:19, Bin.Cheng wrote: >>> On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin wrote: >>>> >>>> Hi Bin, >>>> >>>> on 2019/8/22 下午1:46, Bin.Cheng wrote: >>>>> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin wrote: >>>>>> >>>>>> Hi Bin, >>>>>> >>>>>> Thanks for your time! >>>>>> >>>>>> on 2019/8/21 下午8:32, Bin.Cheng wrote: >>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin wrote: >>>>>>>> >>>>>>>> Hi! >>>>>>>> >>>>>>>> Comparing to the previous versions of implementation mainly based on the >>>>>>>> existing IV cands but zeroing the related group/use cost, this new one is based >>>>>>>> on Richard and Segher's suggestion introducing one doloop dedicated IV cand. >>>>>>>> >>>>>>>> Some key points are listed below: >>>>>>>> 1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV cand. >>>>>>>> 2) Special name "doloop" assigned. >>>>>>>> 3) Doloop IV cand with form (niter+1, +, -1) >>>>>>>> 4) For doloop IV cand, no extra one cost like BIV, assign zero cost for step. >>>>>>>> 5) Support may_be_zero (regressed PR is in this case), the base of doloop IV >>>>>>>> can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv. >>>>>>>> 6) Add more expr support in force_expr_to_var_cost for reasonable cost >>>>>>>> calculation on the IV base with may_be_zero (like COND_EXPR). >>>>>>>> 7) Set zero cost when using doloop IV cand for doloop use. >>>>>>>> 8) Add three hooks (should we merge _generic and _address?). >>>>>>>> *) have_count_reg_decr_p, is to indicate the target has special hardware >>>>>>>> count register, we shouldn't consider the impact of doloop IV when >>>>>>>> calculating register pressures. >>>>>>>> *) doloop_cost_for_generic, is the extra cost when using doloop IV cand for >>>>>>>> generic type IV use. >>>>>>>> *) doloop_cost_for_address, is the extra cost when using doloop IV cand for >>>>>>>> address type IV use. >> >>>> The new patch addressing the comments is attached. >>>> Could you please have a look again? Thanks in advance! >>> Thanks for working on this. A bit more nit-pickings. >>> >>> - add_candidate_1 (data, base, step, important, >>> - IP_NORMAL, use, NULL, orig_iv); >>> - if (ip_end_pos (data->current_loop) >>> + add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, doloop, >>> + orig_iv); >>> + if (!doloop && ip_end_pos (data->current_loop) >>> Could you add some comments elaborating why ip_end_pos candidate >>> shouldn't be added for doloop case? Because the increment position is >>> wrong. >>> >>> Also if you make doloop the last default parameter of add_candidate_1, >>> you can save more unnecessary changes to calls to add_candidate? >>> >>> - cost = get_computation_cost (data, use, cand, false, >>> - &inv_vars, NULL, &inv_expr); >>> + { >>> + cost = get_computation_cost (data, use, cand, false, &inv_vars, NULL, >>> + &inv_expr); >>> + if (cand->doloop_p) >>> + cost += targetm.doloop_cost_for_generic; >>> + } >>> This adjustment >>> >>> cost = get_computation_cost (data, use, cand, true, >>> &inv_vars, &can_autoinc, &inv_expr); >>> >>> + if (cand->doloop_p) >>> + cost += targetm.doloop_cost_for_address; >>> + >>> and this adjustment can be moved into get_computation_cost where all >>> cost adjustments are done. >>> >>> + /* For doloop candidate/use pair, adjust to zero cost. */ >>> + if (group->doloop_p && cand->doloop_p) >>> + cost = no_cost; >>> Note above code handles comparing against zero case and decreases the >>> cost by one (which prefers the same kind candidate as doloop one), >>> it's very possible to have -1 cost for doloop cand here. how about >>> just set to no_cost if it's positive? Your call. >>> >>> +/* For the targets which support doloop, to predict whether later RTL doloop >>> + transformation will perform on this loop, further detect the doloop use and >>> + mark the flag doloop_use_p if predicted. */ >>> + >>> +void >>> +predict_and_process_doloop (struct ivopts_data *data) >>> A better name here? Sorry I don't have another candidate in mind... >>> >>> + data->doloop_use_p = false; >>> This can be moved to the beginning of above >>> 'predict_and_process_doloop' function. >>> >>> Lastly, could you please add some brief description/comment about >>> doloop handling as a subsection in the file head comment? >>> >>> Otherwise, the ivopt changes look good to me. >>> >>> Thanks, >>> bin >>> >> >> Thanks for your prompt reply! I've updated the code as your comments, >> the updated version is attached. Looking forward to your review again. > > Sorry to bother. > > - return get_scaled_computation_cost_at (data, at, cost); > + cost = get_scaled_computation_cost_at (data, at, cost); > + /* For doloop IV cand, add on the extra cost. */ > + cost += cand->doloop_p ? targetm.doloop_cost_for_address : 0; > + return cost; > Here the cost is adjusted after scaling, while: > > + /* For doloop IV cand, add on the extra cost. */ > + if (cand->doloop_p && use->type == USE_NONLINEAR_EXPR) > + cost += targetm.doloop_cost_for_generic; > + > return get_scaled_computation_cost_at (data, at, cost); > is adjusted before scaling. Please work consistently. > Thanks for catching! > + /* Simply use 1.5 * add cost for now, FIXME if there is some > more accurate > + cost evaluation way. */ > + cost = comp_cost (1.5 * add_cost (speed, mode), 0); > + break; > Is 1.5 important for some test cases? Can we simply use 1 instead? > Or at least use xxx * 2 / 3 in order to avoid floating number. > No, I was thinking they may deserve a bit more than the add since the cost was a high value before this patch, two was too much for some cases in my initial prototype, then I just chose 1.5. I think it should be fine to use 1 here. The appended diff: diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 88e7890..31ab858 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -4353,9 +4353,9 @@ force_expr_to_var_cost (tree expr, bool speed) case LTGT_EXPR: case MAX_EXPR: case MIN_EXPR: - /* Simply use 1.5 * add cost for now, FIXME if there is some more accurate - cost evaluation way. */ - cost = comp_cost (1.5 * add_cost (speed, mode), 0); + /* Simply use add cost for now, FIXME if there is some more accurate cost + evaluation way. */ + cost = comp_cost (add_cost (speed, mode), 0); break; default: @@ -4786,11 +4786,13 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use, if (comp_inv && !integer_zerop (comp_inv)) cost += add_cost (speed, TYPE_MODE (utype)); + cost = get_scaled_computation_cost_at (data, at, cost); + /* For doloop IV cand, add on the extra cost. */ if (cand->doloop_p && use->type == USE_NONLINEAR_EXPR) cost += targetm.doloop_cost_for_generic; - return get_scaled_computation_cost_at (data, at, cost); + return cost; } > Not sure if non-ivopts parts are already approved? If so, the patch > is okay with above issues addressed. > > Thanks very much for your time! > Thanks a lot for your time and helpful comments as well!!! Thanks, Kewen