From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 14740 invoked by alias); 23 Aug 2019 09:43:47 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 14732 invoked by uid 89); 23 Aug 2019 09:43:47 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-6.1 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_2,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mail-io1-f66.google.com Received: from mail-io1-f66.google.com (HELO mail-io1-f66.google.com) (209.85.166.66) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 23 Aug 2019 09:43:45 +0000 Received: by mail-io1-f66.google.com with SMTP id b10so9614096ioj.2 for ; Fri, 23 Aug 2019 02:43:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=RMh9V7hPYS2y6WIuJUq0/uztIIQOqQtN+1YoV0M/RCg=; b=TuGBop57oGuzG3tH3FNeYr0gDRm+X8M9wnZk3Wa7XNQRi4+jc5yqc/51ehwAIKVR0Y zIQDyAQFKvN0IYC+2m0/cgOJP8Q+ARp0N+mJIPT6NOD3m36EONx/EnKwuGRwHbYApbUS m3Hw36UGEk4TcTs4tkMipuHUiOyPquYyKlA47X9k9IUMZX9zStqA8ffmxCe0rjaC+/V6 vTSQhtfEiT1HGQPugDfntrIYmVTIaHZf3UGGSxfbGH0c9v019jtdYIko4PlFlDe+Pnhj VXV4upeYC+QatvNpHGJjPWPSSo8i77/qGKp1n1Aap5/maJcNd41zEU46YdxQpoLUf1T9 fgNg== MIME-Version: 1.0 References: <1557803406-123657-1-git-send-email-linkw@linux.ibm.com> <2d897dc2-a01c-5005-6973-aad0c5930aa8@linux.ibm.com> <9d622cb7-2c1f-91bf-a61e-0239aa2ea8bf@linux.ibm.com> <6d0b4b11-1c33-fbd5-604d-293c01b1c285@linux.ibm.com> <43a3b3c1-9c41-82fa-a5db-1a7a1a5ceae1@linux.ibm.com> In-Reply-To: From: "Bin.Cheng" Date: Fri, 23 Aug 2019 10:43:00 -0000 Message-ID: Subject: Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts To: "Kewen.Lin" Cc: gcc-patches List , Segher Boessenkool , Bill Schmidt , Richard Guenther Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2019-08/txt/msg01623.txt.bz2 On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin wrote: > > Hi Bin > > on 2019/8/23 =E4=B8=8A=E5=8D=8810:19, Bin.Cheng wrote: > > On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin wrote: > >> > >> Hi Bin, > >> > >> on 2019/8/22 =E4=B8=8B=E5=8D=881:46, Bin.Cheng wrote: > >>> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin wrot= e: > >>>> > >>>> Hi Bin, > >>>> > >>>> Thanks for your time! > >>>> > >>>> on 2019/8/21 =E4=B8=8B=E5=8D=888:32, Bin.Cheng wrote: > >>>>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin wro= te: > >>>>>> > >>>>>> Hi! > >>>>>> > >>>>>> Comparing to the previous versions of implementation mainly based = on the > >>>>>> existing IV cands but zeroing the related group/use cost, this new= one is based > >>>>>> on Richard and Segher's suggestion introducing one doloop dedicate= d IV cand. > >>>>>> > >>>>>> Some key points are listed below: > >>>>>> 1) New field doloop_p in struct iv_cand to indicate doloop dedic= ated IV cand. > >>>>>> 2) Special name "doloop" assigned. > >>>>>> 3) Doloop IV cand with form (niter+1, +, -1) > >>>>>> 4) For doloop IV cand, no extra one cost like BIV, assign zero c= ost for step. > >>>>>> 5) Support may_be_zero (regressed PR is in this case), the base = of doloop IV > >>>>>> can be COND_EXPR, add handlings in cand_value_at and may_elim= inate_iv. > >>>>>> 6) Add more expr support in force_expr_to_var_cost for reasonabl= e cost > >>>>>> calculation on the IV base with may_be_zero (like COND_EXPR). > >>>>>> 7) Set zero cost when using doloop IV cand for doloop use. > >>>>>> 8) Add three hooks (should we merge _generic and _address?). > >>>>>> *) have_count_reg_decr_p, is to indicate the target has specia= l hardware > >>>>>> count register, we shouldn't consider the impact of doloop = IV when > >>>>>> calculating register pressures. > >>>>>> *) doloop_cost_for_generic, is the extra cost when using doloo= p IV cand for > >>>>>> generic type IV use. > >>>>>> *) doloop_cost_for_address, is the extra cost when using doloo= p IV cand for > >>>>>> address type IV use. > > >> The new patch addressing the comments is attached. > >> Could you please have a look again? Thanks in advance! > > Thanks for working on this. A bit more nit-pickings. > > > > - add_candidate_1 (data, base, step, important, > > - IP_NORMAL, use, NULL, orig_iv); > > - if (ip_end_pos (data->current_loop) > > + add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL= , doloop, > > + orig_iv); > > + if (!doloop && ip_end_pos (data->current_loop) > > Could you add some comments elaborating why ip_end_pos candidate > > shouldn't be added for doloop case? Because the increment position is > > wrong. > > > > Also if you make doloop the last default parameter of add_candidate_1, > > you can save more unnecessary changes to calls to add_candidate? > > > > - cost =3D get_computation_cost (data, use, cand, false, > > - &inv_vars, NULL, &inv_expr); > > + { > > + cost =3D get_computation_cost (data, use, cand, false, &inv_vars= , NULL, > > + &inv_expr); > > + if (cand->doloop_p) > > + cost +=3D targetm.doloop_cost_for_generic; > > + } > > This adjustment > > > > cost =3D get_computation_cost (data, use, cand, true, > > &inv_vars, &can_autoinc, &inv_expr); > > > > + if (cand->doloop_p) > > + cost +=3D targetm.doloop_cost_for_address; > > + > > and this adjustment can be moved into get_computation_cost where all > > cost adjustments are done. > > > > + /* For doloop candidate/use pair, adjust to zero cost. */ > > + if (group->doloop_p && cand->doloop_p) > > + cost =3D no_cost; > > Note above code handles comparing against zero case and decreases the > > cost by one (which prefers the same kind candidate as doloop one), > > it's very possible to have -1 cost for doloop cand here. how about > > just set to no_cost if it's positive? Your call. > > > > +/* For the targets which support doloop, to predict whether later RTL = doloop > > + transformation will perform on this loop, further detect the doloop= use and > > + mark the flag doloop_use_p if predicted. */ > > + > > +void > > +predict_and_process_doloop (struct ivopts_data *data) > > A better name here? Sorry I don't have another candidate in mind... > > > > + data->doloop_use_p =3D false; > > This can be moved to the beginning of above > > 'predict_and_process_doloop' function. > > > > Lastly, could you please add some brief description/comment about > > doloop handling as a subsection in the file head comment? > > > > Otherwise, the ivopt changes look good to me. > > > > Thanks, > > bin > > > > Thanks for your prompt reply! I've updated the code as your comments, > the updated version is attached. Looking forward to your review again. Sorry to bother. - return get_scaled_computation_cost_at (data, at, cost); + cost =3D get_scaled_computation_cost_at (data, at, cost); + /* For doloop IV cand, add on the extra cost. */ + cost +=3D cand->doloop_p ? targetm.doloop_cost_for_address : 0; + return cost; Here the cost is adjusted after scaling, while: + /* For doloop IV cand, add on the extra cost. */ + if (cand->doloop_p && use->type =3D=3D USE_NONLINEAR_EXPR) + cost +=3D targetm.doloop_cost_for_generic; + return get_scaled_computation_cost_at (data, at, cost); is adjusted before scaling. Please work consistently. + /* Simply use 1.5 * add cost for now, FIXME if there is some more accurate + cost evaluation way. */ + cost =3D comp_cost (1.5 * add_cost (speed, mode), 0); + break; Is 1.5 important for some test cases? Can we simply use 1 instead? Or at least use xxx * 2 / 3 in order to avoid floating number. Not sure if non-ivopts parts are already approved? If so, the patch is okay with above issues addressed. Thanks very much for your time! Thanks, bin > > > Thanks, > Kewen > > ----- > > gcc/ChangeLog > > 2019-08-23 Kewen Lin > > PR middle-end/80791 > * config/rs6000/rs6000.c (TARGET_HAVE_COUNT_REG_DECR_P): New macr= o. > (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. > (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. > * target.def (have_count_reg_decr_p): New hook. > (doloop_cost_for_generic): Likewise. > (doloop_cost_for_address): Likewise. > * doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): Likewise. > (TARGET_DOLOOP_COST_FOR_GENERIC): Likewise. > (TARGET_DOLOOP_COST_FOR_ADDRESS): Likewise. > * doc/tm.texi: Regenerate. > * tree-ssa-loop-ivopts.c (comp_cost::operator+=3D): Consider infi= nite cost > addend. > (record_group): Init doloop_p. > (add_candidate_1): Add optional argument doloop, change the handl= ings > accordingly. > (add_candidate): Likewise. > (generic_predict_doloop_p): Update attribute. > (force_expr_to_var_cost): Add costing for expressions COND_EXPR/L= T_EXPR/ > LE_EXPR/GT_EXPR/GE_EXPR/EQ_EXPR/NE_EXPR/UNORDERED_EXPR/ORDERED_EX= PR/ > UNLT_EXPR/UNLE_EXPR/UNGT_EXPR/UNGE_EXPR/UNEQ_EXPR/LTGT_EXPR/MAX_E= XPR/ > MIN_EXPR. > (get_computation_cost): Update for doloop IV cand extra cost. > (determine_group_iv_cost_cond): Update for doloop IV cand. > (determine_iv_cost): Likewise. > (ivopts_estimate_reg_pressure): Likewise. > (may_eliminate_iv): Update handlings for doloop IV cand. > (add_iv_candidate_for_doloop): New function. > (find_iv_candidates): Call function add_iv_candidate_for_doloop. > (iv_ca_set_no_cp): Update for doloop IV cand. > (iv_ca_set_cp): Likewise. > (iv_ca_dump): Dump register cost. > (find_doloop_use): New function. > (analyze_and_mark_doloop_use): Likewise. > (tree_ssa_iv_optimize_loop): Call function analyze_and_mark_doloo= p_use. > > gcc/testsuite/ChangeLog > > 2019-08-23 Kewen Lin > > PR middle-end/80791 > * gcc.dg/tree-ssa/ivopts-3.c: Adjust for doloop change. > * gcc.dg/tree-ssa/ivopts-lt.c: Likewise. > * gcc.dg/tree-ssa/pr32044.c: Likewise. > >