From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-507665-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 47020 invoked by alias); 24 Aug 2019 14:50:55 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 47012 invoked by uid 89); 24 Aug 2019 14:50:55 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-21.5 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.1 spammy=deserve, H*i:sk:CAHFci2, H*i:sk:cL7Cc02, can_autoinc
X-HELO: mx0a-001b2d01.pphosted.com
Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.156.1) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 24 Aug 2019 14:50:53 +0000
Received: from pps.filterd (m0098409.ppops.net [127.0.0.1])	by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x7OElOfO105270	for <gcc-patches@gcc.gnu.org>; Sat, 24 Aug 2019 10:50:51 -0400
Received: from e06smtp03.uk.ibm.com (e06smtp03.uk.ibm.com [195.75.94.99])	by mx0a-001b2d01.pphosted.com with ESMTP id 2uk0j8gubh-1	(version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT)	for <gcc-patches@gcc.gnu.org>; Sat, 24 Aug 2019 10:50:51 -0400
Received: from localhost	by e06smtp03.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted	for <gcc-patches@gcc.gnu.org> from <linkw@linux.ibm.com>;	Sat, 24 Aug 2019 15:50:49 +0100
Received: from b06cxnps3075.portsmouth.uk.ibm.com (9.149.109.195)	by e06smtp03.uk.ibm.com (192.168.101.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;	(version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256)	Sat, 24 Aug 2019 15:50:48 +0100
Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59])	by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x7OEol2x36045052	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);	Sat, 24 Aug 2019 14:50:47 GMT
Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1])	by IMSVA (Postfix) with ESMTP id EC462A4057;	Sat, 24 Aug 2019 14:50:46 +0000 (GMT)
Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1])	by IMSVA (Postfix) with ESMTP id 8981DA4055;	Sat, 24 Aug 2019 14:50:44 +0000 (GMT)
Received: from KewenLins-MacBook-Pro.local (unknown [9.197.239.96])	by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP;	Sat, 24 Aug 2019 14:50:44 +0000 (GMT)
Subject: Re: [PATCH v6 3/3] PR80791 Consider doloop cmp use in ivopts
To: "Bin.Cheng" <amker.cheng@gmail.com>
Cc: gcc-patches List <gcc-patches@gcc.gnu.org>,        Segher Boessenkool <segher@kernel.crashing.org>,        Bill Schmidt <wschmidt@linux.ibm.com>,        Richard Guenther <rguenther@suse.de>
References: <1557803406-123657-1-git-send-email-linkw@linux.ibm.com> <2d897dc2-a01c-5005-6973-aad0c5930aa8@linux.ibm.com> <CAHFci29x5MNq3SAVQ5WkMVQEN_ekBhHU6GA9rAfjveD75NXydQ@mail.gmail.com> <9d622cb7-2c1f-91bf-a61e-0239aa2ea8bf@linux.ibm.com> <f48de819-7ac9-17d7-ef9c-eea3c781a482@linux.ibm.com> <6d0b4b11-1c33-fbd5-604d-293c01b1c285@linux.ibm.com> <CAHFci2-rKPp1KBgs_VN=kob8NMcb5Qh8GW+BcV_NfDYRJ6UujA@mail.gmail.com> <cbeed505-4998-033d-c212-d909d13a28dc@linux.ibm.com> <CAHFci28q5sK2KYEyNy7MgtpJ8f42itH4kKOFvnjHr1Pb8a_8QQ@mail.gmail.com> <43a3b3c1-9c41-82fa-a5db-1a7a1a5ceae1@linux.ibm.com> <CAHFci2-hNV-P5+gx7H+iQUKcE+y=MTncu7+YJV=BsZCWypy51Q@mail.gmail.com> <ca838609-51dc-37b0-f2d4-f6299aa99e8d@linux.ibm.com> <CAHFci2_xTa20gDMmf3omvDxki=cL7Cc02iFX_8P3bMGoiFSisQ@mail.gmail.com>
From: "Kewen.Lin" <linkw@linux.ibm.com>
Date: Sat, 24 Aug 2019 22:43:00 -0000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To: <CAHFci2_xTa20gDMmf3omvDxki=cL7Cc02iFX_8P3bMGoiFSisQ@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
x-cbid: 19082414-0012-0000-0000-000003428830
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 19082414-0013-0000-0000-0000217CB923
Message-Id: <fdb52beb-359e-4d8e-2426-20d64b3da4d0@linux.ibm.com>
X-IsSubscribed: yes
X-SW-Source: 2019-08/txt/msg01710.txt.bz2

Hi Bin,

on 2019/8/23 ä¸å5:43, Bin.Cheng wrote:
> On Fri, Aug 23, 2019 at 4:27 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi Bin
>>
>> on 2019/8/23 ä¸å10:19, Bin.Cheng wrote:
>>> On Thu, Aug 22, 2019 at 3:09 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi Bin,
>>>>
>>>> on 2019/8/22 ä¸å1:46, Bin.Cheng wrote:
>>>>> On Thu, Aug 22, 2019 at 11:18 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>
>>>>>> Hi Bin,
>>>>>>
>>>>>> Thanks for your time!
>>>>>>
>>>>>> on 2019/8/21 ä¸å8:32, Bin.Cheng wrote:
>>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Comparing to the previous versions of implementation mainly based on the
>>>>>>>> existing IV cands but zeroing the related group/use cost, this new one is based
>>>>>>>> on Richard and Segher's suggestion introducing one doloop dedicated IV cand.
>>>>>>>>
>>>>>>>> Some key points are listed below:
>>>>>>>>   1) New field doloop_p in struct iv_cand to indicate doloop dedicated IV cand.
>>>>>>>>   2) Special name "doloop" assigned.
>>>>>>>>   3) Doloop IV cand with form (niter+1, +, -1)
>>>>>>>>   4) For doloop IV cand, no extra one cost like BIV, assign zero cost for step.
>>>>>>>>   5) Support may_be_zero (regressed PR is in this case), the base of doloop IV
>>>>>>>>      can be COND_EXPR, add handlings in cand_value_at and may_eliminate_iv.
>>>>>>>>   6) Add more expr support in force_expr_to_var_cost for reasonable cost
>>>>>>>>      calculation on the IV base with may_be_zero (like COND_EXPR).
>>>>>>>>   7) Set zero cost when using doloop IV cand for doloop use.
>>>>>>>>   8) Add three hooks (should we merge _generic and _address?).
>>>>>>>>     *) have_count_reg_decr_p, is to indicate the target has special hardware
>>>>>>>>        count register, we shouldn't consider the impact of doloop IV when
>>>>>>>>        calculating register pressures.
>>>>>>>>     *) doloop_cost_for_generic, is the extra cost when using doloop IV cand for
>>>>>>>>        generic type IV use.
>>>>>>>>     *) doloop_cost_for_address, is the extra cost when using doloop IV cand for
>>>>>>>>        address type IV use.
>>
>>>> The new patch addressing the comments is attached.
>>>> Could you please have a look again?  Thanks in advance!
>>> Thanks for working on this.  A bit more nit-pickings.
>>>
>>> -    add_candidate_1 (data, base, step, important,
>>> -                    IP_NORMAL, use, NULL, orig_iv);
>>> -  if (ip_end_pos (data->current_loop)
>>> +    add_candidate_1 (data, base, step, important, IP_NORMAL, use, NULL, doloop,
>>> +                    orig_iv);
>>> +  if (!doloop && ip_end_pos (data->current_loop)
>>> Could you add some comments elaborating why ip_end_pos candidate
>>> shouldn't be added for doloop case?  Because the increment position is
>>> wrong.
>>>
>>> Also if you make doloop the last default parameter of add_candidate_1,
>>> you can save more unnecessary changes to calls to add_candidate?
>>>
>>> -    cost = get_computation_cost (data, use, cand, false,
>>> -                                &inv_vars, NULL, &inv_expr);
>>> +    {
>>> +      cost = get_computation_cost (data, use, cand, false, &inv_vars, NULL,
>>> +                                  &inv_expr);
>>> +      if (cand->doloop_p)
>>> +       cost += targetm.doloop_cost_for_generic;
>>> +    }
>>> This adjustment
>>>
>>>    cost = get_computation_cost (data, use, cand, true,
>>>                                &inv_vars, &can_autoinc, &inv_expr);
>>>
>>> +  if (cand->doloop_p)
>>> +    cost += targetm.doloop_cost_for_address;
>>> +
>>> and this adjustment can be moved into get_computation_cost where all
>>> cost adjustments are done.
>>>
>>> +      /* For doloop candidate/use pair, adjust to zero cost.  */
>>> +      if (group->doloop_p && cand->doloop_p)
>>> +       cost = no_cost;
>>> Note above code handles comparing against zero case and decreases the
>>> cost by one (which prefers the same kind candidate as doloop one),
>>> it's very possible to have -1 cost for doloop cand here.  how about
>>> just set to no_cost if it's positive?  Your call.
>>>
>>> +/* For the targets which support doloop, to predict whether later RTL doloop
>>> +   transformation will perform on this loop, further detect the doloop use and
>>> +   mark the flag doloop_use_p if predicted.  */
>>> +
>>> +void
>>> +predict_and_process_doloop (struct ivopts_data *data)
>>> A better name here? Sorry I don't have another candidate in mind...
>>>
>>> +  data->doloop_use_p = false;
>>> This can be moved to the beginning of above
>>> 'predict_and_process_doloop' function.
>>>
>>> Lastly, could you please add some brief description/comment about
>>> doloop handling as a subsection in the file head comment?
>>>
>>> Otherwise, the ivopt changes look good to me.
>>>
>>> Thanks,
>>> bin
>>>
>>
>> Thanks for your prompt reply!  I've updated the code as your comments,
>> the updated version is attached.  Looking forward to your review again.
> 
> Sorry to bother.
> 
> -      return get_scaled_computation_cost_at (data, at, cost);
> +      cost = get_scaled_computation_cost_at (data, at, cost);
> +      /* For doloop IV cand, add on the extra cost.  */
> +      cost += cand->doloop_p ? targetm.doloop_cost_for_address : 0;
> +      return cost;
> Here the cost is adjusted after scaling, while:
> 
> +  /* For doloop IV cand, add on the extra cost.  */
> +  if (cand->doloop_p && use->type == USE_NONLINEAR_EXPR)
> +    cost += targetm.doloop_cost_for_generic;
> +
>    return get_scaled_computation_cost_at (data, at, cost);
> is adjusted before scaling.  Please work consistently.
> 

Thanks for catching!

> +      /* Simply use 1.5 * add cost for now, FIXME if there is some
> more accurate
> +        cost evaluation way.  */
> +      cost = comp_cost (1.5 * add_cost (speed, mode), 0);
> +      break;
> Is 1.5 important for some test cases?  Can we simply use 1 instead?
> Or at least use xxx * 2 / 3 in order to avoid floating number.
> 

No, I was thinking they may deserve a bit more than the add since
the cost was a high value before this patch, two was too much for some
cases in my initial prototype, then I just chose 1.5.
I think it should be fine to use 1 here.


The appended diff:

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 88e7890..31ab858 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -4353,9 +4353,9 @@ force_expr_to_var_cost (tree expr, bool speed)
     case LTGT_EXPR:
     case MAX_EXPR:
     case MIN_EXPR:
-      /* Simply use 1.5 * add cost for now, FIXME if there is some more accurate
-        cost evaluation way.  */
-      cost = comp_cost (1.5 * add_cost (speed, mode), 0);
+      /* Simply use add cost for now, FIXME if there is some more accurate cost
+        evaluation way.  */
+      cost = comp_cost (add_cost (speed, mode), 0);
       break;

     default:
@@ -4786,11 +4786,13 @@ get_computation_cost (struct ivopts_data *data, struct iv_use *use,
   if (comp_inv && !integer_zerop (comp_inv))
     cost += add_cost (speed, TYPE_MODE (utype));

+  cost = get_scaled_computation_cost_at (data, at, cost);
+
   /* For doloop IV cand, add on the extra cost.  */
   if (cand->doloop_p && use->type == USE_NONLINEAR_EXPR)
     cost += targetm.doloop_cost_for_generic;

-  return get_scaled_computation_cost_at (data, at, cost);
+  return cost;
 }


> Not sure if non-ivopts parts are already approved?  If so, the patch
> is okay with above issues addressed.
> 
> Thanks very much for your time!
> 

Thanks a lot for your time and helpful comments as well!!!


Thanks,
Kewen