From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-505402-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 27674 invoked by alias); 22 Jul 2019 03:55:17 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 27666 invoked by uid 89); 22 Jul 2019 03:55:17 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-6.7 required=5.0 tests=AWL,BAYES_00,KAM_SHORT,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.1 spammy=again!
X-HELO: mx0a-001b2d01.pphosted.com
Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.158.5) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 22 Jul 2019 03:55:15 +0000
Received: from pps.filterd (m0098414.ppops.net [127.0.0.1])	by mx0b-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x6M3qLsl012982	for <gcc-patches@gcc.gnu.org>; Sun, 21 Jul 2019 23:55:13 -0400
Received: from e06smtp04.uk.ibm.com (e06smtp04.uk.ibm.com [195.75.94.100])	by mx0b-001b2d01.pphosted.com with ESMTP id 2tvvc001c4-1	(version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT)	for <gcc-patches@gcc.gnu.org>; Sun, 21 Jul 2019 23:55:12 -0400
Received: from localhost	by e06smtp04.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted	for <gcc-patches@gcc.gnu.org> from <linkw@linux.ibm.com>;	Mon, 22 Jul 2019 04:55:11 +0100
Received: from b06cxnps4074.portsmouth.uk.ibm.com (9.149.109.196)	by e06smtp04.uk.ibm.com (192.168.101.134) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;	(version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256)	Mon, 22 Jul 2019 04:55:09 +0100
Received: from b06wcsmtp001.portsmouth.uk.ibm.com (b06wcsmtp001.portsmouth.uk.ibm.com [9.149.105.160])	by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x6M3t8dn45088832	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);	Mon, 22 Jul 2019 03:55:08 GMT
Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1])	by IMSVA (Postfix) with ESMTP id 363CFA4062;	Mon, 22 Jul 2019 03:55:08 +0000 (GMT)
Received: from b06wcsmtp001.portsmouth.uk.ibm.com (unknown [127.0.0.1])	by IMSVA (Postfix) with ESMTP id E1E91A4054;	Mon, 22 Jul 2019 03:55:05 +0000 (GMT)
Received: from kewenlins-mbp.cn.ibm.com (unknown [9.200.146.96])	by b06wcsmtp001.portsmouth.uk.ibm.com (Postfix) with ESMTP;	Mon, 22 Jul 2019 03:55:05 +0000 (GMT)
Subject: Re: [PATCH v3 3/3] PR80791 Consider doloop cmp use in ivopts
To: "Bin.Cheng" <amker.cheng@gmail.com>
Cc: gcc-patches List <gcc-patches@gcc.gnu.org>, segher@kernel.crashing.org,        Bill Schmidt <wschmidt@linux.ibm.com>,        Richard Guenther <rguenther@suse.de>
References: <1557803406-123657-1-git-send-email-linkw@linux.ibm.com> <2d897dc2-a01c-5005-6973-aad0c5930aa8@linux.ibm.com> <CAHFci29x5MNq3SAVQ5WkMVQEN_ekBhHU6GA9rAfjveD75NXydQ@mail.gmail.com>
From: "Kewen.Lin" <linkw@linux.ibm.com>
Date: Mon, 22 Jul 2019 05:42:00 -0000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To: <CAHFci29x5MNq3SAVQ5WkMVQEN_ekBhHU6GA9rAfjveD75NXydQ@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
x-cbid: 19072203-0016-0000-0000-00000294FB34
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 19072203-0017-0000-0000-000032F2E2FA
Message-Id: <9d622cb7-2c1f-91bf-a61e-0239aa2ea8bf@linux.ibm.com>
X-IsSubscribed: yes
X-SW-Source: 2019-07/txt/msg01370.txt.bz2

Hi Bin,

on 2019/7/21 ä¸å11:07, Bin.Cheng wrote:
> On Wed, Jun 19, 2019 at 7:47 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi all,
>>
>> This is the following patch after https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00910.html
>>
>> Main steps:
>>   1) Identify the doloop cmp type iv use and record its bind_cand (explain it later).
>>   2) Set zero cost for pairs between this use and any iv cand.
>>   3) IV cand set selecting algorithm runs as usual.
>>   4) Fix up the selected iv cand for doloop use if need.
>>
>> It only focuses on the targets like Power which has specific count register.
>> target hook have_count_reg_decr_p is proposed for it.
>>
>> Some notes:
>>
>> *) Why we need zero cost?  How about just decrease the cost for the pair
>>    between doloop use and its original iv cand?  How about just decrease
>>    the cost for the pair between doloop use and one selected iv cand?
>>
>>    Since some target supports hardware count register for decrement and
>>    branch, it doesn't need the general instruction sequence for decr, cmp and
>>    branch in general registers.  The cost of moving count register to GPR
>>    is generally high, so it's standalone and can't be shared with other iv
>>    uses.  It means IVOPTs can take doloop use as invisible (zero cost).
>>
>>    Let's take a look at PR80791 for example.
>>
>>                             original biv (cand 4)  use derived iv (cand 6)
>>      generic use:                   4                  0
>>      comp use (doloop use):         0                 infinite
>>
>>     For iv cost, original biv has cost 4 while use derived iv has cost 5.
>>     When IVOPTs considers doloop use, the optimal cost is 8 (original biv
>>     iv cost 4 + use cost 4).  Unfortunately it's not actually optimal, since
>>     later doloop transformation updates loop closing by count register,
>>     original biv (and its update) won't be needed in loop closing any more.
>>     The generic use become the only use for original biv.  That means, if we
>>     know the doloop will perform later, we shouldn't consider the doloop use
>>     when determining IV set.  If we don't consider it, the algorithm will
>>     choose iv cand 6 with total cost 5 (iv cost 5 + use cost 0).
>>
>>     From the above, we can see that to decrease the cost for the pair between
>>     doloop use and original biv doesn't work.  Meanwhile it's hard to predict
>>     one good iv cand in final optimal set here and pre-update the cost
>>     between it and doloop use.  The analysis would be heavy and imperfect.
>>
>> *) Why we need bind_cand?
>>
>>     As above, we assign zero cost for pairs between doloop use and each iv
>>     cand.  It's possible that doloop use gets assigned one iv cand which is
>>     invalid to be used during later rewrite.  Then we have to fix it up with iv
>>     cand originally used for it.  It's fine whatever this iv cand exists in
>>     final iv cand set or not, even if it's not in the set, it will be
>>     eliminated in doloop transformation.
>>
>> By the way, I was thinking whether we can replace the hook have_count_reg_decr_p
>> with flag_branch_on_count_reg.  As the description of the "no-" option, "Disable
>> the optimization pass that scans for opportunities to use 'decrement and branch'
>> instructions on a count register instead of instruction sequences that decrement
>> a register, compare it against zero, and then branch based upon the result.", it
>> implicitly says it has count register support.  But I noticed that the gate of
>> doloop_optimize checks this flag, as what I got from the previous discussions, some
>> targets which can perform doloop_optimize don't have specific count register, so it
>> sounds we can't make use of the flag, is it correct?
>>
>> Bootstrapped on powerpcle, also ran regression testing on powerpcle, got one failure
>> which is exposed by this patch and the root cause is duplicate of PR62147.
>> case is gcc.target/powerpc/20050830-1.c
>>
>> Is it OK for trunk?
> Sorry for the delaying.
> 
> I am not in favor of the approach very much.  When rewriting the pass
> last time, we tried to reuse as much code as possible between cost
> computation and iv_use rewriting.  we also followed guideline when
> finite cost computed for cand/use pair, the use should be rewritten
> using the cand successfully.  However, the patch adjust infinite cost
> to zero cost causing cand can't be used to rewrite iv_use selected,
> this is a backward step IMHO.

Thanks a lot for your time and comments.

V2: https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00655.html

The previous version 2 (above link) used the way to teach selection 
algorithm to be aware of the group with bind_cand, it didn't zeroing 
the cost of doloop IV use, but both of them are equivalent to ignore
this doloop IV use in selection. 

Then I was thinking as granted that it changed many places to take care 
of this bind_cand group, worsen the readability and seems invasive to
the existing algorithm too much.  For example, affected functions were: 
set_group_iv_cost/iv_ca_dump/try_add_cand_for/iv_ca_extend/iv_ca_narrow
/iv_ca_replace.  

At that time, I thought version 3 doesn't need to teach the existing algorithm
anything, leaves it to go as before and only need some fixups when it needs.
It has better readability and well fit in current handlings.
> 
> I am not sure if this is only useful for doloop cases, or for general cases?
> 

Not sure either.

> Comment mentioned the point is to give more chances to consider other
> IV cands instead of BIV.  If current algorithm relies on zeroing cost
> of impossible cand/use pair to select optimal result, I suspect it's a
> bug which should be fixed in candidate selection algorithm.  Do you
> have a test case showing the issue? We should fix it as a standalone
> problem, while the approach is covering the problem and not that
> sound.

The best case is the one which caused PR80791.  It has two IV uses, one 
is generic (use 0) and the other is compare (doloop use, use 1).  
The best optimal set is to assign cand 6 to use 0 and assign whatever 
but excepting for infinite cost cand to use 1.  The actual selection set
is to assign BIV to both uses.

The wording "to give more chances to consider other IV cands instead of
BIV" is for the use 0.  If we don't make anything special for use 1, cand
4 is always considered in the optimal set, then use 0 can't have any chances
to use cand 6 (extra iv cost).  Zeroing the cost to make selection not
consider use 1 at all.

You may have the question why not just zeroing those finite cost ones,
then in this case, only cand 1->5 can be selected for use 1 then BIV still
wins since it have best iv cost, it's selected for use 0 instead of cand 6.
If we consider infinite cost cand, bring cand 6 in and get optimal set cand 6.
btw, under TARGET_HAVE_COUNT_REG_DECR_P, the doloop use will be eliminated.
Zeroing it also matches this.

The original cost mapping without my patch:

<Group-candidate Costs>:
Group 0:
  cand>-cost>---compl.>-inv.expr.>------inv.vars
  0>----8>------0>------1;>-----NIL;
  1>----8>------0>------1;>-----NIL;
  2>----4>------0>------NIL;>---NIL;
  3>----8>------0>------1;>-----NIL;
  4>----4>------0>------NIL;>---NIL;
  5>----8>------0>------NIL;>---NIL;
  6>----0>------0>------NIL;>---NIL;

Group 1:
  cand>-cost>---compl.>-inv.expr.>------inv.vars
  0>----8>------0>------NIL;>---1
  1>----8>------0>------NIL;>---1
  2>----0>------0>------NIL;>---NIL;
  3>----4>------0>------NIL;>---1
  4>----0>------0>------NIL;>---NIL;
  5>----4>------0>------NIL;>---NIL;

I'm not sure it can be taken as a bug or not, better to say doloop use
only?  For doloop use, we can teach the algorithm not to take iv cost
into account if that iv cand is only for doloop use.  Since the doloop
use will be eliminated, it's fine.  But for the generic one, we need to
have the iv cost there.

I expect it doesn't need to teach many places. I'll give it a try.  :)

> 
> However, I think the patch can be changed that only finite cost should
> be adjusted to zero.  Thus guarantee any cand selected is valid to
> rewrite iv_use.
> 

Agreed if the above is taught.

Thanks again!


Kewen