Re: [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Kewen.Lin" <linkw@linux.ibm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: "bin.cheng" <bin.cheng@linux.alibaba.com>,
	Richard Guenther <rguenther@suse.de>,
	Bill Schmidt <wschmidt@linux.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	"Bin.Cheng" <amker.cheng@gmail.com>
Subject: Re: [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling
Date: Wed, 3 Jun 2020 11:18:33 +0800	[thread overview]
Message-ID: <a1c6a204-8c42-734c-912e-7d4cf282d434@linux.ibm.com> (raw)
In-Reply-To: <mpto8q2ym63.fsf@arm.com>

Hi Richard,

on 2020/6/2 下午3:14, Richard Sandiford wrote:
> "Kewen.Lin" <linkw@linux.ibm.com> writes:
>> Hi Richard,
>>
>> Thanks for the comments!
>>
>> on 2020/6/2 上午1:59, Richard Sandiford wrote:
>>> Could you go into more detail about this choice of cost calculation?
>>> It looks like we first calculate per-group flags, which are true only if
>>> the unrolled offsets are valid for all uses in the group.  Then we create
>>> per-candidate flags when associating candidates with groups.
>>>
>>
>> Sure.  It checks every address type IV group to determine whether this
>> group is valid to use reg offset addressing mode.  Here we only need to
>> check the first one and the last one, since the intermediates should 
>> have been handled by split_address_groups.  With unrolling the
>> displacement of the address can be offset-ed by (UF-1)*step, check the
>> address with this max offset whether still valid.  If the check finds
>> it's valid to use reg offset mode for the whole group, we flag this
>> group.  Later, when we create IV candidate for address group flagged,
>> we flag the candidate further.  This flag is mainly for iv cand
>> costing, we don't need to scale up iv cand's step cost for this kind
>> of candidate.
> 
> But AIUI, this is calculating whether the uses in their original form
> support all unrolled offsets.  For ivopts, I think the question is really
> whether the uses support all unrolled offsets when based on a given IV
> candidate (which might not be the original IV).
> 

Good point!  Indeed, the patch only flags the IV cands derived from the
address group flagged with reg_offset_p, it has the possibility that
we miss some other candidates with same basic but different offset
which can satisfy addr_offset_valid_p.

How about to update the current approach to: instead of flag the derived
iv cand, when we determine the cost for iv cand, we check whether this
iv cand has the same basic object as the one of any reg_offset_p vuse[0]
(both should be stripped their offsets), then further check the offset
can satisfy addr_offset_valid_p, if all checks expected, update the step
cost without uf consideration.

I would expect this kind of address based iv cand will mainly be used
for address type group and compare type group, for the address type 
group, it can be only applied for those with same basic object, in most
cases they are put in the same address group.  If it's used for generic
much, the step cost tweaking might not be fixed at the beginning but
varies according to the iv set members.  It looks too much for the
existing framework.

> E.g. there might be another IV candidate at a constant offset
> from the original one, and the offsets might all be in range
> for that offset too.
> 
>> Imagining this loop is being unrolled, all the statements will be
>> duplicated by UF.  For the cost modeling against iv group, it's
>> scaling up the cost by UF (here I simply excluded the compare_type
>> since in most cases it for loop ending check).  For the cost modeling
>> against iv candidate, it's to focus on step costs, for an iv candidate
>> we flagged before, it's taken as one time step cost, for the others,
>> it's scaling up the step cost since the unrolling make step 
>> calculation become UF times.
>>
>> This cost modeling is trying to simulate cost change after the
>> unrolling, scaling up the costs accordingly.  There are somethings
>> to be improved like distinguish the loop ending compare or else,
>> whether need to tweak the other costs somehow since the scaling up
>> probably cause existing cost framework imbalance, but during
>> benchmarking I didn't find these matter, so take it as simple as 
>> possible for now.
>>
>>
>>> Instead, couldn't we take this into account in get_address_cost,
>>> which calculates the cost of an address use for a given candidate?
>>> E.g. after the main if-else at the start of the function,
>>> perhaps it would make sense to add the worst-case offset to
>>> the address in “parts”, check whether that too is a valid address,
>>> and if not, increase var_cost by the cost of one add instruction.
>>>
>>
>> IIUC, what you suggest is to tweak the iv group cost, if we find
>> one address group is valid for reg offset mode, we price more on
>> the pairs between this group and other non address-based iv cands.
>> The question is how do we decide this add-on cost.  For the test
>> case I was working on initially, adding one cost (of add) doesn't
>> work, the normal iv still wined.  We can price it more like two
>> but what's the justification on this value, by heuristics?
> 
> Yeah, I was thinking of adding one instance of add_cost.  If that
> doesn't work, it'd be interesting to know why in more detail.
> 

The case is like:

            for (i = 0; i < SIZE; i++)
              y[i] = a * x[i] + z[i];

It has three array access in the loop body, after vectorization,
it looks like

  vect__1.7_15 = MEM <vector(2) double> [(double *)vectp_x.5_20];
  vect__3.8_13 = vect__1.7_15 * vect_cst__14;
  vect__4.11_19 = MEM <vector(2) double> [(double *)vectp_z.9_7];
  vect__5.12_23 = vect__3.8_13 + vect__4.11_19;
  MEM <vector(2) double> [(double *)vectp_y.13_24] = vect__5.12_23;

we expect to use reg_offset for those vector load/store when
unrolling factor is big, but without unrolling or unrolling factor
smaller like 2, the reg_index should outperform reg_offset.

IIUC, the proposed costing change would look like (the left is before
change, while the right is after change).  Those zero costs are for
those iv cands based on address objects.  Group 0,1,2 are address
groups, group 3 is compare group (omitted).  One insn add cost is
counted as 4 on ppc64le.


  <Group-candidate Costs>:                \  <Group-candidate Costs>:
  Group 0:                                \  Group 0:
    cand  cost    compl.  inv.expr.       \    cand  cost    compl.  inv.expr.       inv.vars
    1     5       1       1;      NIL;    \    1     9       1       1;      NIL;
    4     1       1       1;      NIL;    \    4     5       1       1;      NIL;
    7     0       1       NIL;    NIL;    \    7     0       1       NIL;    NIL;
    8     0       0       NIL;    NIL;    \    8     0       0       NIL;    NIL;
    9     0       0       NIL;    NIL;    \    9     0       0       NIL;    NIL;
    13    5       1       1;      NIL;    \    13    9       1       1;      NIL;
    14    5       1       3;      NIL;    \    14    9       1       3;      NIL;
                                          \
  Group 1:                                \  Group 1:
    cand  cost    compl.  inv.expr.       \    cand  cost    compl.  inv.expr.       inv.vars
    1     5       1       4;      NIL;    \    1     9       1       4;      NIL;
    3     0       1       NIL;    NIL;    \    3     0       1       NIL;    NIL;
    4     1       1       4;      NIL;    \    4     5       1       4;      NIL;
    5     0       0       NIL;    NIL;    \    5     0       0       NIL;    NIL;
    6     0       0       NIL;    NIL;    \    6     0       0       NIL;    NIL;
    13    5       1       4;      NIL;    \    13    9       1       4;      NIL;
    14    5       1       6;      NIL;    \    14    9       1       6;      NIL;
                                          \
  Group 2:                                \  Group 2:
    cand  cost    compl.  inv.expr.       \    cand  cost    compl.  inv.expr.       inv.vars
    1     5       1       7;      NIL;    \    1     9       1       7;      NIL;
    4     1       1       7;      NIL;    \    4     5       1       7;      NIL;
    10    0       0       NIL;    NIL;    \    10    4       0       NIL;    NIL;
    11    0       0       NIL;    NIL;    \    11    4       0       NIL;    NIL;
    12    0       1       NIL;    NIL;    \    12    4       1       NIL;    NIL;
    13    5       1       7;      NIL;    \    13    9       1       7;      NIL;
    14    5       1       9;      NIL;    \    14    9       1       9;      NIL;

  Initial set of candidates:              \  Initial set of candidates:
    cost: 13 (complexity 3)               \    cost: 25 (complexity 0)
    reg_cost: 5                           \    reg_cost: 6
    cand_cost: 5                          \    cand_cost: 15
    cand_group_cost: 3 (complexity 3)     \    cand_group_cost: 4 (complexity 0)
    candidates: 4                         \    candidates: 5, 8, 10
     group:0 --> iv_cand:4, cost=(1,1)    \     group:0 --> iv_cand:8, cost=(0,0)
     group:1 --> iv_cand:4, cost=(1,1)    \     group:1 --> iv_cand:5, cost=(0,0)
     group:2 --> iv_cand:4, cost=(1,1)    \     group:2 --> iv_cand:10, cost=(4,0)
     group:3 --> iv_cand:4, cost=(0,0)    \     group:3 --> iv_cand:5, cost=(0,0)
    invariant variables:                  \    invariant variables:
    invariant expressions: 1, 4, 7        \    invariant expressions:
                                          \
  Original cost 21 (complexity 0)         \  Original cost 25 (complexity 0)
                                          \
  Final cost 13 (complexity 3)            \  Final cost 25 (complexity 0)

When unrolling factor is 8, we expect to use reg offset modes for group 0, 1, 2,
this proposal works well.  But for unrolling factor is 2, we expect to still
use reg index modes for group 0, 1, 2 (with iv_cand 4 as before), this proposal
looks sticked to use iv_cand 8,5,10.  It looks we will over-blame the other 
non reg offset candidates.  sorry that I have no idea how to leverage uf in this
proposal, but it changes the focus to iv_group cost, seems not easy to scale up
or down this with good justification.

BR,
Kewen

>>> I guess there are two main sources of inexactness if we do that:
>>>
>>> (1) It might underestimate the cost because it assumes that vuse[0]
>>>     stands for all vuses in the group.
>>>
>>
>> Do you mean we don't need one check function like mark_reg_offset_groups?
>> If without it, vuse[0] might be not enough since we can't ensure the
>> others are fine with additional displacement from unrolling.  If we still
>> have it, I think it's fine to just use vuse[0].
>>
>>> (2) It might overestimates the cost because it treats all unrolled
>>>     iterations as having the cost of the final unrolled iteration.
>>>
>>> (1) could perhaps be avoided by adding a flag to the iv_use to say
>>> whether it wants this treatment.  I think the flag approach suffers
>>> from (2) too, and I'd be surprised if it makes a difference in practice.
>>>
>>
>> Sorry, I didn't have the whole picture how to deal with uf for your proposal.
>> But the flag approach considers uf in iv group cost calculation as well as
>> iv cand step cost calculation.
>>
>> BR,
>> Kewen
>>
>>> Thanks,
>>> Richard
>>>

next prev parent reply	other threads:[~2020-06-03  3:18 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-28 12:17 [PATCH 0/4] IVOPTs consider step cost for different forms when unrolling Kewen.Lin
2020-05-28 12:19 ` [PATCH 1/4] unroll: Add middle-end unroll factor estimation Kewen.Lin
2020-08-31  5:49   ` PING " Kewen.Lin
2020-09-15  7:44     ` PING^2 " Kewen.Lin
2020-10-13  7:06       ` PING^3 " Kewen.Lin
2020-11-02  9:13         ` PING^4 " Kewen.Lin
2020-11-19  5:50           ` PING^5 " Kewen.Lin
2020-12-17  2:58             ` PING^6 " Kewen.Lin
2021-01-14  2:36               ` PING^7 " Kewen.Lin
2021-01-21 21:45   ` Segher Boessenkool
2021-01-22 12:50     ` Richard Sandiford
2021-01-22 13:47     ` Richard Biener
2021-01-22 21:37       ` Segher Boessenkool
2021-01-25  7:56         ` Richard Biener
2021-01-25 17:59           ` Richard Sandiford
2021-01-25 20:37             ` Segher Boessenkool
2021-01-26  8:53               ` Kewen.Lin
2021-01-26 17:31                 ` Segher Boessenkool
2021-01-26  8:43             ` Kewen.Lin
2021-01-26 10:47             ` Richard Biener
2021-01-26 17:54               ` Segher Boessenkool
2021-01-26  8:36           ` Kewen.Lin
2021-01-26 10:53             ` Richard Biener
2021-01-27  9:43               ` Kewen.Lin
2021-03-01  2:45                 ` Kewen.Lin
2020-05-28 12:23 ` [PATCH 2/4] param: Introduce one param to control ivopts reg-offset consideration Kewen.Lin
2020-05-28 12:24 ` [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling Kewen.Lin
2020-06-01 17:59   ` Richard Sandiford
2020-06-02  3:39     ` Kewen.Lin
2020-06-02  7:14       ` Richard Sandiford
2020-06-03  3:18         ` Kewen.Lin [this message]
2020-08-08  8:01   ` Bin.Cheng
2020-08-10  4:27     ` Kewen.Lin
2020-08-10 12:38       ` Bin.Cheng
2020-08-10 14:41         ` Kewen.Lin
2020-08-16  3:59           ` Bin.Cheng
2020-08-18  9:02             ` [PATCH 3/4 v2] " Kewen.Lin
2020-08-22  5:11               ` Bin.Cheng
2020-08-25 12:46                 ` [PATCH 3/4 v3] " Kewen.Lin
2020-08-31 19:41                   ` Segher Boessenkool
2020-09-02  3:16                     ` Kewen.Lin
2020-09-02 10:25                       ` Segher Boessenkool
2020-09-03  2:24                         ` Kewen.Lin
2020-09-03 22:37                           ` Segher Boessenkool
2020-09-04  8:27                             ` Bin.Cheng
2020-09-04 13:53                               ` Segher Boessenkool
2020-09-04  8:47                             ` Kewen.Lin
2020-09-04 14:16                               ` Segher Boessenkool
2020-09-04 15:47                                 ` Kewen.Lin
2020-09-17 23:12                             ` Jeff Law
2020-09-17 23:46                               ` Segher Boessenkool
2020-09-01 11:19                   ` Bin.Cheng
2020-09-02  3:50                     ` Kewen.Lin
2020-09-02  3:55                       ` Bin.Cheng
2020-09-02  4:51                         ` Kewen.Lin
2020-09-06  2:47                     ` Hans-Peter Nilsson
2020-09-15  7:41                       ` Kewen.Lin
2020-06-02 11:38 ` [PATCH 0/4] IVOPTs consider step cost for different forms when unrolling Richard Biener
2020-06-03  3:46   ` Kewen.Lin
2020-06-03  7:07     ` Richard Biener
2020-06-03  7:58       ` Kewen.Lin
2020-06-03  9:27         ` Richard Biener
2020-06-03 10:47           ` Kewen.Lin
2020-06-03 11:08             ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a1c6a204-8c42-734c-912e-7d4cf282d434@linux.ibm.com \
    --to=linkw@linux.ibm.com \
    --cc=amker.cheng@gmail.com \
    --cc=bin.cheng@linux.alibaba.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    --cc=richard.sandiford@arm.com \
    --cc=segher@kernel.crashing.org \
    --cc=wschmidt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).