* [PATCH] vect: Add extraction cost for slp reduc
@ 2021-08-16 6:03 Kewen.Lin
2021-08-16 6:49 ` Richard Biener
0 siblings, 1 reply; 4+ messages in thread
From: Kewen.Lin @ 2021-08-16 6:03 UTC (permalink / raw)
To: GCC Patches
Cc: Richard Biener, Richard Sandiford, Segher Boessenkool, Bill Schmidt
Hi,
IIUC, the function vectorizable_bb_reduc_epilogue missed to
consider the cost to extract the final value from the vector
for reduc operations. This patch is to add one time of
vec_to_scalar cost for extracting.
Bootstrapped & regtested on powerpc64le-linux-gnu P9.
The testing on x86_64 and aarch64 is ongoing.
Is it ok for trunk?
BR,
Kewen
-----
gcc/ChangeLog:
* tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
value extraction.
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b9d88c2d943..841a0872afa 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
return false;
/* There's no way to cost a horizontal vector reduction via REDUC_FN so
- cost log2 vector operations plus shuffles. */
+ cost log2 vector operations plus shuffles and one extraction. */
unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
vectype, 0, vect_body);
record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
vectype, 0, vect_body);
+ record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
+ vectype, 0, vect_body);
return true;
}
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] vect: Add extraction cost for slp reduc
2021-08-16 6:03 [PATCH] vect: Add extraction cost for slp reduc Kewen.Lin
@ 2021-08-16 6:49 ` Richard Biener
2021-08-16 7:15 ` Kewen.Lin
0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2021-08-16 6:49 UTC (permalink / raw)
To: Kewen.Lin
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Bill Schmidt
On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi,
>
> IIUC, the function vectorizable_bb_reduc_epilogue missed to
> consider the cost to extract the final value from the vector
> for reduc operations. This patch is to add one time of
> vec_to_scalar cost for extracting.
>
> Bootstrapped & regtested on powerpc64le-linux-gnu P9.
> The testing on x86_64 and aarch64 is ongoing.
>
> Is it ok for trunk?
There's no such instruction necessary, the way the costing works
the result is in lane zero already. Note the optabs are defined
to reduce to a scalar already. So if your arch implements those and
requires such move then the backend costing needs to handle that.
That said, ideally we'd simply cost the IFN_REDUC_* in the backend
but for BB reductions we don't actually build a SLP node with such
representative stmt to pass down (yet).
I guess you're running into a integer reduction where there's
a vector -> gpr move missing in costing? I suppose costing
vec_to_scalar works for that but in the end we should maybe
find a way to cost the IFN_REDUC_* ...
Richard.
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
> * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
> value extraction.
>
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index b9d88c2d943..841a0872afa 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
> return false;
>
> /* There's no way to cost a horizontal vector reduction via REDUC_FN so
> - cost log2 vector operations plus shuffles. */
> + cost log2 vector operations plus shuffles and one extraction. */
> unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
> record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
> vectype, 0, vect_body);
> record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
> vectype, 0, vect_body);
> + record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
> + vectype, 0, vect_body);
> return true;
> }
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] vect: Add extraction cost for slp reduc
2021-08-16 6:49 ` Richard Biener
@ 2021-08-16 7:15 ` Kewen.Lin
2021-08-16 13:27 ` Richard Biener
0 siblings, 1 reply; 4+ messages in thread
From: Kewen.Lin @ 2021-08-16 7:15 UTC (permalink / raw)
To: Richard Biener
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Bill Schmidt
Hi Richi,
Thanks for the comments!
on 2021/8/16 下午2:49, Richard Biener wrote:
> On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> IIUC, the function vectorizable_bb_reduc_epilogue missed to
>> consider the cost to extract the final value from the vector
>> for reduc operations. This patch is to add one time of
>> vec_to_scalar cost for extracting.
>>
>> Bootstrapped & regtested on powerpc64le-linux-gnu P9.
>> The testing on x86_64 and aarch64 is ongoing.
>>
>> Is it ok for trunk?
>
> There's no such instruction necessary, the way the costing works
> the result is in lane zero already. Note the optabs are defined
> to reduce to a scalar already. So if your arch implements those and
> requires such move then the backend costing needs to handle that.
>
Yes, these reduc_<operation>_scal_<mode> should have made the
operand[0] as the final scalar result.
> That said, ideally we'd simply cost the IFN_REDUC_* in the backend
> but for BB reductions we don't actually build a SLP node with such
> representative stmt to pass down (yet).
>
OK, thanks for the explanation. It explains why we cost the
IFN_REDUC_* as one vect_stmt in loop vect but cost it as
conservative (shuffle and reduc_op) as possible here.
> I guess you're running into a integer reduction where there's
> a vector -> gpr move missing in costing? I suppose costing
> vec_to_scalar works for that but in the end we should maybe
> find a way to cost the IFN_REDUC_* ...
Yeah, it's a reduction on plus, initially I wanted to adjust backend
costing for various IFN_REDUC* (since for some variants Power has more
than one instructions for them), then I noticed we cost the reduction
as shuffle and reduc_op during SLP for now, I guess it's good to get
vec_to_scalar considered here for consistency? Then it can be removed
together when we have a better modeling in the end?
BR,
Kewen
>
> Richard.
>
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>> * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
>> value extraction.
>>
>> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
>> index b9d88c2d943..841a0872afa 100644
>> --- a/gcc/tree-vect-slp.c
>> +++ b/gcc/tree-vect-slp.c
>> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
>> return false;
>>
>> /* There's no way to cost a horizontal vector reduction via REDUC_FN so
>> - cost log2 vector operations plus shuffles. */
>> + cost log2 vector operations plus shuffles and one extraction. */
>> unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
>> record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
>> vectype, 0, vect_body);
>> record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
>> vectype, 0, vect_body);
>> + record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
>> + vectype, 0, vect_body);
>> return true;
>> }
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] vect: Add extraction cost for slp reduc
2021-08-16 7:15 ` Kewen.Lin
@ 2021-08-16 13:27 ` Richard Biener
0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2021-08-16 13:27 UTC (permalink / raw)
To: Kewen.Lin
Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Bill Schmidt
On Mon, Aug 16, 2021 at 9:16 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
> on 2021/8/16 下午2:49, Richard Biener wrote:
> > On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> IIUC, the function vectorizable_bb_reduc_epilogue missed to
> >> consider the cost to extract the final value from the vector
> >> for reduc operations. This patch is to add one time of
> >> vec_to_scalar cost for extracting.
> >>
> >> Bootstrapped & regtested on powerpc64le-linux-gnu P9.
> >> The testing on x86_64 and aarch64 is ongoing.
> >>
> >> Is it ok for trunk?
> >
> > There's no such instruction necessary, the way the costing works
> > the result is in lane zero already. Note the optabs are defined
> > to reduce to a scalar already. So if your arch implements those and
> > requires such move then the backend costing needs to handle that.
> >
>
> Yes, these reduc_<operation>_scal_<mode> should have made the
> operand[0] as the final scalar result.
>
> > That said, ideally we'd simply cost the IFN_REDUC_* in the backend
> > but for BB reductions we don't actually build a SLP node with such
> > representative stmt to pass down (yet).
> >
>
> OK, thanks for the explanation. It explains why we cost the
> IFN_REDUC_* as one vect_stmt in loop vect but cost it as
> conservative (shuffle and reduc_op) as possible here.
>
> > I guess you're running into a integer reduction where there's
> > a vector -> gpr move missing in costing? I suppose costing
> > vec_to_scalar works for that but in the end we should maybe
> > find a way to cost the IFN_REDUC_* ...
>
> Yeah, it's a reduction on plus, initially I wanted to adjust backend
> costing for various IFN_REDUC* (since for some variants Power has more
> than one instructions for them), then I noticed we cost the reduction
> as shuffle and reduc_op during SLP for now, I guess it's good to get
> vec_to_scalar considered here for consistency? Then it can be removed
> together when we have a better modeling in the end?
Yeah, I guess that works for now.
Thanks,
Richard.
> BR,
> Kewen
>
> >
> > Richard.
> >
> >> BR,
> >> Kewen
> >> -----
> >> gcc/ChangeLog:
> >>
> >> * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
> >> value extraction.
> >>
> >> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> >> index b9d88c2d943..841a0872afa 100644
> >> --- a/gcc/tree-vect-slp.c
> >> +++ b/gcc/tree-vect-slp.c
> >> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
> >> return false;
> >>
> >> /* There's no way to cost a horizontal vector reduction via REDUC_FN so
> >> - cost log2 vector operations plus shuffles. */
> >> + cost log2 vector operations plus shuffles and one extraction. */
> >> unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
> >> record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
> >> vectype, 0, vect_body);
> >> record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
> >> vectype, 0, vect_body);
> >> + record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
> >> + vectype, 0, vect_body);
> >> return true;
> >> }
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-08-16 13:27 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-16 6:03 [PATCH] vect: Add extraction cost for slp reduc Kewen.Lin
2021-08-16 6:49 ` Richard Biener
2021-08-16 7:15 ` Kewen.Lin
2021-08-16 13:27 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).