* Prefer open-coding vector integer division
@ 2018-05-24 8:31 Richard Sandiford
2018-05-24 8:37 ` Jakub Jelinek
2018-06-08 13:36 ` Matthew Fortune
0 siblings, 2 replies; 6+ messages in thread
From: Richard Sandiford @ 2018-05-24 8:31 UTC (permalink / raw)
To: gcc-patches; +Cc: matthew.fortune
vect_recog_divmod_pattern currently bails out if the target has
native support for integer division, but I think in practice
it's always going to be better to open-code it anyway, just as
we usually open-code scalar divisions by constants.
I think the only currently affected target is MIPS MSA, where for:
void
foo (int *x)
{
for (int i = 0; i < 100; ++i)
x[i] /= 2;
}
we previously preferred to use division for powers of 2:
.set noreorder
bnz.w $w1,1f
div_s.w $w0,$w0,$w1
break 7
.set reorder
1:
(or just the div_s.w for -mno-check-zero-division), but after the patch
we open-code them using shifts:
clt_s.w $w1,$w0,$w2
subv.w $w0,$w0,$w1
srai.w $w0,$w0,1
I assume that's better. Matthew, is that right?
MSA doesn't define a high-part pattern, so it still uses a division
instruction for the non-power-of-2 case.
Richard B pointed out that this would disable SLP of division by
different amounts, but I think in practice that's a price worth paying,
since the current cost model can't really tell whether using a general
vector division is better than using open-coded scalar divisions.
The fix would be either to support SLP of mixed open-coded divisions
or to improve the cost model and try SLP again without the patterns.
The patch adds an XFAILed test for this.
Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu. OK to install?
Richard
2018-05-24 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-patterns.c (vect_recog_divmod_pattern): Remove check
for division support.
gcc/testsuite/
* gcc.dg/vect/bb-slp-div-1.c: New XFAILed test.
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c 2018-05-16 12:48:59.115202362 +0100
+++ gcc/tree-vect-patterns.c 2018-05-24 09:18:10.445466941 +0100
@@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec<gimple *>
enum tree_code rhs_code;
stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
vec_info *vinfo = stmt_vinfo->vinfo;
- optab optab;
tree q;
int dummy_int, prec;
stmt_vec_info def_stmt_vinfo;
@@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec<gimple *>
if (vectype == NULL_TREE)
return NULL;
- /* If the target can handle vectorized division or modulo natively,
- don't attempt to optimize this. */
- optab = optab_for_tree_code (rhs_code, vectype, optab_default);
- if (optab != unknown_optab)
- {
- machine_mode vec_mode = TYPE_MODE (vectype);
- int icode = (int) optab_handler (optab, vec_mode);
- if (icode != CODE_FOR_nothing)
- return NULL;
- }
-
prec = TYPE_PRECISION (itype);
if (integer_pow2p (oprnd1))
{
Index: gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c
===================================================================
--- /dev/null 2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c 2018-05-24 09:18:10.444466986 +0100
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve } } */
+
+int x[8];
+
+void
+f (void)
+{
+ x[0] /= 2;
+ x[1] /= 3;
+ x[2] /= 4;
+ x[3] /= 5;
+ x[4] /= 6;
+ x[5] /= 7;
+ x[6] /= 8;
+ x[7] /= 9;
+}
+
+/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { xfail *-*-* } } } */
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Prefer open-coding vector integer division
2018-05-24 8:31 Prefer open-coding vector integer division Richard Sandiford
@ 2018-05-24 8:37 ` Jakub Jelinek
2018-05-24 10:07 ` Richard Biener
2018-06-08 13:36 ` Matthew Fortune
1 sibling, 1 reply; 6+ messages in thread
From: Jakub Jelinek @ 2018-05-24 8:37 UTC (permalink / raw)
To: gcc-patches, matthew.fortune, richard.sandiford
On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote:
> vect_recog_divmod_pattern currently bails out if the target has
> native support for integer division, but I think in practice
> it's always going to be better to open-code it anyway, just as
> we usually open-code scalar divisions by constants.
>
> I think the only currently affected target is MIPS MSA, where for:
Isn't powerpcspe affected too? It has a divv2si3 pattern.
> --- gcc/tree-vect-patterns.c 2018-05-16 12:48:59.115202362 +0100
> +++ gcc/tree-vect-patterns.c 2018-05-24 09:18:10.445466941 +0100
> @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec<gimple *>
> enum tree_code rhs_code;
> stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
> vec_info *vinfo = stmt_vinfo->vinfo;
> - optab optab;
> tree q;
> int dummy_int, prec;
> stmt_vec_info def_stmt_vinfo;
> @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec<gimple *>
> if (vectype == NULL_TREE)
> return NULL;
>
> - /* If the target can handle vectorized division or modulo natively,
> - don't attempt to optimize this. */
> - optab = optab_for_tree_code (rhs_code, vectype, optab_default);
> - if (optab != unknown_optab)
> - {
> - machine_mode vec_mode = TYPE_MODE (vectype);
> - int icode = (int) optab_handler (optab, vec_mode);
> - if (icode != CODE_FOR_nothing)
> - return NULL;
> - }
> -
Shouldn't we instead keep it, but only do it if
optimize_bb_for_size_p (gimple_bb (last_stmt)) ?
I mean, a hw division is very likely shorter than what we replace it with...
Jakub
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Prefer open-coding vector integer division
2018-05-24 8:37 ` Jakub Jelinek
@ 2018-05-24 10:07 ` Richard Biener
2018-05-24 12:53 ` Richard Sandiford
0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2018-05-24 10:07 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: GCC Patches, matthew.fortune, Richard Sandiford
On Thu, May 24, 2018 at 10:31 AM Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote:
> > vect_recog_divmod_pattern currently bails out if the target has
> > native support for integer division, but I think in practice
> > it's always going to be better to open-code it anyway, just as
> > we usually open-code scalar divisions by constants.
> >
> > I think the only currently affected target is MIPS MSA, where for:
> Isn't powerpcspe affected too? It has a divv2si3 pattern.
> > --- gcc/tree-vect-patterns.c 2018-05-16 12:48:59.115202362 +0100
> > +++ gcc/tree-vect-patterns.c 2018-05-24 09:18:10.445466941 +0100
> > @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec<gimple *>
> > enum tree_code rhs_code;
> > stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
> > vec_info *vinfo = stmt_vinfo->vinfo;
> > - optab optab;
> > tree q;
> > int dummy_int, prec;
> > stmt_vec_info def_stmt_vinfo;
> > @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec<gimple *>
> > if (vectype == NULL_TREE)
> > return NULL;
> >
> > - /* If the target can handle vectorized division or modulo natively,
> > - don't attempt to optimize this. */
> > - optab = optab_for_tree_code (rhs_code, vectype, optab_default);
> > - if (optab != unknown_optab)
> > - {
> > - machine_mode vec_mode = TYPE_MODE (vectype);
> > - int icode = (int) optab_handler (optab, vec_mode);
> > - if (icode != CODE_FOR_nothing)
> > - return NULL;
> > - }
> > -
> Shouldn't we instead keep it, but only do it if
> optimize_bb_for_size_p (gimple_bb (last_stmt)) ?
> I mean, a hw division is very likely shorter than what we replace it
with...
Good idea. While loops are checked for optimize_loop_nest_for_speed if
not forced vect BB vectorization has no such predicate.
Richard.
> Jakub
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Prefer open-coding vector integer division
2018-05-24 10:07 ` Richard Biener
@ 2018-05-24 12:53 ` Richard Sandiford
2018-05-25 7:27 ` Richard Biener
0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2018-05-24 12:53 UTC (permalink / raw)
To: Richard Biener; +Cc: Jakub Jelinek, GCC Patches, matthew.fortune
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, May 24, 2018 at 10:31 AM Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote:
>> > vect_recog_divmod_pattern currently bails out if the target has
>> > native support for integer division, but I think in practice
>> > it's always going to be better to open-code it anyway, just as
>> > we usually open-code scalar divisions by constants.
>> >
>> > I think the only currently affected target is MIPS MSA, where for:
>
>> Isn't powerpcspe affected too? It has a divv2si3 pattern.
Ah, yeah. I forgot to update the description after you mentioned
that on IRC, sorry.
>> > --- gcc/tree-vect-patterns.c 2018-05-16 12:48:59.115202362 +0100
>> > +++ gcc/tree-vect-patterns.c 2018-05-24 09:18:10.445466941 +0100
>> > @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec<gimple *>
>> > enum tree_code rhs_code;
>> > stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
>> > vec_info *vinfo = stmt_vinfo->vinfo;
>> > - optab optab;
>> > tree q;
>> > int dummy_int, prec;
>> > stmt_vec_info def_stmt_vinfo;
>> > @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec<gimple *>
>> > if (vectype == NULL_TREE)
>> > return NULL;
>> >
>> > - /* If the target can handle vectorized division or modulo natively,
>> > - don't attempt to optimize this. */
>> > - optab = optab_for_tree_code (rhs_code, vectype, optab_default);
>> > - if (optab != unknown_optab)
>> > - {
>> > - machine_mode vec_mode = TYPE_MODE (vectype);
>> > - int icode = (int) optab_handler (optab, vec_mode);
>> > - if (icode != CODE_FOR_nothing)
>> > - return NULL;
>> > - }
>> > -
>
>> Shouldn't we instead keep it, but only do it if
>> optimize_bb_for_size_p (gimple_bb (last_stmt)) ?
>> I mean, a hw division is very likely shorter than what we replace it
> with...
>
> Good idea. While loops are checked for optimize_loop_nest_for_speed if
> not forced vect BB vectorization has no such predicate.
Like this, if it passes testing?
Thanks,
Richard
2018-05-24 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-vect-patterns.c: Include predict.h.
(vect_recog_divmod_pattern): Restrict check for division support
to when optimizing for size.
gcc/testsuite/
* gcc.dg/vect/bb-slp-div-1.c: New XFAILed test.
Index: gcc/tree-vect-patterns.c
===================================================================
*** gcc/tree-vect-patterns.c 2018-05-24 13:50:21.656138942 +0100
--- gcc/tree-vect-patterns.c 2018-05-24 13:50:21.819134113 +0100
*************** vect_recog_divmod_pattern (vec<gimple *>
*** 2674,2688 ****
if (vectype == NULL_TREE)
return NULL;
! /* If the target can handle vectorized division or modulo natively,
! don't attempt to optimize this. */
! optab = optab_for_tree_code (rhs_code, vectype, optab_default);
! if (optab != unknown_optab)
{
! machine_mode vec_mode = TYPE_MODE (vectype);
! int icode = (int) optab_handler (optab, vec_mode);
! if (icode != CODE_FOR_nothing)
! return NULL;
}
prec = TYPE_PRECISION (itype);
--- 2674,2692 ----
if (vectype == NULL_TREE)
return NULL;
! if (optimize_bb_for_size_p (gimple_bb (last_stmt)))
{
! /* If the target can handle vectorized division or modulo natively,
! don't attempt to optimize this, since native division is likely
! to give smaller code. */
! optab = optab_for_tree_code (rhs_code, vectype, optab_default);
! if (optab != unknown_optab)
! {
! machine_mode vec_mode = TYPE_MODE (vectype);
! int icode = (int) optab_handler (optab, vec_mode);
! if (icode != CODE_FOR_nothing)
! return NULL;
! }
}
prec = TYPE_PRECISION (itype);
Index: gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c
===================================================================
*** /dev/null 2018-04-20 16:19:46.369131350 +0100
--- gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c 2018-05-24 13:50:21.818134143 +0100
***************
*** 0 ****
--- 1,19 ----
+ /* { dg-do compile } */
+ /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve } } */
+
+ int x[8];
+
+ void
+ f (void)
+ {
+ x[0] /= 2;
+ x[1] /= 3;
+ x[2] /= 4;
+ x[3] /= 5;
+ x[4] /= 6;
+ x[5] /= 7;
+ x[6] /= 8;
+ x[7] /= 9;
+ }
+
+ /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { xfail *-*-* } } } */
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Prefer open-coding vector integer division
2018-05-24 12:53 ` Richard Sandiford
@ 2018-05-25 7:27 ` Richard Biener
0 siblings, 0 replies; 6+ messages in thread
From: Richard Biener @ 2018-05-25 7:27 UTC (permalink / raw)
To: Jakub Jelinek, GCC Patches, matthew.fortune, Richard Sandiford
On Thu, May 24, 2018 at 2:52 PM Richard Sandiford <
richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Thu, May 24, 2018 at 10:31 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >> On Thu, May 24, 2018 at 09:21:35AM +0100, Richard Sandiford wrote:
> >> > vect_recog_divmod_pattern currently bails out if the target has
> >> > native support for integer division, but I think in practice
> >> > it's always going to be better to open-code it anyway, just as
> >> > we usually open-code scalar divisions by constants.
> >> >
> >> > I think the only currently affected target is MIPS MSA, where for:
> >
> >> Isn't powerpcspe affected too? It has a divv2si3 pattern.
> Ah, yeah. I forgot to update the description after you mentioned
> that on IRC, sorry.
> >> > --- gcc/tree-vect-patterns.c 2018-05-16 12:48:59.115202362 +0100
> >> > +++ gcc/tree-vect-patterns.c 2018-05-24 09:18:10.445466941 +0100
> >> > @@ -2639,7 +2639,6 @@ vect_recog_divmod_pattern (vec<gimple *>
> >> > enum tree_code rhs_code;
> >> > stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
> >> > vec_info *vinfo = stmt_vinfo->vinfo;
> >> > - optab optab;
> >> > tree q;
> >> > int dummy_int, prec;
> >> > stmt_vec_info def_stmt_vinfo;
> >> > @@ -2674,17 +2673,6 @@ vect_recog_divmod_pattern (vec<gimple *>
> >> > if (vectype == NULL_TREE)
> >> > return NULL;
> >> >
> >> > - /* If the target can handle vectorized division or modulo
natively,
> >> > - don't attempt to optimize this. */
> >> > - optab = optab_for_tree_code (rhs_code, vectype, optab_default);
> >> > - if (optab != unknown_optab)
> >> > - {
> >> > - machine_mode vec_mode = TYPE_MODE (vectype);
> >> > - int icode = (int) optab_handler (optab, vec_mode);
> >> > - if (icode != CODE_FOR_nothing)
> >> > - return NULL;
> >> > - }
> >> > -
> >
> >> Shouldn't we instead keep it, but only do it if
> >> optimize_bb_for_size_p (gimple_bb (last_stmt)) ?
> >> I mean, a hw division is very likely shorter than what we replace it
> > with...
> >
> > Good idea. While loops are checked for optimize_loop_nest_for_speed if
> > not forced vect BB vectorization has no such predicate.
> Like this, if it passes testing?
Looks good.
Thanks,
Richard.
> Thanks,
> Richard
> 2018-05-24 Richard Sandiford <richard.sandiford@linaro.org>
> gcc/
> * tree-vect-patterns.c: Include predict.h.
> (vect_recog_divmod_pattern): Restrict check for division support
> to when optimizing for size.
> gcc/testsuite/
> * gcc.dg/vect/bb-slp-div-1.c: New XFAILed test.
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> *** gcc/tree-vect-patterns.c 2018-05-24 13:50:21.656138942 +0100
> --- gcc/tree-vect-patterns.c 2018-05-24 13:50:21.819134113 +0100
> *************** vect_recog_divmod_pattern (vec<gimple *>
> *** 2674,2688 ****
> if (vectype == NULL_TREE)
> return NULL;
> ! /* If the target can handle vectorized division or modulo natively,
> ! don't attempt to optimize this. */
> ! optab = optab_for_tree_code (rhs_code, vectype, optab_default);
> ! if (optab != unknown_optab)
> {
> ! machine_mode vec_mode = TYPE_MODE (vectype);
> ! int icode = (int) optab_handler (optab, vec_mode);
> ! if (icode != CODE_FOR_nothing)
> ! return NULL;
> }
> prec = TYPE_PRECISION (itype);
> --- 2674,2692 ----
> if (vectype == NULL_TREE)
> return NULL;
> ! if (optimize_bb_for_size_p (gimple_bb (last_stmt)))
> {
> ! /* If the target can handle vectorized division or modulo
natively,
> ! don't attempt to optimize this, since native division is likely
> ! to give smaller code. */
> ! optab = optab_for_tree_code (rhs_code, vectype, optab_default);
> ! if (optab != unknown_optab)
> ! {
> ! machine_mode vec_mode = TYPE_MODE (vectype);
> ! int icode = (int) optab_handler (optab, vec_mode);
> ! if (icode != CODE_FOR_nothing)
> ! return NULL;
> ! }
> }
> prec = TYPE_PRECISION (itype);
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c
> ===================================================================
> *** /dev/null 2018-04-20 16:19:46.369131350 +0100
> --- gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c 2018-05-24
13:50:21.818134143 +0100
> ***************
> *** 0 ****
> --- 1,19 ----
> + /* { dg-do compile } */
> + /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve
} } */
> +
> + int x[8];
> +
> + void
> + f (void)
> + {
> + x[0] /= 2;
> + x[1] /= 3;
> + x[2] /= 4;
> + x[3] /= 5;
> + x[4] /= 6;
> + x[5] /= 7;
> + x[6] /= 8;
> + x[7] /= 9;
> + }
> +
> + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { xfail
*-*-* } } } */
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: Prefer open-coding vector integer division
2018-05-24 8:31 Prefer open-coding vector integer division Richard Sandiford
2018-05-24 8:37 ` Jakub Jelinek
@ 2018-06-08 13:36 ` Matthew Fortune
1 sibling, 0 replies; 6+ messages in thread
From: Matthew Fortune @ 2018-06-08 13:36 UTC (permalink / raw)
To: 'Richard Sandiford', gcc-patches
Richard Sandiford <richard.sandiford@linaro.org> writes:
> vect_recog_divmod_pattern currently bails out if the target has
> native support for integer division, but I think in practice
> it's always going to be better to open-code it anyway, just as
> we usually open-code scalar divisions by constants.
>
> I think the only currently affected target is MIPS MSA, where for:
>
> void
> foo (int *x)
> {
> for (int i = 0; i < 100; ++i)
> x[i] /= 2;
> }
>
> we previously preferred to use division for powers of 2:
>
> .set noreorder
> bnz.w $w1,1f
> div_s.w $w0,$w0,$w1
> break 7
> .set reorder
> 1:
>
> (or just the div_s.w for -mno-check-zero-division), but after the patch
> we open-code them using shifts:
>
> clt_s.w $w1,$w0,$w2
> subv.w $w0,$w0,$w1
> srai.w $w0,$w0,1
>
> I assume that's better. Matthew, is that right?
Sorry for extreme tardiness. Yes, the alternate sequence has a max latency
of 6. Although I don't have the range of latencies to hand for the FDIV, as
far as I remember 6 cycles is better than the fastest FDIV case at least for
i6400/i6500.
Matthew
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-06-08 13:36 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-24 8:31 Prefer open-coding vector integer division Richard Sandiford
2018-05-24 8:37 ` Jakub Jelinek
2018-05-24 10:07 ` Richard Biener
2018-05-24 12:53 ` Richard Sandiford
2018-05-25 7:27 ` Richard Biener
2018-06-08 13:36 ` Matthew Fortune
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).