* [RFC] non-unit stride loads for size power of 2.
@ 2016-01-12 14:51 Kumar, Venkataramanan
2016-01-13 13:55 ` Richard Biener
0 siblings, 1 reply; 2+ messages in thread
From: Kumar, Venkataramanan @ 2016-01-12 14:51 UTC
To: gcc-patches
Cc: Richard Biener (richard.guenther@gmail.com),
Uros Bizjak (ubizjak@gmail.com)
Hi
From the code below, it looks like we always call “vect_permute_load_chain” for non-unit stride loads whose group size is a power of 2.
(---snip---)
  /* If reassociation width for vector type is 2 or greater target machine can
     execute 2 or more vector instructions in parallel. Otherwise try to
     get chain for loads group using vect_shift_permute_load_chain. */
  mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));

  if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
      || exact_log2 (size) != -1
      || !vect_shift_permute_load_chain (dr_chain, size, stmt,
                                         gsi, &result_chain))
    vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);

static bool
vect_shift_permute_load_chain (vec<tree> dr_chain,
                               unsigned int length,
                               gimple *stmt,
                               gimple_stmt_iterator *gsi,
                               vec<tree> *result_chain)
{
  …...
  …...
  if (exact_log2 (length) != -1 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)   ⇐ This is not used.
    {
      unsigned int j, log_length = exact_log2 (length);
      for (i = 0; i < nelt / 2; ++i)
        sel[i] = i * 2;
      for (i = 0; i < nelt / 2; ++i)
        sel[nelt / 2 + i] = i * 2 + 1;
(---snip------)
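For reference, the two loops in the snippet fill sel with the even element indices followed by the odd ones, i.e. a de-interleave selector. A minimal standalone sketch, assuming a plain byte array and a caller-supplied even nelt (the function name is hypothetical; in GCC the element count comes from the vector type):

```c
#include <stddef.h>

/* Fill sel[0..nelt-1] with the even indices followed by the odd indices,
   mirroring the two loops quoted above.  nelt is assumed to be even.  */
void
build_even_odd_selector (unsigned char *sel, size_t nelt)
{
  size_t i;

  /* First half: 0, 2, 4, ...  */
  for (i = 0; i < nelt / 2; ++i)
    sel[i] = (unsigned char) (i * 2);

  /* Second half: 1, 3, 5, ...  */
  for (i = 0; i < nelt / 2; ++i)
    sel[nelt / 2 + i] = (unsigned char) (i * 2 + 1);
}
```

With nelt = 8 this yields the selector 0 2 4 6 1 3 5 7.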
Is there any reason to always use vect_permute_load_chain for power-of-2 sizes?
I have not done any benchmarking, but I tried simple test cases for -mavx targets with sizes 2 and 4 and VF > 4 (short/char types).
For those, using vect_shift_permute_load_chain looks better.
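A loop of the following shape is the kind of case meant here: a load group of size 2 over short elements. This is only an illustrative sketch, not the actual test case used; the function and parameter names are hypothetical.

```c
/* Hypothetical test case: each iteration reads two adjacent shorts,
   so the loads form a group of size 2 (a power of 2).  */
void
deinterleave2 (const short *restrict in, short *restrict even,
               short *restrict odd, int n)
{
  for (int i = 0; i < n; ++i)
    {
      even[i] = in[2 * i];
      odd[i] = in[2 * i + 1];
    }
}
```

Compiling such a file with something like `gcc -O3 -mavx -S` before and after the change is one way to compare the generated permutes; no results are claimed here.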
Should we change it to something like this?
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index d0e20da..b0f0a02 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -5733,9 +5733,9 @@ vect_transform_grouped_load (gimple *stmt, vec<tree> dr_chain, int size,
      get chain for loads group using vect_shift_permute_load_chain. */
   mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
   if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
-      || exact_log2 (size) != -1
-      || !vect_shift_permute_load_chain (dr_chain, size, stmt,
-                                        gsi, &result_chain))
+      || (!vect_shift_permute_load_chain (dr_chain, size, stmt,
+                                         gsi, &result_chain)
+         && exact_log2 (size) != -1))
     vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
   vect_record_grouped_load_vectors (stmt, result_chain);
   result_chain.release ();
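To make the effect of the `||` short-circuiting explicit, here is a small boolean model of the condition before and after the patch. It is only a sketch: width_gt_1 stands for `reassociation_width (...) > 1`, is_pow2 for `exact_log2 (size) != -1`, and shift_ok for the return value of vect_shift_permute_load_chain. All function names are hypothetical and no GCC internals are called.

```c
#include <stdbool.h>

/* Before the patch: the shift variant is attempted only when both
   earlier operands of the || chain are false.  */
bool
current_tries_shift (bool width_gt_1, bool is_pow2)
{
  return !width_gt_1 && !is_pow2;
}

/* Before the patch: does control reach vect_permute_load_chain?  */
bool
current_calls_permute (bool width_gt_1, bool is_pow2, bool shift_ok)
{
  return width_gt_1 || is_pow2 || !shift_ok;
}

/* After the patch: the shift variant is attempted whenever the
   reassociation width test fails, regardless of the size.  */
bool
proposed_tries_shift (bool width_gt_1)
{
  return !width_gt_1;
}

/* After the patch: does control reach vect_permute_load_chain?  */
bool
proposed_calls_permute (bool width_gt_1, bool is_pow2, bool shift_ok)
{
  return width_gt_1 || (!shift_ok && is_pow2);
}
```

In this model, `proposed_calls_permute (false, false, false)` is false: when the shift variant fails for a size that is not a power of 2, the fallback is not taken either. Whether that combination can occur in practice is not established here.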
regards,
Venkat.
* Re: [RFC] non-unit stride loads for size power of 2.
2016-01-12 14:51 [RFC] non-unit stride loads for size power of 2 Kumar, Venkataramanan
@ 2016-01-13 13:55 ` Richard Biener
0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2016-01-13 13:55 UTC
To: Kumar, Venkataramanan; +Cc: gcc-patches, Uros Bizjak (ubizjak@gmail.com)
On Tue, Jan 12, 2016 at 3:51 PM, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi
>
> The code below it looks like we always call “vect_permute_load_chain” to load non-unit strides of size powers of 2.
>
> (---snip---)
> /* If reassociation width for vector type is 2 or greater target machine can
> execute 2 or more vector instructions in parallel. Otherwise try to
> get chain for loads group using vect_shift_permute_load_chain. */
> mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
>
> if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
> || exact_log2 (size) != -1
> || !vect_shift_permute_load_chain (dr_chain, size, stmt,
> gsi, &result_chain))
> vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
>
> static bool
> vect_shift_permute_load_chain (vec<tree> dr_chain,
> unsigned int length,
> gimple *stmt,
> gimple_stmt_iterator *gsi,
> vec<tree> *result_chain)
> {
> …...
> …...
> if (exact_log2 (length) != -1 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4) ⇐ This is not used.
> {
> unsigned int j, log_length = exact_log2 (length);
> for (i = 0; i < nelt / 2; ++i)
> sel[i] = i * 2;
> for (i = 0; i < nelt / 2; ++i)
> sel[nelt / 2 + i] = i * 2 + 1;
> (---snip------)
>
>
> Is there any reason to do so?
No idea, probably benchmarking or history
(vect_shift_permute_load_chain not handling size != 3).
> I have not done any benchmarking, but tried simple test cases for -mavx targets with sizes 2, 4 and VF > 4 (short/char types).
> Looks like using vect_shift_permute_load_chain seems better.
>
> Should we change it to something like this ?
>
> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index d0e20da..b0f0a02 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -5733,9 +5733,9 @@ vect_transform_grouped_load (gimple *stmt, vec<tree> dr_chain, int size,
> get chain for loads group using vect_shift_permute_load_chain. */
> mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
> if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
> - || exact_log2 (size) != -1
> - || !vect_shift_permute_load_chain (dr_chain, size, stmt,
> - gsi, &result_chain))
> + || (!vect_shift_permute_load_chain (dr_chain, size, stmt,
> + gsi, &result_chain)
> + && exact_log2 (size) != -1))
If so, then the exact_log2 check should simply be dropped. It doesn't
make much sense with vect_shift_permute_load_chain
supporting power-of-two sizes.
Of course only benchmarking will tell ;)
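As a sketch of what dropping the check would mean, the condition reduces to two terms. The model below uses hypothetical stand-in names (width_gt_1 for the reassociation width test, shift_ok for the result of vect_shift_permute_load_chain) and is not GCC code:

```c
#include <stdbool.h>

/* With the exact_log2 check dropped, vect_permute_load_chain is reached
   when the width test succeeds or the shift variant fails.  */
bool
simplified_calls_permute (bool width_gt_1, bool shift_ok)
{
  return width_gt_1 || !shift_ok;
}

/* In this model some chain is always produced: either the shift variant
   was attempted and succeeded, or the permute variant runs.  */
bool
simplified_builds_chain (bool width_gt_1, bool shift_ok)
{
  bool tried_shift = !width_gt_1;
  return (tried_shift && shift_ok)
         || simplified_calls_permute (width_gt_1, shift_ok);
}
```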
Richard.
> vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
> vect_record_grouped_load_vectors (stmt, result_chain);
> result_chain.release ();
>
> regards,
> Venkat.