public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/96208] New: non-power-of-2 group size can be vectorized for 2-element vectors case
@ 2020-07-15 15:01 dpochepk at gmail dot com
2020-07-17 6:44 ` [Bug tree-optimization/96208] " rguenth at gcc dot gnu.org
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: dpochepk at gmail dot com @ 2020-07-15 15:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208
Bug ID: 96208
Summary: non-power-of-2 group size can be vectorized for
2-element vectors case
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: dpochepk at gmail dot com
Target Milestone: ---
Created attachment 48879
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48879&action=edit
initial implementation
Current loop vectorizer only vectorize loops with groups size being power-of-2
or 3 due to vector permutation generation algorithm specifics.
However, in case of 2-element vectors, simple permutation schema can be used to
support any group size: insert each vector element into required position,
which leads to reasonable amount of operations in case of 2-element vectors.
Initial version is attached.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug tree-optimization/96208] non-power-of-2 group size can be vectorized for 2-element vectors case
2020-07-15 15:01 [Bug tree-optimization/96208] New: non-power-of-2 group size can be vectorized for 2-element vectors case dpochepk at gmail dot com
@ 2020-07-17 6:44 ` rguenth at gcc dot gnu.org
2023-06-21 13:39 ` [Bug tree-optimization/96208] non-grouped load can be SLP " rguenth at gcc dot gnu.org
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-07-17 6:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
Version|unknown |11.0
Blocks| |53947
Keywords| |missed-optimization
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note the code path you are changing will go away and "improving" it puts burden
onto the replacement implementation ...
The testcase suggests the issue is missing SLP support for the not grouped
load of *k, something I've been looking at recently.
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug tree-optimization/96208] non-grouped load can be SLP vectorized for 2-element vectors case
2020-07-15 15:01 [Bug tree-optimization/96208] New: non-power-of-2 group size can be vectorized for 2-element vectors case dpochepk at gmail dot com
2020-07-17 6:44 ` [Bug tree-optimization/96208] " rguenth at gcc dot gnu.org
@ 2023-06-21 13:39 ` rguenth at gcc dot gnu.org
2023-06-22 12:51 ` rguenth at gcc dot gnu.org
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-21 13:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2023-06-21
Ever confirmed|0 |1
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
Summary|non-power-of-2 group size |non-grouped load can be SLP
|can be vectorized for |vectorized for 2-element
|2-element vectors case |vectors case
Status|UNCONFIRMED |ASSIGNED
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
typedef struct {
double m1, m2, m3, m4, m5;
} the_struct_t;
double bar1 (the_struct_t*);
double foo (double* k, unsigned int n, the_struct_t* the_struct)
{
unsigned int u;
the_struct_t result;
for (u=0; u < n; u++, k--) {
result.m1 += (*k)*the_struct[u].m1;
result.m2 += (*k)*the_struct[u].m2;
result.m3 += (*k)*the_struct[u].m3;
result.m4 += (*k)*the_struct[u].m4;
}
return bar1 (&result);
}
This has come up again as part of PR110062
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug tree-optimization/96208] non-grouped load can be SLP vectorized for 2-element vectors case
2020-07-15 15:01 [Bug tree-optimization/96208] New: non-power-of-2 group size can be vectorized for 2-element vectors case dpochepk at gmail dot com
2020-07-17 6:44 ` [Bug tree-optimization/96208] " rguenth at gcc dot gnu.org
2023-06-21 13:39 ` [Bug tree-optimization/96208] non-grouped load can be SLP " rguenth at gcc dot gnu.org
@ 2023-06-22 12:51 ` rguenth at gcc dot gnu.org
2023-06-27 7:48 ` cvs-commit at gcc dot gnu.org
2023-06-27 7:49 ` rguenth at gcc dot gnu.org
4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-22 12:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Smaller testcase, avoiding reductions and negative step:
void
test(double * __restrict a, double *b, double *k, int n)
{
for (int i = 0; i < n; ++i)
{
a[2*i] = b[2*i] * k[ONE*i];
a[2*i + 1] = b[2*i + 1] * k[ONE*i];
}
}
this is vectorized with interleaving with -DONE=1 as SLP discovery fails:
t.c:4:21: missed: Build SLP failed: not grouped load _9 = *_8;
.L4:
movupd (%r8,%rax,2), %xmm1
movupd (%rdx,%rax), %xmm2
movupd 16(%r8,%rax,2), %xmm0
movlpd 8(%r8,%rax,2), %xmm0
movhpd 16(%r8,%rax,2), %xmm1
mulpd %xmm2, %xmm1
mulpd %xmm2, %xmm0
movapd %xmm1, %xmm2
unpcklpd %xmm0, %xmm2
unpckhpd %xmm0, %xmm1
movups %xmm2, (%rdi,%rax,2)
movups %xmm1, 16(%rdi,%rax,2)
addq $16, %rax
cmpq %rax, %rsi
jne .L4
but when we make the access to 'k' non-unit-stride with -DONE=2
SLP works fine
t.c:4:21: note: Vectorizing SLP tree:
t.c:4:21: note: node 0x48580d0 (max_nunits=2, refcnt=1) vector(2) double
t.c:4:21: note: op template: *_8 = _9;
t.c:4:21: note: stmt 0 *_8 = _9;
t.c:4:21: note: stmt 1 *_14 = _15;
t.c:4:21: note: children 0x4858158
t.c:4:21: note: node 0x4858158 (max_nunits=2, refcnt=1) vector(2) double
t.c:4:21: note: op template: _9 = _5 * _7;
t.c:4:21: note: stmt 0 _9 = _5 * _7;
t.c:4:21: note: stmt 1 _15 = _7 * _13;
t.c:4:21: note: children 0x4858268 0x48582f0
t.c:4:21: note: node 0x4858268 (max_nunits=2, refcnt=1) vector(2) double
t.c:4:21: note: op template: _5 = *_4;
t.c:4:21: note: stmt 0 _5 = *_4;
t.c:4:21: note: stmt 1 _13 = *_12;
t.c:4:21: note: node 0x48582f0 (max_nunits=2, refcnt=1) vector(2) double
t.c:4:21: note: op template: _7 = *_6;
t.c:4:21: note: stmt 0 _7 = *_6;
t.c:4:21: note: stmt 1 _7 = *_6;
t.c:4:21: note: load permutation { 0 0 }
and we get
.L4:
movupd (%rdx,%rax), %xmm0
movupd (%rsi,%rax,2), %xmm2
unpcklpd %xmm0, %xmm0
mulpd %xmm2, %xmm0
movups %xmm0, (%rdi,%rax,2)
addq $8, %rax
cmpq %rax, %r8
jne .L4
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug tree-optimization/96208] non-grouped load can be SLP vectorized for 2-element vectors case
2020-07-15 15:01 [Bug tree-optimization/96208] New: non-power-of-2 group size can be vectorized for 2-element vectors case dpochepk at gmail dot com
` (2 preceding siblings ...)
2023-06-22 12:51 ` rguenth at gcc dot gnu.org
@ 2023-06-27 7:48 ` cvs-commit at gcc dot gnu.org
2023-06-27 7:49 ` rguenth at gcc dot gnu.org
4 siblings, 0 replies; 6+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-27 7:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208
--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:dd86a5a69cbda40cf76388a65d3317c91cb2b501
commit r14-2117-gdd86a5a69cbda40cf76388a65d3317c91cb2b501
Author: Richard Biener <rguenther@suse.de>
Date: Thu Jun 22 11:40:46 2023 +0200
tree-optimization/96208 - SLP of non-grouped loads
The following extends SLP discovery to handle non-grouped loads
in loop vectorization in the case the same load appears in all
lanes.
Code generation is adjusted to mimick what we do for the case
of single element interleaving (when the load is not unit-stride)
which is already handled by SLP. There are some limits we
run into because peeling for gap cannot cover all cases and
we choose VMAT_CONTIGUOUS. The patch does not try to address
these issues yet.
The main obstacle is that these loads are not
STMT_VINFO_GROUPED_ACCESS and that's a new thing with SLP.
I know from the past that it's not a good idea to make them
grouped. Instead the following massages places to deal
with SLP loads that are not STMT_VINFO_GROUPED_ACCESS.
There's already a testcase testing for the case the PR
is after, just XFAILed, the following adjusts that instead
of adding another.
I do expect to have missed some so I don't plan to push this
on a Friday. Still there may be feedback, so posting this
now.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
PR tree-optimization/96208
* tree-vect-slp.cc (vect_build_slp_tree_1): Allow
a non-grouped load if it is the same for all lanes.
(vect_build_slp_tree_2): Handle not grouped loads.
(vect_optimize_slp_pass::remove_redundant_permutations):
Likewise.
(vect_transform_slp_perm_load_1): Likewise.
* tree-vect-stmts.cc (vect_model_load_cost): Likewise.
(get_group_load_store_type): Likewise. Handle
invariant accesses.
(vectorizable_load): Likewise.
* gcc.dg/vect/slp-46.c: Adjust for new vectorizations.
* gcc.dg/vect/bb-slp-pr65935.c: Adjust.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug tree-optimization/96208] non-grouped load can be SLP vectorized for 2-element vectors case
2020-07-15 15:01 [Bug tree-optimization/96208] New: non-power-of-2 group size can be vectorized for 2-element vectors case dpochepk at gmail dot com
` (3 preceding siblings ...)
2023-06-27 7:48 ` cvs-commit at gcc dot gnu.org
@ 2023-06-27 7:49 ` rguenth at gcc dot gnu.org
4 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-27 7:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96208
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-06-27 7:49 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15 15:01 [Bug tree-optimization/96208] New: non-power-of-2 group size can be vectorized for 2-element vectors case dpochepk at gmail dot com
2020-07-17 6:44 ` [Bug tree-optimization/96208] " rguenth at gcc dot gnu.org
2023-06-21 13:39 ` [Bug tree-optimization/96208] non-grouped load can be SLP " rguenth at gcc dot gnu.org
2023-06-22 12:51 ` rguenth at gcc dot gnu.org
2023-06-27 7:48 ` cvs-commit at gcc dot gnu.org
2023-06-27 7:49 ` rguenth at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).