public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/106475] New: Loop vectorizer prevents vectorization
@ 2022-07-29 11:43 christophm30 at gmail dot com
2022-07-29 12:36 ` [Bug tree-optimization/106475] " rguenth at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: christophm30 at gmail dot com @ 2022-07-29 11:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475
Bug ID: 106475
Summary: Loop vectorizer prevents vectorization
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: christophm30 at gmail dot com
Target Milestone: ---
Inspired by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106352, I tested
GCC's behaviour after adding the `restrict` keyword as advised there.
This results in the following code:
```
#include <inttypes.h>

void
foo (uint8_t *restrict dst, int i_dst_stride,
     uint8_t *src1, int i_src1_stride,
     uint8_t *src2, int i_src2_stride,
     int i_height)
{
  for (int y = 0; y < i_height; y++)
    {
      for (int x = 0; x < 8; x++)
        dst[x] = (src1[x] + src2[x] + 1);
      dst += i_dst_stride;
      src1 += i_src1_stride;
      src2 += i_src2_stride;
    }
}
```
The issue now is that this only gets vectorized if we pass
`-O3 -fno-tree-loop-vectorize`, i.e. if we disable the loop vectorizer.
Obviously, what helps for this function is not necessarily beneficial
for the rest of the program, so a solution that generates faster code
without disabling the loop vectorizer would be preferred.
I have not found a GCC version that can do this, so this is not a
regression but a limitation. I have also not found a similar ticket,
but I suspect this is a known issue.
Are there any ideas to improve this?
* [Bug tree-optimization/106475] Loop vectorizer prevents vectorization
From: rguenth at gcc dot gnu.org @ 2022-07-29 12:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The loop seems to be vectorized just fine? The issue is just that we need a
runtime alias check because of the variable stride, and that we need a
VF (vectorization factor) of two to fill 16-byte vectors:
```
.L5:
        movq    (%rcx), %xmm1
        movq    (%rdx), %xmm0
        addl    $1, %esi
        movhps  (%rcx,%r10), %xmm1
        movhps  (%rdx,%r9), %xmm0
        addq    %r14, %rcx
        addq    %r13, %rdx
        paddb   %xmm1, %xmm0
        paddb   %xmm2, %xmm0
        movq    %xmm0, (%rax)
        movhps  %xmm0, (%rax,%rdi)
        addq    %r12, %rax
        cmpl    %esi, %r15d
        jne     .L5
        movl    %ebp, %eax
        andl    $-2, %eax
        andl    $1, %ebp
        je      .L1
.L4:
        imulq   %rax, %r10
        imulq   %rax, %r9
        imulq   %rdi, %rax
        movq    (%rbx,%r10), %xmm0
        movq    (%r8,%r9), %xmm1
        paddb   %xmm1, %xmm0
        movq    .LC1(%rip), %xmm1
        paddb   %xmm1, %xmm0
        movq    %xmm0, (%r11,%rax)
```
Yes, the BB (basic-block) vectorization result is smaller, but it only uses
half of a vector:
```
.L3:
        movq    (%r8), %xmm0
        movq    (%rdx), %xmm1
        addl    $1, %ecx
        addq    %rdi, %rdx
        addq    %rsi, %r8
        paddb   %xmm1, %xmm0
        paddb   %xmm2, %xmm0
        movq    %xmm0, (%rax)
        addq    %r10, %rax
        cmpl    %ecx, %r11d
        jne     .L3
```
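To illustrate comment #1, below is a minimal portable-C sketch of the shape the loop vectorizer produces: two rows per iteration packed into one 16-byte vector, plus a one-row epilogue for an odd `i_height`. `foo_ref` is the loop from the report; `foo_vf2_sketch` is a hypothetical hand-written equivalent (not GCC's actual output) in which the 16-byte vector register is modelled as a `uint8_t[16]` array and each `movq`/`movhps` pair as two 8-byte `memcpy` calls. It assumes, as the report does, that `dst` does not alias `src1`/`src2` (the `restrict` qualifier), which is what allows row y+1 to be loaded before row y is stored; the runtime alias check GCC emits is not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: the loop from the report. */
void
foo_ref (uint8_t *restrict dst, int i_dst_stride,
         uint8_t *src1, int i_src1_stride,
         uint8_t *src2, int i_src2_stride,
         int i_height)
{
  for (int y = 0; y < i_height; y++)
    {
      for (int x = 0; x < 8; x++)
        dst[x] = (src1[x] + src2[x] + 1);
      dst += i_dst_stride;
      src1 += i_src1_stride;
      src2 += i_src2_stride;
    }
}

/* Hypothetical VF = 2 version: rows y and y+1 share one 16-byte "vector"
   (modelled as uint8_t[16]).  Relies on dst being restrict; GCC's runtime
   alias check is not modelled here.  */
void
foo_vf2_sketch (uint8_t *restrict dst, int i_dst_stride,
                uint8_t *src1, int i_src1_stride,
                uint8_t *src2, int i_src2_stride,
                int i_height)
{
  int y = 0;
  for (; y + 1 < i_height; y += 2)
    {
      uint8_t a[16], b[16], r[16];
      /* Low half = row y, high half = row y+1 (cf. movq + movhps loads).  */
      memcpy (a, src1, 8);
      memcpy (a + 8, src1 + i_src1_stride, 8);
      memcpy (b, src2, 8);
      memcpy (b + 8, src2 + i_src2_stride, 8);
      /* a + b + 1 on all 16 lanes at once (cf. the two paddb).  */
      for (int lane = 0; lane < 16; lane++)
        r[lane] = (uint8_t) (a[lane] + b[lane] + 1);
      /* Store row y, then row y+1 (cf. movq + movhps stores).  */
      memcpy (dst, r, 8);
      memcpy (dst + i_dst_stride, r + 8, 8);
      /* Advance all pointers by two rows.  */
      dst += 2 * (ptrdiff_t) i_dst_stride;
      src1 += 2 * (ptrdiff_t) i_src1_stride;
      src2 += 2 * (ptrdiff_t) i_src2_stride;
    }
  /* Epilogue for an odd i_height: one row, i.e. half a vector (cf. .L4).  */
  if (y < i_height)
    for (int x = 0; x < 8; x++)
      dst[x] = (uint8_t) (src1[x] + src2[x] + 1);
}
```

In this picture, the BB-vectorized loop corresponds to running only the epilogue path for every row, which is why it fills just half of a vector per iteration.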
* [Bug tree-optimization/106475] Loop vectorizer prevents vectorization
From: christophm30 at gmail dot com @ 2022-07-29 13:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475
--- Comment #2 from Christoph Müllner <christophm30 at gmail dot com> ---
Yes, you are right!
I hadn't noticed that the longer sequence needs only half as many loop
iterations as the shorter one.
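Counting the instructions in the two inner loops quoted in comment #1 makes this concrete. This is only a rough static count: it ignores latency and throughput differences between instructions, as well as the runtime alias check and the odd-row epilogue, whose cost is spread over all iterations.

$$
\underbrace{\frac{14\ \text{instructions}}{2\ \text{rows}} = 7\ \text{per row}}_{\text{loop vectorizer, .L5}}
\qquad\text{vs.}\qquad
\underbrace{\frac{11\ \text{instructions}}{1\ \text{row}} = 11\ \text{per row}}_{\text{BB vectorizer, .L3}}
$$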
* [Bug tree-optimization/106475] Loop vectorizer prevents vectorization
From: christophm30 at gmail dot com @ 2022-07-29 13:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475
Christoph Müllner <christophm30 at gmail dot com> changed:
           What|Removed                     |Added
----------------------------------------------------------------------------
         Status|UNCONFIRMED                 |RESOLVED
     Resolution|---                         |INVALID
--- Comment #3 from Christoph Müllner <christophm30 at gmail dot com> ---
Closing as invalid.