public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/106475] New: Loop vectorizer prevents vectorization
From: christophm30 at gmail dot com @ 2022-07-29 11:43 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475

            Bug ID: 106475
           Summary: Loop vectorizer prevents vectorization
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: christophm30 at gmail dot com
  Target Milestone: ---

Inspired by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106352 I've tested
GCC's behaviour after adding the restrict keyword as advised there.

This results in the following code:

```
#include <inttypes.h>

void foo (uint8_t *restrict dst, int i_dst_stride,
          uint8_t *src1, int i_src1_stride,
          uint8_t *src2, int i_src2_stride,
          int i_height)
{
  for (int y = 0; y < i_height; y++)
    {
      for (int x = 0; x < 8; x++)
        dst[x] = (src1[x] + src2[x] + 1);
      dst += i_dst_stride;
      src1 += i_src1_stride;
      src2 += i_src2_stride;
    }
}
```

The issue now is that this only gets vectorized if we pass
`-O3 -fno-tree-loop-vectorize`, i.e. if we disable the loop vectorizer.

Obviously, what helps for this function is not necessarily beneficial for the
rest of the program. So a solution that does not need to disable the loop
vectorizer to generate faster code would be preferred.

I have not found a GCC version that can do this, so this is not a regression
but a limitation. I also have not found a similar ticket, but I suspect this is
a known issue.

Are there any ideas to improve this?

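The concern about disabling the loop vectorizer for the whole program could, in principle, be narrowed to this one function with GCC's `optimize` function attribute. The following is a minimal sketch under that assumption, not something proposed in the thread: the macro name `NO_LOOP_VECT` is hypothetical, the GCC manual describes this attribute as intended for debugging rather than production use, and whether it yields the same code as the command-line flag for this function has not been checked here.

```c
#include <inttypes.h>

/* NO_LOOP_VECT is a hypothetical helper macro, not part of the report.
   On GCC it asks for -fno-tree-loop-vectorize for one function only;
   on other compilers it expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_VECT __attribute__((optimize("no-tree-loop-vectorize")))
#else
#define NO_LOOP_VECT
#endif

/* Same function as in the report, with the attribute applied. */
NO_LOOP_VECT
void foo (uint8_t *restrict dst, int i_dst_stride,
          uint8_t *src1, int i_src1_stride,
          uint8_t *src2, int i_src2_stride,
          int i_height)
{
  for (int y = 0; y < i_height; y++)
    {
      for (int x = 0; x < 8; x++)
        dst[x] = (src1[x] + src2[x] + 1);
      dst += i_dst_stride;
      src1 += i_src1_stride;
      src2 += i_src2_stride;
    }
}
```

The rest of the translation unit would still be compiled with the options given on the command line.
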
* [Bug tree-optimization/106475] Loop vectorizer prevents vectorization
From: rguenth at gcc dot gnu.org @ 2022-07-29 12:36 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The loop seems to be vectorized just fine?  The issue is just that we need
a runtime alias check because of the variable stride and the fact that we
need a VF of two to fill up to 16 byte vectors:

.L5:
        movq    (%rcx), %xmm1
        movq    (%rdx), %xmm0
        addl    $1, %esi
        movhps  (%rcx,%r10), %xmm1
        movhps  (%rdx,%r9), %xmm0
        addq    %r14, %rcx
        addq    %r13, %rdx
        paddb   %xmm1, %xmm0
        paddb   %xmm2, %xmm0
        movq    %xmm0, (%rax)
        movhps  %xmm0, (%rax,%rdi)
        addq    %r12, %rax
        cmpl    %esi, %r15d
        jne     .L5
        movl    %ebp, %eax
        andl    $-2, %eax
        andl    $1, %ebp
        je      .L1
.L4:
        imulq   %rax, %r10
        imulq   %rax, %r9
        imulq   %rdi, %rax
        movq    (%rbx,%r10), %xmm0
        movq    (%r8,%r9), %xmm1
        paddb   %xmm1, %xmm0
        movq    .LC1(%rip), %xmm1
        paddb   %xmm1, %xmm0
        movq    %xmm0, (%r11,%rax)

Yes, the BB vectorization result is smaller but only uses half of a vector:

.L3:
        movq    (%r8), %xmm0
        movq    (%rdx), %xmm1
        addl    $1, %ecx
        addq    %rdi, %rdx
        addq    %rsi, %r8
        paddb   %xmm1, %xmm0
        paddb   %xmm2, %xmm0
        movq    %xmm0, (%rax)
        addq    %r10, %rax
        cmpl    %ecx, %r11d
        jne     .L3

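The difference between the two instruction sequences can be modelled in plain C. The sketch below is an illustration only, not GCC's actual transformation: `foo_ref` and `foo_vf2` are hypothetical names, the 16-byte arrays stand in for the XMM registers, and the runtime alias check mentioned above is left out, so the buffers are assumed not to overlap. It shows a VF of two as two 8-byte rows packed into one 16-byte vector per iteration, followed by an epilogue for an odd trailing row.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference: the scalar loop from the report (one 8-byte row per iteration). */
static void foo_ref(uint8_t *restrict dst, int i_dst_stride,
                    uint8_t *src1, int i_src1_stride,
                    uint8_t *src2, int i_src2_stride, int i_height)
{
    for (int y = 0; y < i_height; y++) {
        for (int x = 0; x < 8; x++)
            dst[x] = (src1[x] + src2[x] + 1);
        dst += i_dst_stride;
        src1 += i_src1_stride;
        src2 += i_src2_stride;
    }
}

/* Hypothetical model of VF = 2: two rows fill one 16-byte vector.
   Assumes dst does not overlap src1 or src2 (no runtime alias check). */
static void foo_vf2(uint8_t *restrict dst, int i_dst_stride,
                    uint8_t *src1, int i_src1_stride,
                    uint8_t *src2, int i_src2_stride, int i_height)
{
    int y = 0;
    for (; y + 1 < i_height; y += 2) {
        uint8_t a[16], b[16];
        /* Low half from row y, high half from row y + 1 (movq + movhps). */
        memcpy(a, src1, 8);
        memcpy(a + 8, src1 + i_src1_stride, 8);
        memcpy(b, src2, 8);
        memcpy(b + 8, src2 + i_src2_stride, 8);
        /* Two wrapping byte additions over all 16 lanes (the two paddb). */
        for (int i = 0; i < 16; i++)
            a[i] = (uint8_t)(a[i] + b[i] + 1);
        /* Store the halves back to their separate rows. */
        memcpy(dst, a, 8);
        memcpy(dst + i_dst_stride, a + 8, 8);
        dst += 2 * (ptrdiff_t)i_dst_stride;
        src1 += 2 * (ptrdiff_t)i_src1_stride;
        src2 += 2 * (ptrdiff_t)i_src2_stride;
    }
    /* Epilogue: a single leftover row when i_height is odd. */
    if (y < i_height)
        for (int x = 0; x < 8; x++)
            dst[x] = (uint8_t)(src1[x] + src2[x] + 1);
}
```

In this model the main loop of `foo_vf2` runs half as many iterations as the loop in `foo_ref`, which is the trade-off behind the longer loop body.
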
* [Bug tree-optimization/106475] Loop vectorizer prevents vectorization
From: christophm30 at gmail dot com @ 2022-07-29 13:38 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475

--- Comment #2 from Christoph Müllner <christophm30 at gmail dot com> ---
Yes, you are right!
I hadn't noticed that the longer sequence requires only half as many loop
iterations as the shorter sequence.

* [Bug tree-optimization/106475] Loop vectorizer prevents vectorization
From: christophm30 at gmail dot com @ 2022-07-29 13:50 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475

Christoph Müllner <christophm30 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from Christoph Müllner <christophm30 at gmail dot com> ---
Closing as invalid.