[Bug tree-optimization/106475] New: Loop vectorizer prevents vectorization

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/106475] New: Loop vectorizer prevents vectorization
@ 2022-07-29 11:43 christophm30 at gmail dot com
  2022-07-29 12:36 ` [Bug tree-optimization/106475] " rguenth at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: christophm30 at gmail dot com @ 2022-07-29 11:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475

            Bug ID: 106475
           Summary: Loop vectorizer prevents vectorization
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: christophm30 at gmail dot com
  Target Milestone: ---

Inspired by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106352 I've tested 
GCC's behaviour after adding the restrict keyword as advised there.
This results in the following code:

```
#include <inttypes.h>
void
foo (uint8_t *restrict dst, int i_dst_stride,
     uint8_t *src1, int i_src1_stride,
     uint8_t *src2, int i_src2_stride,
     int i_height)
{
    for (int y = 0; y < i_height; y++)
      {
        for( int x = 0; x < 8; x++ )
          dst[x] = (src1[x] + src2[x] + 1);
        dst  += i_dst_stride;
        src1 += i_src1_stride;
        src2 += i_src2_stride;
      }
}
```

The issue now is that this only gets vectorized if we pass
`-O3 -fno-tree-loop-vectorize`, i.e. if we disable the loop vectorizer.

Obviously, what helps for this function is not necessarily beneficial
for the rest of the program. So a solution that generates faster code
without disabling loop vectorization would be preferred.
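
One way to scope the switch to a single function, rather than to the whole translation unit, is GCC's `optimize` function attribute. The following is only a sketch: the macro name `NO_LOOP_VECTORIZE` and the function name `foo_no_loop_vec` are made up for illustration, and the GCC manual describes this attribute as meant for debugging rather than production code, so whether it fits a real build is an open question.

```c
#include <inttypes.h>

/* Hypothetical helper macro: applies -fno-tree-loop-vectorize to one
   function only.  Expands to nothing on compilers other than GCC.  */
#if defined(__GNUC__) && !defined(__clang__)
# define NO_LOOP_VECTORIZE __attribute__((optimize("no-tree-loop-vectorize")))
#else
# define NO_LOOP_VECTORIZE
#endif

/* Same body as foo above; only the attribute and the name differ.  */
NO_LOOP_VECTORIZE void
foo_no_loop_vec (uint8_t *restrict dst, int i_dst_stride,
                 uint8_t *src1, int i_src1_stride,
                 uint8_t *src2, int i_src2_stride,
                 int i_height)
{
    for (int y = 0; y < i_height; y++)
      {
        for (int x = 0; x < 8; x++)
          dst[x] = (src1[x] + src2[x] + 1);
        dst  += i_dst_stride;
        src1 += i_src1_stride;
        src2 += i_src2_stride;
      }
}
```

The rest of the file would then still be compiled with the loop vectorizer enabled under plain `-O3`.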

I have not found a GCC version that can do this, so this is not a
regression but a limitation. I also have not found a similar ticket,
but I suspect this is a known issue.

Are there any ideas to improve this?


* [Bug tree-optimization/106475] Loop vectorizer prevents vectorization
  2022-07-29 11:43 [Bug tree-optimization/106475] New: Loop vectorizer prevents vectorization christophm30 at gmail dot com
@ 2022-07-29 12:36 ` rguenth at gcc dot gnu.org
  2022-07-29 13:38 ` christophm30 at gmail dot com
  2022-07-29 13:50 ` christophm30 at gmail dot com
  2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-07-29 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The loop seems to be vectorized just fine?  The issue is just that we need a
runtime alias check because of the variable stride and the fact that we need a
VF (vectorization factor) of two to fill up 16-byte vectors:

.L5:
        movq    (%rcx), %xmm1
        movq    (%rdx), %xmm0
        addl    $1, %esi
        movhps  (%rcx,%r10), %xmm1
        movhps  (%rdx,%r9), %xmm0
        addq    %r14, %rcx
        addq    %r13, %rdx
        paddb   %xmm1, %xmm0
        paddb   %xmm2, %xmm0
        movq    %xmm0, (%rax)
        movhps  %xmm0, (%rax,%rdi)
        addq    %r12, %rax
        cmpl    %esi, %r15d
        jne     .L5
        movl    %ebp, %eax
        andl    $-2, %eax
        andl    $1, %ebp
        je      .L1
.L4:
        imulq   %rax, %r10
        imulq   %rax, %r9
        imulq   %rdi, %rax
        movq    (%rbx,%r10), %xmm0
        movq    (%r8,%r9), %xmm1
        paddb   %xmm1, %xmm0
        movq    .LC1(%rip), %xmm1
        paddb   %xmm1, %xmm0
        movq    %xmm0, (%r11,%rax)

Yes, the BB (basic-block) vectorization result is smaller, but it only uses half of a vector:

.L3:
        movq    (%r8), %xmm0
        movq    (%rdx), %xmm1
        addl    $1, %ecx
        addq    %rdi, %rdx
        addq    %rsi, %r8
        paddb   %xmm1, %xmm0
        paddb   %xmm2, %xmm0
        movq    %xmm0, (%rax)
        addq    %r10, %rax
        cmpl    %ecx, %r11d
        jne     .L3
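
Conceptually, the loop-vectorized variant handles two rows per iteration and keeps a one-row path for the leftover row or for the case where the runtime check fails. A rough C sketch of that shape follows. The function name `foo_vf2_sketch` and the particular overlap test are assumptions for illustration; the thread does not show the check GCC actually emits.

```c
#include <inttypes.h>
#include <string.h>

void
foo_vf2_sketch (uint8_t *restrict dst, int i_dst_stride,
                uint8_t *src1, int i_src1_stride,
                uint8_t *src2, int i_src2_stride,
                int i_height)
{
    int y = 0;

    /* Assumed stand-in for the runtime alias check: consecutive dst
       rows cannot overlap when the stride spans at least one row.  */
    if (i_dst_stride >= 8 || i_dst_stride <= -8)
      {
        /* Main loop, VF = 2: rows y and y + 1 fill one 16-byte vector.  */
        for (; y + 1 < i_height; y += 2)
          {
            uint8_t a[16], b[16];
            memcpy (a,     src1,                 8);  /* like movq   */
            memcpy (a + 8, src1 + i_src1_stride, 8);  /* like movhps */
            memcpy (b,     src2,                 8);
            memcpy (b + 8, src2 + i_src2_stride, 8);
            for (int x = 0; x < 16; x++)              /* like the two paddb */
              a[x] = (uint8_t) (a[x] + b[x] + 1);
            memcpy (dst,                a,     8);
            memcpy (dst + i_dst_stride, a + 8, 8);
            dst  += 2 * i_dst_stride;
            src1 += 2 * i_src1_stride;
            src2 += 2 * i_src2_stride;
          }
      }

    /* Epilogue and fallback: one 8-byte row per iteration.  */
    for (; y < i_height; y++)
      {
        for (int x = 0; x < 8; x++)
          dst[x] = (uint8_t) (src1[x] + src2[x] + 1);
        dst  += i_dst_stride;
        src1 += i_src1_stride;
        src2 += i_src2_stride;
      }
}
```

Counting the listings above, the `.L5` loop issues 14 instructions for two rows (7 per row), whereas the `.L3` loop issues 11 instructions for a single row.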


* [Bug tree-optimization/106475] Loop vectorizer prevents vectorization
  2022-07-29 11:43 [Bug tree-optimization/106475] New: Loop vectorizer prevents vectorization christophm30 at gmail dot com
  2022-07-29 12:36 ` [Bug tree-optimization/106475] " rguenth at gcc dot gnu.org
@ 2022-07-29 13:38 ` christophm30 at gmail dot com
  2022-07-29 13:50 ` christophm30 at gmail dot com
  2 siblings, 0 replies; 4+ messages in thread
From: christophm30 at gmail dot com @ 2022-07-29 13:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475

--- Comment #2 from Christoph Müllner <christophm30 at gmail dot com> ---
Yes, you are right!
I hadn't noticed that the longer sequence requires only half as many loop
iterations as the shorter sequence.


* [Bug tree-optimization/106475] Loop vectorizer prevents vectorization
  2022-07-29 11:43 [Bug tree-optimization/106475] New: Loop vectorizer prevents vectorization christophm30 at gmail dot com
  2022-07-29 12:36 ` [Bug tree-optimization/106475] " rguenth at gcc dot gnu.org
  2022-07-29 13:38 ` christophm30 at gmail dot com
@ 2022-07-29 13:50 ` christophm30 at gmail dot com
  2 siblings, 0 replies; 4+ messages in thread
From: christophm30 at gmail dot com @ 2022-07-29 13:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475

Christoph Müllner <christophm30 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from Christoph Müllner <christophm30 at gmail dot com> ---
Closing as invalid.

