public inbox for gcc-bugs@sourceware.org
* [Bug c++/111905] New: -O3 vectorization terribly pessimizes the code for an already unrolled loop
@ 2023-10-20 23:50 kamkaz at windowslive dot com
  2023-10-20 23:53 ` [Bug c++/111905] " pinskia at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: kamkaz at windowslive dot com @ 2023-10-20 23:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905

            Bug ID: 111905
           Summary: -O3 vectorization terribly pessimizes the code for an
                    already unrolled loop
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kamkaz at windowslive dot com
  Target Milestone: ---

https://godbolt.org/z/rK4nEWovc

With -O2, the generated code is what you'd expect: the "manually unrolled"
version, which processes data in chunks (using the assumption that w > 16),
shows a performance benefit.

With -O3, however, auto-vectorization kicks in for the already unrolled loop,
and the results are abysmal: a lot of unnecessary checks and (what I assume
to be) dead code.

Clang doesn't seem to have this problem.
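
For readers without access to the godbolt link, the following is a minimal
sketch of the kind of code being described, not the reporter's actual
testcase. The function names, the element type, the addition itself, and the
chunk size of 16 are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed chunk size: 16 x 32-bit elements (512 bits). Hypothetical value.
constexpr std::size_t kChunk = 16;

// Plain loop: the compiler is free to vectorize and unroll it as it sees fit.
void add_plain(std::uint32_t* out, const std::uint32_t* a,
               const std::uint32_t* b, std::size_t w) {
    for (std::size_t i = 0; i < w; ++i)
        out[i] = a[i] + b[i];
}

// "Manually unrolled" variant: the inner loop has a fixed trip count, so it
// can be fully unrolled and mapped onto whole vector registers.
void add_chunked(std::uint32_t* out, const std::uint32_t* a,
                 const std::uint32_t* b, std::size_t w) {
    std::size_t i = 0;
    for (; i + kChunk <= w; i += kChunk)
        for (std::size_t j = 0; j < kChunk; ++j)
            out[i + j] = a[i + j] + b[i + j];
    // Scalar tail for the elements that do not fill a whole chunk.
    for (; i < w; ++i)
        out[i] = a[i] + b[i];
}
```

Both functions compute the same result; the report is about how differently
the second form is compiled at -O2 and at -O3.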


* [Bug c++/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
  2023-10-20 23:50 [Bug c++/111905] New: -O3 vectorization terribly pessimizes the code for an already unrolled loop kamkaz at windowslive dot com
@ 2023-10-20 23:53 ` pinskia at gcc dot gnu.org
  2023-10-21  0:08 ` [Bug target/111905] " pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-20 23:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 56163
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56163&action=edit
testcase

Next time, please put the testcase inline or attach it rather than just
linking to godbolt.


* [Bug target/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
  2023-10-20 23:50 [Bug c++/111905] New: -O3 vectorization terribly pessimizes the code for an already unrolled loop kamkaz at windowslive dot com
  2023-10-20 23:53 ` [Bug c++/111905] " pinskia at gcc dot gnu.org
@ 2023-10-21  0:08 ` pinskia at gcc dot gnu.org
  2023-10-21 10:24 ` kamkaz at windowslive dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-21  0:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|tree-optimization           |target
             Target|                            |x86_64-linux-gnu

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Why are you using `-mprefer-vector-width=512` here?

512 causes the loop to need to be unrolled once more, and that is why the
confusion is happening.


* [Bug target/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
  2023-10-20 23:50 [Bug c++/111905] New: -O3 vectorization terribly pessimizes the code for an already unrolled loop kamkaz at windowslive dot com
  2023-10-20 23:53 ` [Bug c++/111905] " pinskia at gcc dot gnu.org
  2023-10-21  0:08 ` [Bug target/111905] " pinskia at gcc dot gnu.org
@ 2023-10-21 10:24 ` kamkaz at windowslive dot com
  2023-10-21 10:37 ` kamkaz at windowslive dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: kamkaz at windowslive dot com @ 2023-10-21 10:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905

--- Comment #3 from Kamil Kaznowski <kamkaz at windowslive dot com> ---
Created attachment 56166
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56166&action=edit
a smaller example

A smaller example.
Compilation flags:
-O2 -march=x86-64-v3 -std=c++23
vs
-O3 -march=x86-64-v3 -std=c++23


* [Bug target/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
  2023-10-20 23:50 [Bug c++/111905] New: -O3 vectorization terribly pessimizes the code for an already unrolled loop kamkaz at windowslive dot com
                   ` (2 preceding siblings ...)
  2023-10-21 10:24 ` kamkaz at windowslive dot com
@ 2023-10-21 10:37 ` kamkaz at windowslive dot com
  2023-10-23  9:14 ` rguenth at gcc dot gnu.org
  2023-10-24  8:08 ` kamkaz at windowslive dot com
  5 siblings, 0 replies; 7+ messages in thread
From: kamkaz at windowslive dot com @ 2023-10-21 10:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905

--- Comment #4 from Kamil Kaznowski <kamkaz at windowslive dot com> ---
(In reply to Andrew Pinski from comment #2)
> Why are you using `-mprefer-vector-width=512` here?
> 
> 512 causes the loop to need to be unrolled once more, and that is why
> the confusion is happening.

I don't think the preferred vector width mattered there? The vector width used
was already 512 (zmm registers).

I created a simpler example with no extra flags affecting the preferred vector
width and with chunks of 256 bits (8 * 32-bit). The same issue still appears:
a lot of weirdly unrolled code that I presume (hope?) is dead, plus some extra
code at the start that I can't quite figure out.


* [Bug target/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
  2023-10-20 23:50 [Bug c++/111905] New: -O3 vectorization terribly pessimizes the code for an already unrolled loop kamkaz at windowslive dot com
                   ` (3 preceding siblings ...)
  2023-10-21 10:37 ` kamkaz at windowslive dot com
@ 2023-10-23  9:14 ` rguenth at gcc dot gnu.org
  2023-10-24  8:08 ` kamkaz at windowslive dot com
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-10-23  9:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
For the original testcase and foo we do not perform extra unrolling during
vectorization - we just vectorize the already unrolled loop.  bar isn't
unrolled, so we unroll it as part of vectorization.

With -fopt-info you see

t.C:6:26: optimized: loop with 16 iterations completely unrolled (header execution count 63136016)
t.C:7:14: optimized: basic block part vectorized using 32 byte vectors
t.C:56:14: optimized: loop vectorized using 32 byte vectors
t.C:56:14: optimized:  loop versioned for vectorization because of possible aliasing
t.C:56:14: optimized: loop vectorized using 16 byte vectors
t.C:56:14: optimized: loop with 2 iterations completely unrolled (header execution count 57270721)
t.C:51:6: optimized: loop turned into non-loop; it never loops

I'm also not seeing any "terrible" code?
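
To illustrate what "loop versioned for vectorization because of possible
aliasing" means, here is a hand-written analogue of the transformation. It is
only a sketch: it is not GCC's actual output, and the function name and the
add-one operation are hypothetical. The idea is that a runtime overlap check
selects between a copy of the loop that may be vectorized and a copy that
keeps strict element order.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical hand-written analogue of loop versioning (not compiler output).
void add_one_versioned(std::uint32_t* out, const std::uint32_t* in,
                       std::size_t w) {
    // Runtime check: do the byte ranges [out, out+w) and [in, in+w) overlap?
    const auto o = reinterpret_cast<std::uintptr_t>(out);
    const auto i = reinterpret_cast<std::uintptr_t>(in);
    const std::uintptr_t bytes = w * sizeof(std::uint32_t);
    const bool disjoint = (o + bytes <= i) || (i + bytes <= o);

    if (disjoint) {
        // Version 1: the ranges are known not to alias, so iterations are
        // independent and several elements may be processed at once.
        std::uint32_t* __restrict dst = out;
        const std::uint32_t* __restrict src = in;
        for (std::size_t k = 0; k < w; ++k)
            dst[k] = src[k] + 1;
    } else {
        // Version 2: possible overlap, so a store may feed a later load and
        // the element order must be preserved.
        for (std::size_t k = 0; k < w; ++k)
            out[k] = in[k] + 1;
    }
}
```

The check and the second copy of the loop are the extra code that versioning
adds; when the caller already guarantees no overlap, neither is needed.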


* [Bug target/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
  2023-10-20 23:50 [Bug c++/111905] New: -O3 vectorization terribly pessimizes the code for an already unrolled loop kamkaz at windowslive dot com
                   ` (4 preceding siblings ...)
  2023-10-23  9:14 ` rguenth at gcc dot gnu.org
@ 2023-10-24  8:08 ` kamkaz at windowslive dot com
  5 siblings, 0 replies; 7+ messages in thread
From: kamkaz at windowslive dot com @ 2023-10-24  8:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905

--- Comment #6 from Kamil Kaznowski <kamkaz at windowslive dot com> ---
<source>:42:13: optimized:  loop versioned for vectorization because of possible aliasing

That's exactly the issue here! There should be no versioning. There is no
possible aliasing, and with -O2 GCC doesn't attempt to version the loop.
Ruling out aliasing is the entire point of declaring the function's arguments
__restrict (and of trying to suggest the optimization with assumes, which seem
to be completely ignored in this case). A lot of extra code is generated to
choose the correct "version" based on the size of `w`. These paths are dead
code.
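
As a rough sketch of the pattern described here (not the attached testcase),
a function can promise no aliasing through __restrict parameters and state a
lower bound on `w` with an assumption. The macro HYP_ASSUME, the function
name, the multiply-by-3 body, and the chunk size of 8 are hypothetical; the
macro uses the C++23 `[[assume]]` attribute where available and otherwise
falls back to the GCC/Clang builtin `__builtin_unreachable()`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: C++23 [[assume]] if the compiler reports support,
// otherwise the GCC/Clang __builtin_unreachable() idiom.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(assume)
#    define HYP_ASSUME(x) [[assume(x)]]
#  endif
#endif
#ifndef HYP_ASSUME
#  define HYP_ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#endif

// Assumed chunk size: 8 x 32-bit elements (256 bits).
constexpr std::size_t kLanes = 8;

// __restrict is a promise from the caller that out and in never overlap.
// Calling this with w <= 16 is undefined behaviour because of the assumption.
void scale_chunks(std::uint32_t* __restrict out,
                  const std::uint32_t* __restrict in, std::size_t w) {
    HYP_ASSUME(w > 16);
    std::size_t i = 0;
    for (; i + kLanes <= w; i += kLanes)
        for (std::size_t j = 0; j < kLanes; ++j)
            out[i + j] = in[i + j] * 3u;
    // Scalar tail for the elements that do not fill a whole chunk.
    for (; i < w; ++i)
        out[i] = in[i] * 3u;
}
```

Compiling such a function at -O2 and at -O3 with -fopt-info is one way to see
whether the compiler still reports versioning for possible aliasing.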

