public inbox for gcc-bugs@sourceware.org
* [Bug c++/111905] New: -O3 vectorization terribly pessimizes the code for an already unrolled loop
@ 2023-10-20 23:50 kamkaz at windowslive dot com
2023-10-20 23:53 ` [Bug c++/111905] " pinskia at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: kamkaz at windowslive dot com @ 2023-10-20 23:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905
Bug ID: 111905
Summary: -O3 vectorization terribly pessimizes the code for an
already unrolled loop
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kamkaz at windowslive dot com
Target Milestone: ---
https://godbolt.org/z/rK4nEWovc
With -O2, the generated code is what you'd expect: the "manually unrolled"
version, which processes data in chunks (using the assumption that w > 16),
performs better.
With -O3, however, auto-vectorization kicks in for the already unrolled loop,
and the results are abysmal: a lot of unnecessary checks and (what I assume
to be) dead code.
Clang doesn't seem to have this problem.
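For readers without access to the godbolt link, here is a minimal sketch of the kind of "manually unrolled" loop the report describes. It is not the reporter's testcase: the function name `add_chunked`, the constant `kChunk`, the element type, and the operation performed are all assumptions made for illustration. Only the chunked structure, the `__restrict` arguments, and the `w > 16` precondition come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the reported function; every name here is an
// assumption, not taken from the attached testcase.
constexpr std::size_t kChunk = 16;  // 16 x 32-bit elements = 512 bits

// Adds src into dst, kChunk elements at a time.
void add_chunked(std::uint32_t* __restrict dst,
                 const std::uint32_t* __restrict src,
                 std::size_t w) {
    assert(w > kChunk);  // the report's stated assumption, checked in debug builds
    std::size_t i = 0;
    // Main part: the inner loop has a constant trip count, so the compiler
    // can unroll it completely into straight-line code.
    for (; i + kChunk <= w; i += kChunk) {
        for (std::size_t j = 0; j < kChunk; ++j) {
            dst[i + j] += src[i + j];
        }
    }
    // Tail: leftover elements when w is not a multiple of kChunk.
    for (; i < w; ++i) {
        dst[i] += src[i];
    }
}
```

Compiling a function of this shape at -O2 and at -O3 and comparing the assembly is one way to look for the difference described above, though whether this particular sketch reproduces it has not been checked.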
* [Bug c++/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
From: pinskia at gcc dot gnu.org @ 2023-10-20 23:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 56163
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56163&action=edit
testcase
Next time, please put the testcase inline or attach it rather than just linking
to godbolt.
* [Bug target/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
From: pinskia at gcc dot gnu.org @ 2023-10-21 0:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|tree-optimization           |target
             Target|                            |x86_64-linux-gnu
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Why are you using `-mprefer-vector-width=512` here?
A width of 512 means the loop needs to be unrolled once more, and that is where
the confusion comes from.
* [Bug target/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
From: kamkaz at windowslive dot com @ 2023-10-21 10:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905
--- Comment #3 from Kamil Kaznowski <kamkaz at windowslive dot com> ---
Created attachment 56166
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56166&action=edit
a smaller example
A smaller example.
Compilation flags:
-O2 -march=x86-64-v3 -std=c++23
vs
-O3 -march=x86-64-v3 -std=c++23
* [Bug target/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
From: kamkaz at windowslive dot com @ 2023-10-21 10:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905
--- Comment #4 from Kamil Kaznowski <kamkaz at windowslive dot com> ---
(In reply to Andrew Pinski from comment #2)
> Why are you using `-mprefer-vector-width=512` here?
>
> A width of 512 means the loop needs to be unrolled once more, and that is
> where the confusion comes from.
I don't think the preferred vector width mattered there? The vector width used
was already 512 (zmm registers).
I created a simpler example with no extra flags affecting the preferred vector
width and with chunks of 256 bits (8 x 32-bit). The same issue still appears: a
lot of weirdly unrolled code that I presume (hope?) is dead, plus some extra
code at the start that I can't quite figure out.
* [Bug target/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
From: rguenth at gcc dot gnu.org @ 2023-10-23 9:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
For the original testcase and foo, we do not perform extra unrolling during
vectorization; we just vectorize the already unrolled loop. bar isn't
unrolled, so we unroll it as part of vectorization.
With -fopt-info you see:
  t.C:6:26: optimized: loop with 16 iterations completely unrolled (header execution count 63136016)
  t.C:7:14: optimized: basic block part vectorized using 32 byte vectors
  t.C:56:14: optimized: loop vectorized using 32 byte vectors
  t.C:56:14: optimized: loop versioned for vectorization because of possible aliasing
  t.C:56:14: optimized: loop vectorized using 16 byte vectors
  t.C:56:14: optimized: loop with 2 iterations completely unrolled (header execution count 57270721)
  t.C:51:6: optimized: loop turned into non-loop; it never loops
I'm also not seeing any "terrible" code?
* [Bug target/111905] -O3 vectorization terribly pessimizes the code for an already unrolled loop
From: kamkaz at windowslive dot com @ 2023-10-24 8:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111905
--- Comment #6 from Kamil Kaznowski <kamkaz at windowslive dot com> ---
<source>:42:13: optimized: loop versioned for vectorization because of
possible aliasing
That's exactly the issue here! There should be no versioning. There is no
possible aliasing, and with -O2 GCC doesn't attempt to version the loop. That's
the entire point of giving the function __restrict arguments (as well as trying
to suggest the optimization with assumes, which seem to be completely ignored
in this case). A lot of extra code is generated to choose between the
"versions" based on the size of `w`. These paths are dead code.
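To make the aliasing argument concrete, the following sketch contrasts a loop whose pointers may overlap with one that uses `__restrict` plus a size hint. The function names and the loop body are hypothetical and are not taken from the attached testcase. The hint is written here with `__builtin_unreachable()`, a GCC/Clang builtin; the thread does not say which form of "assume" the reporter used (the C++23 `[[assume]]` attribute is another option), so treat that choice as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example functions, for illustration only.

// As far as the compiler can tell, dst and src may overlap. A vectorizer
// typically handles this by keeping a scalar copy of the loop behind a
// runtime overlap check ("versioning for aliasing").
void triple_may_alias(std::uint32_t* dst, const std::uint32_t* src,
                      std::size_t w) {
    for (std::size_t i = 0; i < w; ++i) {
        dst[i] = src[i] * 3u;
    }
}

// __restrict promises that the two ranges do not overlap, which in
// principle makes the runtime overlap check unnecessary.
void triple_no_alias(std::uint32_t* __restrict dst,
                     const std::uint32_t* __restrict src,
                     std::size_t w) {
    assert(w > 16);  // debug-build check of the promise made below
    if (w <= 16) {
        __builtin_unreachable();  // hint to the optimizer that w > 16 holds
    }
    for (std::size_t i = 0; i < w; ++i) {
        dst[i] = src[i] * 3u;
    }
}
```

Both functions compute the same result for non-overlapping inputs with w > 16; the difference is only in what the optimizer is allowed to assume. Building each with -O3 -fopt-info should show whether a "loop versioned for vectorization because of possible aliasing" note is emitted for it.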