public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper()
@ 2021-03-12 16:36 andysem at mail dot ru
2021-03-12 17:39 ` [Bug target/99563] [10/11 Regression] " jakub at gcc dot gnu.org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: andysem at mail dot ru @ 2021-03-12 16:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563
Bug ID: 99563
Summary: Code miscompilation caused by _mm256_zeroupper()
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: andysem at mail dot ru
Target Milestone: ---
Consider the following code:
#include <immintrin.h>
constexpr unsigned int block_size = 8u;
float compute_generic(const double* data, unsigned int width, unsigned int
height);
inline __attribute__((always_inline))
float compute_avx(const double* data, unsigned int width, unsigned int height)
{
__m128d mm_res = _mm_setzero_pd();
unsigned long block_count = static_cast< unsigned long >((width +
block_size - 1) / block_size)
* static_cast< unsigned long >((height + block_size - 1) / block_size);
float res = static_cast< float >(_mm_cvtsd_f64(mm_res) / static_cast<
double >(block_count));
_mm256_zeroupper();
return res;
}
float compute(const double* data, unsigned int width, unsigned int height)
{
if (width >= 16 && height >= block_size)
{
return compute_avx(data, width, height);
}
else
{
return compute_generic(data, width, height);
}
}
$ g++ -O2 -march=sandybridge -mno-vzeroupper -o test.o test.cpp
https://gcc.godbolt.org/z/dhr7an
The code compiles to:
compute(double const*, unsigned int, unsigned int):
cmp esi, 15
jbe .L2
cmp edx, 7
jbe .L2
vzeroupper
ret
.L2:
jmp compute_generic(double const*, unsigned int, unsigned int)
which leaves the result of compute() uninitialized if AVX path is taken. The
problem disappears if one of the following is done:
- -O2 is replaced with -O1
- -mno-vzeroupper is removed
- _mm256_zeroupper(); call is removed (the upper bits of vector registers is
left dirty, though)
This is a regression in gcc 10 branch and later, gcc 9.x compiles this
correctly.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/99563] [10/11 Regression] Code miscompilation caused by _mm256_zeroupper()
2021-03-12 16:36 [Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper() andysem at mail dot ru
@ 2021-03-12 17:39 ` jakub at gcc dot gnu.org
2021-03-15 12:51 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-03-12 17:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
Priority|P3 |P2
Summary|Code miscompilation caused |[10/11 Regression] Code
|by _mm256_zeroupper() |miscompilation caused by
| |_mm256_zeroupper()
Target Milestone|--- |10.3
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Started with my r10-6451-gb7b3378f91c0641f2ef4d88db22af62a571c9359 change, will
have a look next week.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/99563] [10/11 Regression] Code miscompilation caused by _mm256_zeroupper()
2021-03-12 16:36 [Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper() andysem at mail dot ru
2021-03-12 17:39 ` [Bug target/99563] [10/11 Regression] " jakub at gcc dot gnu.org
@ 2021-03-15 12:51 ` jakub at gcc dot gnu.org
2021-03-16 10:17 ` cvs-commit at gcc dot gnu.org
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-03-15 12:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
Ever confirmed|0 |1
Last reconfirmed| |2021-03-15
Status|UNCONFIRMED |ASSIGNED
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 50388
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50388&action=edit
gcc11-pr99563.patch
Untested fix.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/99563] [10/11 Regression] Code miscompilation caused by _mm256_zeroupper()
2021-03-12 16:36 [Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper() andysem at mail dot ru
2021-03-12 17:39 ` [Bug target/99563] [10/11 Regression] " jakub at gcc dot gnu.org
2021-03-15 12:51 ` jakub at gcc dot gnu.org
@ 2021-03-16 10:17 ` cvs-commit at gcc dot gnu.org
2021-03-16 19:26 ` [Bug target/99563] [10 " jakub at gcc dot gnu.org
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-03-16 10:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563
--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:82085eb3d44833bd1557fdd932c4738d987f559d
commit r11-7684-g82085eb3d44833bd1557fdd932c4738d987f559d
Author: Jakub Jelinek <jakub@redhat.com>
Date: Tue Mar 16 11:16:15 2021 +0100
i386: Fix up _mm256_vzeroupper() handling [PR99563]
My r10-6451-gb7b3378f91c0641f2ef4d88db22af62a571c9359 fix for
vzeroupper vs. ms ABI apparently broke the explicit vzeroupper handling
when the implicit vzeroupper handling is disabled.
The epilogue_completed splitter for vzeroupper now adds clobbers for all
registers which don't have explicit sets in the pattern and the sets are
added during vzeroupper pass. Before my changes, for explicit user
vzeroupper, we just weren't modelling its effects at all, it was just
unspec that didn't tell that it clobbers the upper parts of all XMM <
%xmm16
registers. But now the splitter will even for those add clobbers and as
it has no sets, it will add clobbers for all registers, which means
we optimize away anything that lived across that vzeroupper.
The vzeroupper pass has two parts, one is the mode switching that computes
where to put the implicit vzeroupper calls and puts them there, and then
another that uses df to figure out what sets to add to all the vzeroupper.
The former part should be done only under the conditions we have in the
gate, but the latter as this PR shows needs to happen either if we perform
the implicit vzeroupper additions, or if there are (or could be) any
explicit vzeroupper instructions. As that function does df_analyze and
walks the whole IL, I think it would be too expensive to run it always
whenever TARGET_AVX, so this patch remembers if we've expanded at least
one __builtin_ia32_vzeroupper in the function and runs that part of the
vzeroupper pass both when the old condition is true or when this new
flag is set.
2021-03-16 Jakub Jelinek <jakub@redhat.com>
PR target/99563
* config/i386/i386.h (struct machine_function): Add
has_explicit_vzeroupper bitfield.
* config/i386/i386-expand.c (ix86_expand_builtin): Set
cfun->machine->has_explicit_vzeroupper when expanding
IX86_BUILTIN_VZEROUPPER.
* config/i386/i386-features.c (rest_of_handle_insert_vzeroupper):
Do the mode switching only when TARGET_VZEROUPPER, expensive
optimizations turned on and not optimizing for size.
(pass_insert_vzeroupper::gate): Enable even when
cfun->machine->has_explicit_vzeroupper is set.
* gcc.target/i386/avx-pr99563.c: New test.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/99563] [10 Regression] Code miscompilation caused by _mm256_zeroupper()
2021-03-12 16:36 [Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper() andysem at mail dot ru
` (2 preceding siblings ...)
2021-03-16 10:17 ` cvs-commit at gcc dot gnu.org
@ 2021-03-16 19:26 ` jakub at gcc dot gnu.org
2021-03-16 20:12 ` andysem at mail dot ru
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-03-16 19:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[10/11 Regression] Code |[10 Regression] Code
|miscompilation caused by |miscompilation caused by
|_mm256_zeroupper() |_mm256_zeroupper()
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed on the trunk so far.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/99563] [10 Regression] Code miscompilation caused by _mm256_zeroupper()
2021-03-12 16:36 [Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper() andysem at mail dot ru
` (3 preceding siblings ...)
2021-03-16 19:26 ` [Bug target/99563] [10 " jakub at gcc dot gnu.org
@ 2021-03-16 20:12 ` andysem at mail dot ru
2021-03-19 23:30 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: andysem at mail dot ru @ 2021-03-16 20:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563
--- Comment #5 from andysem at mail dot ru ---
Thanks. Will there be a fix in branch 10?
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/99563] [10 Regression] Code miscompilation caused by _mm256_zeroupper()
2021-03-12 16:36 [Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper() andysem at mail dot ru
` (4 preceding siblings ...)
2021-03-16 20:12 ` andysem at mail dot ru
@ 2021-03-19 23:30 ` cvs-commit at gcc dot gnu.org
2021-03-20 8:10 ` jakub at gcc dot gnu.org
2021-03-20 8:29 ` andysem at mail dot ru
7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-03-19 23:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Jakub Jelinek
<jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:788da80413551fe1a1411c700864640b590dcfc5
commit r10-9486-g788da80413551fe1a1411c700864640b590dcfc5
Author: Jakub Jelinek <jakub@redhat.com>
Date: Tue Mar 16 11:16:15 2021 +0100
i386: Fix up _mm256_vzeroupper() handling [PR99563]
My r10-6451-gb7b3378f91c0641f2ef4d88db22af62a571c9359 fix for
vzeroupper vs. ms ABI apparently broke the explicit vzeroupper handling
when the implicit vzeroupper handling is disabled.
The epilogue_completed splitter for vzeroupper now adds clobbers for all
registers which don't have explicit sets in the pattern and the sets are
added during vzeroupper pass. Before my changes, for explicit user
vzeroupper, we just weren't modelling its effects at all, it was just
unspec that didn't tell that it clobbers the upper parts of all XMM <
%xmm16
registers. But now the splitter will even for those add clobbers and as
it has no sets, it will add clobbers for all registers, which means
we optimize away anything that lived across that vzeroupper.
The vzeroupper pass has two parts, one is the mode switching that computes
where to put the implicit vzeroupper calls and puts them there, and then
another that uses df to figure out what sets to add to all the vzeroupper.
The former part should be done only under the conditions we have in the
gate, but the latter as this PR shows needs to happen either if we perform
the implicit vzeroupper additions, or if there are (or could be) any
explicit vzeroupper instructions. As that function does df_analyze and
walks the whole IL, I think it would be too expensive to run it always
whenever TARGET_AVX, so this patch remembers if we've expanded at least
one __builtin_ia32_vzeroupper in the function and runs that part of the
vzeroupper pass both when the old condition is true or when this new
flag is set.
2021-03-16 Jakub Jelinek <jakub@redhat.com>
PR target/99563
* config/i386/i386.h (struct machine_function): Add
has_explicit_vzeroupper bitfield.
* config/i386/i386-expand.c (ix86_expand_builtin): Set
cfun->machine->has_explicit_vzeroupper when expanding
IX86_BUILTIN_VZEROUPPER.
* config/i386/i386-features.c (rest_of_handle_insert_vzeroupper):
Do the mode switching only when TARGET_VZEROUPPER, expensive
optimizations turned on and not optimizing for size.
(pass_insert_vzeroupper::gate): Enable even when
cfun->machine->has_explicit_vzeroupper is set.
* gcc.target/i386/avx-pr99563.c: New test.
(cherry picked from commit 82085eb3d44833bd1557fdd932c4738d987f559d)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/99563] [10 Regression] Code miscompilation caused by _mm256_zeroupper()
2021-03-12 16:36 [Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper() andysem at mail dot ru
` (5 preceding siblings ...)
2021-03-19 23:30 ` cvs-commit at gcc dot gnu.org
@ 2021-03-20 8:10 ` jakub at gcc dot gnu.org
2021-03-20 8:29 ` andysem at mail dot ru
7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-03-20 8:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for 10.3 too.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/99563] [10 Regression] Code miscompilation caused by _mm256_zeroupper()
2021-03-12 16:36 [Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper() andysem at mail dot ru
` (6 preceding siblings ...)
2021-03-20 8:10 ` jakub at gcc dot gnu.org
@ 2021-03-20 8:29 ` andysem at mail dot ru
7 siblings, 0 replies; 9+ messages in thread
From: andysem at mail dot ru @ 2021-03-20 8:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99563
--- Comment #8 from andysem at mail dot ru ---
Thanks again.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-03-20 8:29 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-12 16:36 [Bug target/99563] New: Code miscompilation caused by _mm256_zeroupper() andysem at mail dot ru
2021-03-12 17:39 ` [Bug target/99563] [10/11 Regression] " jakub at gcc dot gnu.org
2021-03-15 12:51 ` jakub at gcc dot gnu.org
2021-03-16 10:17 ` cvs-commit at gcc dot gnu.org
2021-03-16 19:26 ` [Bug target/99563] [10 " jakub at gcc dot gnu.org
2021-03-16 20:12 ` andysem at mail dot ru
2021-03-19 23:30 ` cvs-commit at gcc dot gnu.org
2021-03-20 8:10 ` jakub at gcc dot gnu.org
2021-03-20 8:29 ` andysem at mail dot ru
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).