public inbox for gcc-bugs@sourceware.org
* [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero
@ 2021-07-14 22:39 hjl.tools at gmail dot com
2021-07-14 22:59 ` [Bug target/101456] " arjan at linux dot intel.com
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-14 22:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
Bug ID: 101456
Summary: Unnecessary vzeroupper when upper bits of YMM
registers already zero
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com
Target Milestone: ---
Target: i386, x86-64
Unnecessary vzeroupper:
[hjl@gnu-cfl-2 tmp]$ cat x.c
#include <x86intrin.h>

extern __m256d x;

void
foo (void)
{
  x = _mm256_setzero_pd ();
}
[hjl@gnu-cfl-2 tmp]$ gcc -S -O2 x.c -mavx2
[hjl@gnu-cfl-2 tmp]$ cat x.s
        .file   "x.c"
        .text
        .p2align 4
        .globl  foo
        .type   foo, @function
foo:
.LFB5667:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        vxorpd  %xmm0, %xmm0, %xmm0
        vmovapd %ymm0, x(%rip)
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        vzeroupper      <<<<<< Not needed since upper bits of YMM0 are zero.
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE5667:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 11.1.1 20210531 (Red Hat 11.1.1-3)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 tmp]$
From: arjan at linux dot intel.com @ 2021-07-14 22:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
Arjan van de Ven <arjan at linux dot intel.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |arjan at linux dot intel.com
--- Comment #1 from Arjan van de Ven <arjan at linux dot intel.com> ---
Actually it's not that they're zero (they are) but they're in "init" state
since the vpxor wrote to xmm not ymm
From: hjl.tools at gmail dot com @ 2021-07-15 0:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
Created attachment 51153
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51153&action=edit
A patch
From: hjl.tools at gmail dot com @ 2021-07-15 0:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
--- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Arjan van de Ven from comment #1)
> Actually it's not that they're zero (they are) but they're in "init" state
> since the vpxor wrote to xmm not ymm
We generate:
vxorpd %xmm0, %xmm0, %xmm0 # 5 [c=4 l=4] movv4df_internal/0
to zero all bits in YMM and ZMM registers.
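For readers unfamiliar with why a 128-bit vxorpd clears the whole register: a VEX-encoded instruction that writes an XMM destination also zeroes every bit above bit 127 of the corresponding YMM/ZMM register. The following is only a toy model of that rule in C; the struct and function names are made up for illustration and do not come from GCC or from any hardware manual.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of one vector register as eight 64-bit lanes (512 bits).
   The type and function names are hypothetical.  */
struct vreg { uint64_t lane[8]; };

/* Model a VEX-encoded write of a 128-bit result: lanes 0 and 1 take
   the result, and every lane above bit 127 is cleared.  */
static void
vex_write_xmm (struct vreg *dst, uint64_t lo, uint64_t hi)
{
  memset (dst->lane, 0, sizeof dst->lane);
  dst->lane[0] = lo;
  dst->lane[1] = hi;
}

/* Model "vxorpd %xmmN, %xmmN, %xmmN": the 128-bit result is zero.  */
static void
vxorpd_self_xmm (struct vreg *r)
{
  vex_write_xmm (r, r->lane[0] ^ r->lane[0], r->lane[1] ^ r->lane[1]);
}

/* Return 1 if all 512 modeled bits are zero, 0 otherwise.  */
static int
vreg_is_zero (const struct vreg *r)
{
  for (int i = 0; i < 8; i++)
    if (r->lane[i] != 0)
      return 0;
  return 1;
}
```

Filling every lane with a nonzero pattern and then calling vxorpd_self_xmm leaves vreg_is_zero returning 1, which is the property the generated code relies on.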
From: hjl.tools at gmail dot com @ 2021-07-15 14:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #51153|0 |1
is obsolete| |
--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
Created attachment 51157
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51157&action=edit
A new patch
From: hjl.tools at gmail dot com @ 2021-07-16 12:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
We need to verify that LOADING the zero YMM register won't trigger an
SSE <-> AVX transition penalty.
From: cvs-commit at gcc dot gnu.org @ 2021-07-28 14:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:
https://gcc.gnu.org/g:9775e465c1fbfc32656de77c618c61acf5bd905d
commit r12-2571-g9775e465c1fbfc32656de77c618c61acf5bd905d
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Jul 27 07:46:04 2021 -0700
x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register
There is no SSE <-> AVX transition penalty if the upper bits of YMM/ZMM
registers are unchanged and YMM/ZMM store doesn't change the upper bits
of YMM/ZMM registers.
1. Since zeroing YMM/ZMM register is implemented with zeroing XMM
register, don't set AVX_U128_DIRTY when zeroing YMM/ZMM register.
2. Since store doesn't change the INIT state on the upper bits of
YMM/ZMM register, don't set AVX_U128_DIRTY on store if the source
of store was never non-zero.
Here are the vzeroupper count differences on SPEC CPU 2017 with
-Ofast -march=skylake-avx512
                    Before     After       Diff
500.perlbench_r        226       225     -0.44%
502.gcc_r             1263      1103    -12.67%
503.bwaves_r            14        14      0.00%
505.mcf_r               29        28     -3.45%
507.cactuBSSN_r       4651      4628     -0.49%
508.namd_r             433       432     -0.23%
510.parest_r         20380     19347     -5.07%
511.povray_r           495       452     -8.69%
519.lbm_r                2         2      0.00%
520.omnetpp_r         5954      5677     -4.65%
521.wrf_r            12353     12339     -0.11%
523.xalancbmk_r      13137     13001     -1.04%
525.x264_r             192       191     -0.52%
526.blender_r         2515      2366     -5.92%
527.cam4_r            4601      4583     -0.39%
531.deepsjeng_r         20        19     -5.00%
538.imagick_r          898       805    -10.36%
541.leela_r            427       399     -6.56%
544.nab_r               74        74      0.00%
548.exchange2_r         72        72      0.00%
549.fotonik3d_r        318       318      0.00%
554.roms_r             558       554     -0.72%
557.xz_r                79        52    -34.18%
and performance differences are within noise range.
gcc/
PR target/101456
* config/i386/i386.c (ix86_avx_u128_mode_needed): Don't set
AVX_U128_DIRTY when all bits are zero.
gcc/testsuite/
PR target/101456
* gcc.target/i386/pr101456-1.c: New test.
* gcc.target/i386/pr101456-2.c: Likewise.
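The two rules in the commit message above can be sketched as a small state machine. This is a simplified model under stated assumptions, not the code in ix86_avx_u128_mode_needed: the enum, struct and function names below are hypothetical, and real GCC works on RTL insns rather than on a flat array of operations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the upper-128-bit state; not GCC's own types.  */
enum u128_state { U128_CLEAN, U128_DIRTY };

/* Hypothetical classification of a 256/512-bit operation.  */
enum op_kind { OP_ZERO_WIDE, OP_STORE_WIDE, OP_COMPUTE_WIDE };

struct wide_op
{
  enum op_kind kind;
  bool source_known_zero;	/* Only meaningful for OP_STORE_WIDE.  */
};

/* Return the state after executing OP when starting in state CUR.  */
static enum u128_state
next_state (enum u128_state cur, struct wide_op op)
{
  switch (op.kind)
    {
    case OP_ZERO_WIDE:
      /* Rule 1: zeroing goes through the XMM register, so the upper
	 bits are not touched and the state is unchanged.  */
      return cur;
    case OP_STORE_WIDE:
      /* Rule 2: a store whose source was never non-zero leaves the
	 state unchanged; any other store is treated as dirty here.  */
      return op.source_known_zero ? cur : U128_DIRTY;
    case OP_COMPUTE_WIDE:
      /* Any other wide operation may write the upper bits.  */
      return U128_DIRTY;
    }
  return U128_DIRTY;
}

/* Walk N operations from a clean state and report whether a
   vzeroupper would be wanted at the end in this model.  */
static bool
needs_vzeroupper (const struct wide_op *ops, size_t n)
{
  enum u128_state s = U128_CLEAN;
  for (size_t i = 0; i < n; i++)
    s = next_state (s, ops[i]);
  return s == U128_DIRTY;
}
```

In this model the sequence from the original report (zero a YMM register, then store it) stays clean, while a single wide computation makes the state dirty.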
From: hjl.tools at gmail dot com @ 2021-07-28 15:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |12.0
Status|UNCONFIRMED |RESOLVED
Resolution|--- |FIXED
--- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for GCC 12. Please reopen it if there are other cases where
vzeroupper should be skipped.
From: hjl.tools at gmail dot com @ 2022-02-15 13:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Resolution|FIXED |---
Status|RESOLVED |REOPENED
Last reconfirmed| |2022-02-15
--- Comment #8 from H.J. Lu <hjl.tools at gmail dot com> ---
It turns out that reading YMM registers with all zero bits needs VZEROUPPER
on Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid
SSE <-> AVX transition penalty.
From: crazylht at gmail dot com @ 2022-02-16 4:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to H.J. Lu from comment #8)
> It turns out that reading YMM registers with all zero bits needs VZEROUPPER
> on Sandy Bridge, Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid
> SSE <-> AVX transition penalty.
We should add a target tune for r12-2571-g9775e465c1fbfc32656de77c618c61acf5bd905d.
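One way such a tuning could look, as a rough sketch only: gate the optimization on a per-microarchitecture predicate built from the list in comment #8. The enum values and function names here are hypothetical and are not GCC's processor enum or tuning flags; treating every microarchitecture not named in comment #8 as safe is an assumption of this sketch, not something the thread establishes.

```c
#include <stdbool.h>

/* Hypothetical microarchitecture identifiers, for illustration only.  */
enum uarch
{
  UARCH_SANDYBRIDGE,
  UARCH_IVYBRIDGE,
  UARCH_HASWELL,
  UARCH_BROADWELL,
  UARCH_ALDERLAKE,
  UARCH_OTHER			/* Anything not listed in comment #8.  */
};

/* Return true if reading an all-zero YMM register still wants a
   VZEROUPPER on U, following the list given in comment #8.  */
static bool
zero_ymm_read_needs_vzeroupper (enum uarch u)
{
  switch (u)
    {
    case UARCH_SANDYBRIDGE:
    case UARCH_IVYBRIDGE:
    case UARCH_HASWELL:
    case UARCH_BROADWELL:
    case UARCH_ALDERLAKE:
      return true;
    case UARCH_OTHER:
      /* Assumption of this sketch: unlisted targets are safe.  */
      return false;
    }
  return true;			/* Be conservative for unknown values.  */
}

/* Return true if the vzeroupper may be skipped when tuning for U,
   given whether the upper bits are known to be zero.  */
static bool
may_skip_vzeroupper (enum uarch u, bool upper_known_zero)
{
  return upper_known_zero && !zero_ymm_read_needs_vzeroupper (u);
}
```

With this shape, the r12-2571 behavior would apply only where may_skip_vzeroupper returns true, and the listed microarchitectures would keep emitting vzeroupper.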
From: jakub at gcc dot gnu.org @ 2022-05-06 8:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|12.0 |12.2
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 12.1 is being released, retargeting bugs to GCC 12.2.
From: rguenth at gcc dot gnu.org @ 2023-05-08 12:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|12.3 |12.4
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.3 is being released, retargeting bugs to GCC 12.4.