public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero
@ 2021-07-14 22:39 hjl.tools at gmail dot com
  2021-07-14 22:59 ` [Bug target/101456] " arjan at linux dot intel.com
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-14 22:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

            Bug ID: 101456
           Summary: Unnecessary vzeroupper when upper bits of YMM
                    registers already zero
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
                CC: crazylht at gmail dot com
  Target Milestone: ---
            Target: i386, x86-64

Unnecessary vzeroupper:

[hjl@gnu-cfl-2 tmp]$ cat x.c 
#include <x86intrin.h>

extern __m256d x;

void
foo (void)
{
  x = _mm256_setzero_pd ();
}
[hjl@gnu-cfl-2 tmp]$ gcc -S -O2 x.c -mavx2 
c[hjl@gnu-cfl-2 tmp]$ cat x.s 
        .file   "x.c"
        .text
        .p2align 4
        .globl  foo
        .type   foo, @function
foo:
.LFB5667:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        vxorpd  %xmm0, %xmm0, %xmm0
        vmovapd %ymm0, x(%rip)
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        vzeroupper  <<<<<< Not needed since upper bits of YMM0 are zero.
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE5667:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 11.1.1 20210531 (Red Hat 11.1.1-3)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 tmp]$

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
@ 2021-07-14 22:59 ` arjan at linux dot intel.com
  2021-07-15  0:04 ` hjl.tools at gmail dot com
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: arjan at linux dot intel.com @ 2021-07-14 22:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

Arjan van de Ven <arjan at linux dot intel.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arjan at linux dot intel.com

--- Comment #1 from Arjan van de Ven <arjan at linux dot intel.com> ---
Actually it's not that they're zero (they are) but they're in "init" state
since the vpxor wrote to xmm not ymm

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
  2021-07-14 22:59 ` [Bug target/101456] " arjan at linux dot intel.com
@ 2021-07-15  0:04 ` hjl.tools at gmail dot com
  2021-07-15  0:06 ` hjl.tools at gmail dot com
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-15  0:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
Created attachment 51153
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51153&action=edit
A patch

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
  2021-07-14 22:59 ` [Bug target/101456] " arjan at linux dot intel.com
  2021-07-15  0:04 ` hjl.tools at gmail dot com
@ 2021-07-15  0:06 ` hjl.tools at gmail dot com
  2021-07-15 14:32 ` hjl.tools at gmail dot com
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-15  0:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

--- Comment #3 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Arjan van de Ven from comment #1)
> Actually it's not that they're zero (they are) but they're in "init" state
> since the vpxor wrote to xmm not ymm

We generate:

        vxorpd  %xmm0, %xmm0, %xmm0     # 5     [c=4 l=4]  movv4df_internal/0

to zero all bits in YMM and ZMM registers.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
                   ` (2 preceding siblings ...)
  2021-07-15  0:06 ` hjl.tools at gmail dot com
@ 2021-07-15 14:32 ` hjl.tools at gmail dot com
  2021-07-16 12:18 ` hjl.tools at gmail dot com
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-15 14:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51153|0                           |1
        is obsolete|                            |

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
Created attachment 51157
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51157&action=edit
A new patch

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
                   ` (3 preceding siblings ...)
  2021-07-15 14:32 ` hjl.tools at gmail dot com
@ 2021-07-16 12:18 ` hjl.tools at gmail dot com
  2021-07-28 14:29 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-16 12:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

--- Comment #5 from H.J. Lu <hjl.tools at gmail dot com> ---
We need to verify that LOADING the zero YMM register won't trigger
SSE<->AVX transition penalty.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
                   ` (4 preceding siblings ...)
  2021-07-16 12:18 ` hjl.tools at gmail dot com
@ 2021-07-28 14:29 ` cvs-commit at gcc dot gnu.org
  2021-07-28 15:02 ` hjl.tools at gmail dot com
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-07-28 14:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:9775e465c1fbfc32656de77c618c61acf5bd905d

commit r12-2571-g9775e465c1fbfc32656de77c618c61acf5bd905d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Tue Jul 27 07:46:04 2021 -0700

    x86: Don't set AVX_U128_DIRTY when zeroing YMM/ZMM register

    There is no SSE <-> AVX transition penalty if the upper bits of YMM/ZMM
    registers are unchanged and YMM/ZMM store doesn't change the upper bits
    of YMM/ZMM registers.

    1. Since zeroing YMM/ZMM register is implemented with zeroing XMM
    register, don't set AVX_U128_DIRTY when zeroing YMM/ZMM register.
    2. Since store doesn't change the INIT state on the upper bits of
    YMM/ZMM register, don't set AVX_U128_DIRTY on store if the source
    of store was never non-zero.

    Here are the vzeroupper count differences on SPEC CPU 2017 with

    -Ofast -march=skylake-avx512

                    Before  After    Diff
    500.perlbench_r 226     225     -0.44%
    502.gcc_r       1263    1103    -12.67%
    503.bwaves_r    14      14      0.00%
    505.mcf_r       29      28      -3.45%
    507.cactuBSSN_r 4651    4628    -0.49%
    508.namd_r      433     432     -0.23%
    510.parest_r    20380   19347   -5.07%
    511.povray_r    495     452     -8.69%
    519.lbm_r       2       2       0.00%
    520.omnetpp_r   5954    5677    -4.65%
    521.wrf_r       12353   12339   -0.11%
    523.xalancbmk_r 13137   13001   -1.04%
    525.x264_r      192     191     -0.52%
    526.blender_r   2515    2366    -5.92%
    527.cam4_r      4601    4583    -0.39%
    531.deepsjeng_r 20      19      -5.00%
    538.imagick_r   898     805     -10.36%
    541.leela_r     427     399     -6.56%
    544.nab_r       74      74      0.00%
    548.exchange2_r 72      72      0.00%
    549.fotonik3d_r 318     318     0.00%
    554.roms_r      558     554     -0.72%
    557.xz_r        79      52      -34.18%

    and performance differences are within noise range.

    gcc/

            PR target/101456
            * config/i386/i386.c (ix86_avx_u128_mode_needed): Don't set
            AVX_U128_DIRTY when all bits are zero.

    gcc/testsuite/

            PR target/101456
            * gcc.target/i386/pr101456-1.c: New test.
            * gcc.target/i386/pr101456-2.c: Likewise.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
                   ` (5 preceding siblings ...)
  2021-07-28 14:29 ` cvs-commit at gcc dot gnu.org
@ 2021-07-28 15:02 ` hjl.tools at gmail dot com
  2022-02-15 13:42 ` hjl.tools at gmail dot com
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-28 15:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for GCC 12.  Please reopen it if there are other cases where
vzeroupper should be skipped.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
                   ` (6 preceding siblings ...)
  2021-07-28 15:02 ` hjl.tools at gmail dot com
@ 2022-02-15 13:42 ` hjl.tools at gmail dot com
  2022-02-16  4:57 ` crazylht at gmail dot com
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: hjl.tools at gmail dot com @ 2022-02-15 13:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
         Resolution|FIXED                       |---
             Status|RESOLVED                    |REOPENED
   Last reconfirmed|                            |2022-02-15

--- Comment #8 from H.J. Lu <hjl.tools at gmail dot com> ---
It turns out that reading YMM registers with all zero bits needs VZEROUPPER
on Sandy Bride, Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid
SSE <-> AVX transition penalty.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
                   ` (7 preceding siblings ...)
  2022-02-15 13:42 ` hjl.tools at gmail dot com
@ 2022-02-16  4:57 ` crazylht at gmail dot com
  2022-05-06  8:30 ` jakub at gcc dot gnu.org
  2023-05-08 12:22 ` rguenth at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: crazylht at gmail dot com @ 2022-02-16  4:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to H.J. Lu from comment #8)
> It turns out that reading YMM registers with all zero bits needs VZEROUPPER
> on Sandy Bride, Ivy Bridge, Haswell, Broadwell and Alder Lake to avoid
> SSE <-> AVX transition penalty.

We should target tune for r12-2571-g9775e465c1fbfc32656de77c618c61acf5bd905d.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
                   ` (8 preceding siblings ...)
  2022-02-16  4:57 ` crazylht at gmail dot com
@ 2022-05-06  8:30 ` jakub at gcc dot gnu.org
  2023-05-08 12:22 ` rguenth at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-06  8:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.0                        |12.2

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 12.1 is being released, retargeting bugs to GCC 12.2.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug target/101456] Unnecessary vzeroupper when upper bits of YMM registers already zero
  2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
                   ` (9 preceding siblings ...)
  2022-05-06  8:30 ` jakub at gcc dot gnu.org
@ 2023-05-08 12:22 ` rguenth at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-08 12:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101456

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.3                        |12.4

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.3 is being released, retargeting bugs to GCC 12.4.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-05-08 12:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-14 22:39 [Bug target/101456] New: Unnecessary vzeroupper when upper bits of YMM registers already zero hjl.tools at gmail dot com
2021-07-14 22:59 ` [Bug target/101456] " arjan at linux dot intel.com
2021-07-15  0:04 ` hjl.tools at gmail dot com
2021-07-15  0:06 ` hjl.tools at gmail dot com
2021-07-15 14:32 ` hjl.tools at gmail dot com
2021-07-16 12:18 ` hjl.tools at gmail dot com
2021-07-28 14:29 ` cvs-commit at gcc dot gnu.org
2021-07-28 15:02 ` hjl.tools at gmail dot com
2022-02-15 13:42 ` hjl.tools at gmail dot com
2022-02-16  4:57 ` crazylht at gmail dot com
2022-05-06  8:30 ` jakub at gcc dot gnu.org
2023-05-08 12:22 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).