public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem
@ 2022-05-06 9:12 amonakov at gcc dot gnu.org
2022-05-07 3:02 ` [Bug target/105504] " crazylht at gmail dot com
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-05-06 9:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
Bug ID: 105504
Summary: Fails to break dependency for vcvtss2sd xmm, xmm, mem
Product: gcc
Version: unknown
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
Created attachment 52933
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52933&action=edit
testcase
Hit by core-math team at
https://gcc.gnu.org/pipermail/gcc-help/2022-May/141480.html
Compile the attached testcase with -O2 -march=haswell (other AVX-capable Intel
families except Alderlake are affected too) and observe that the big basic
block begins with
.L6:
vcvtss2sd xmm1, xmm1, DWORD PTR [rsp-4]
This creates a false dependency on the previous assignment into xmm1, resulting
in wildly varying (and suboptimal) throughput figures depending on how long the
CPU stalls waiting for the previous assignment to complete.
GCC has code to emit such instructions in a manner that avoids false
dependencies (see e.g. PR89071), but here it doesn't seem to work.
Also there's a potentially related issue that GCC copies the initial xmm0 value
to eax via stack in the beginning of the function:
cr_exp10f:
vmovss DWORD PTR [rsp-4], xmm0
mov eax, DWORD PTR [rsp-4]
This seems wrong since xmm-reg moves on Haswell are 1 cycle afaict.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
2022-05-06 9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
@ 2022-05-07 3:02 ` crazylht at gmail dot com
2022-05-07 3:16 ` crazylht at gmail dot com
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2022-05-07 3:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
Pass remove_partial_avx_dependency is before RA, which we have
(insn 128 127 129 22 (set (reg/v:DF 99 [ z ])
(float_extend:DF (reg/v:SF 117 [ x ]))) "test.c":43:10 163
{*extendsfdf2}
and attr avx_partial_xmm_update is false, since we don't need to handle
vcvtss2sd %xmm, %xmm, %xmm
But after RA, 117 is allocate as mem which causes partial xmm dependency.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
2022-05-06 9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
2022-05-07 3:02 ` [Bug target/105504] " crazylht at gmail dot com
@ 2022-05-07 3:16 ` crazylht at gmail dot com
2022-05-07 3:36 ` crazylht at gmail dot com
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2022-05-07 3:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
After set remove_partial_avx_dependency to true for register alternative, we
get
vxorps %xmm3, %xmm3, %xmm3
vmovsd .LC16(%rip), %xmm6
vmovsd .LC14(%rip), %xmm5
vcvtss2sd %xmm0, %xmm3, %xmm0
vmulsd .LC12(%rip), %xmm0, %xmm1
vroundsd $9, %xmm1, %xmm3, %xmm2
vcvttsd2siq %xmm2, %rax
vsubsd %xmm2, %xmm1, %xmm1
vfmadd231sd .LC13(%rip), %xmm0, %xmm1
vfmadd213sd .LC17(%rip), %xmm1, %xmm6
vmovsd .LC18(%rip), %xmm0
vfmadd213sd .LC19(%rip), %xmm1, %xmm0
vfmadd213sd .LC15(%rip), %xmm1, %xmm5
movq %rax, %rdx
sarq $4, %rax
vmulsd %xmm1, %xmm1, %xmm4
addq $1023, %rax
andl $15, %edx
salq $52, %rax
vmovq %rax, %xmm7
vmulsd tb.1(,%rdx,8), %xmm7, %xmm2
vfmadd132sd %xmm4, %xmm6, %xmm0
vmulsd %xmm2, %xmm1, %xmm1
vfmadd132sd %xmm4, %xmm5, %xmm0
vfmadd132sd %xmm1, %xmm2, %xmm0
vcvtsd2ss %xmm0, %xmm3, %xmm0
Also
>Also there's a potentially related issue that GCC copies the initial xmm0 value >to eax via stack in the beginning of the function:
This issue disappears(should still be latent after fix).
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
2022-05-06 9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
2022-05-07 3:02 ` [Bug target/105504] " crazylht at gmail dot com
2022-05-07 3:16 ` crazylht at gmail dot com
@ 2022-05-07 3:36 ` crazylht at gmail dot com
2022-05-07 4:04 ` crazylht at gmail dot com
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2022-05-07 3:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hjl.tools at gmail dot com
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
The overhead for setting attr remove_partial_avx_dependency to true for
register alternative is:
For simple
double
foo (float a)
{
return a;
}
Now we generates:
foo:
.LFB0:
.cfi_startproc
vmovaps %xmm0, %xmm1
vxorps %xmm0, %xmm0, %xmm0
vcvtss2sd %xmm1, %xmm0, %xmm0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
2022-05-06 9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
` (2 preceding siblings ...)
2022-05-07 3:36 ` crazylht at gmail dot com
@ 2022-05-07 4:04 ` crazylht at gmail dot com
2022-05-07 8:28 ` amonakov at gcc dot gnu.org
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2022-05-07 4:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
Another possible solution is add a little bit dislike for "m" alternative(like
?m) to avoid potential spill.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
2022-05-06 9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
` (3 preceding siblings ...)
2022-05-07 4:04 ` crazylht at gmail dot com
@ 2022-05-07 8:28 ` amonakov at gcc dot gnu.org
2022-06-08 3:24 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-05-07 8:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
--- Comment #5 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
The strange xmm0 spill issue may affect more code, so I reported an isolated
testcase: PR 105513 (regression vs. gcc-8, the complete testcase in this PR
also does not spill with gcc-8).
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
2022-05-06 9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
` (4 preceding siblings ...)
2022-05-07 8:28 ` amonakov at gcc dot gnu.org
@ 2022-06-08 3:24 ` cvs-commit at gcc dot gnu.org
2023-08-05 15:05 ` egallager at gcc dot gnu.org
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-06-08 3:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
commit r13-1009-g5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
Author: liuhongt <hongtao.liu@intel.com>
Date: Mon May 30 15:30:51 2022 +0800
Disparages SSE_REGS alternatives sligntly with ?v instead of *v in
*mov{si,di}_internal.
So alternative v won't be igored in record_reg_classess.
Similar for *r alternatives in some vector patterns.
It helps testcase in the PR, also RA now makes better decisions for
gcc.target/i386/extract-insert-combining.c
movd %esi, %xmm0
movd %edi, %xmm1
- movl %esi, -12(%rsp)
paddd %xmm0, %xmm1
pinsrd $0, %esi, %xmm0
paddd %xmm1, %xmm0
The patch has no big impact on SPEC2017 for both O2 and Ofast
march=native run.
And I noticed there's some changes in SPEC2017 from code like
mov mem, %eax
vmovd %eax, %xmm0
..
mov %eax, 64(%rsp)
to
vmovd mem, %xmm0
..
vmovd %xmm0, 64(%rsp)
Which should be exactly what we want?
gcc/ChangeLog:
PR target/105513
PR target/105504
* config/i386/i386.md (*movsi_internal): Change alternative
from *v to ?v.
(*movdi_internal): Ditto.
* config/i386/sse.md (vec_set<mode>_0): Change alternative *r
to ?r.
(*vec_extractv4sf_mem): Ditto.
(*vec_extracthf): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr105513-1.c: New test.
* gcc.target/i386/extract-insert-combining.c: Add new
scan-assembler-not for spill.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
2022-05-06 9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
` (5 preceding siblings ...)
2022-06-08 3:24 ` cvs-commit at gcc dot gnu.org
@ 2023-08-05 15:05 ` egallager at gcc dot gnu.org
2023-08-07 1:55 ` crazylht at gmail dot com
2023-08-12 13:50 ` egallager at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: egallager at gcc dot gnu.org @ 2023-08-05 15:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
--- Comment #7 from Eric Gallager <egallager at gcc dot gnu.org> ---
(In reply to CVS Commits from comment #6)
> The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
>
> https://gcc.gnu.org/g:5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
>
> commit r13-1009-g5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
> Author: liuhongt <hongtao.liu@intel.com>
> Date: Mon May 30 15:30:51 2022 +0800
>
> Disparages SSE_REGS alternatives sligntly with ?v instead of *v in
> *mov{si,di}_internal.
>
> So alternative v won't be igored in record_reg_classess.
>
> Similar for *r alternatives in some vector patterns.
>
> It helps testcase in the PR, also RA now makes better decisions for
> gcc.target/i386/extract-insert-combining.c
>
> movd %esi, %xmm0
> movd %edi, %xmm1
> - movl %esi, -12(%rsp)
> paddd %xmm0, %xmm1
> pinsrd $0, %esi, %xmm0
> paddd %xmm1, %xmm0
>
> The patch has no big impact on SPEC2017 for both O2 and Ofast
> march=native run.
>
> And I noticed there's some changes in SPEC2017 from code like
>
> mov mem, %eax
> vmovd %eax, %xmm0
> ..
> mov %eax, 64(%rsp)
>
> to
>
> vmovd mem, %xmm0
> ..
> vmovd %xmm0, 64(%rsp)
>
> Which should be exactly what we want?
>
> gcc/ChangeLog:
>
> PR target/105513
> PR target/105504
> * config/i386/i386.md (*movsi_internal): Change alternative
> from *v to ?v.
> (*movdi_internal): Ditto.
> * config/i386/sse.md (vec_set<mode>_0): Change alternative *r
> to ?r.
> (*vec_extractv4sf_mem): Ditto.
> (*vec_extracthf): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr105513-1.c: New test.
> * gcc.target/i386/extract-insert-combining.c: Add new
> scan-assembler-not for spill.
Did this fix it?
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
2022-05-06 9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
` (6 preceding siblings ...)
2023-08-05 15:05 ` egallager at gcc dot gnu.org
@ 2023-08-07 1:55 ` crazylht at gmail dot com
2023-08-12 13:50 ` egallager at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2023-08-07 1:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |crazylht at gmail dot com
--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Eric Gallager from comment #7)
> (In reply to CVS Commits from comment #6)
> > The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
> >
> > https://gcc.gnu.org/g:5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
> >
> > commit r13-1009-g5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
> > Author: liuhongt <hongtao.liu@intel.com>
> > Date: Mon May 30 15:30:51 2022 +0800
> >
> > Disparages SSE_REGS alternatives sligntly with ?v instead of *v in
> > *mov{si,di}_internal.
> >
> > So alternative v won't be igored in record_reg_classess.
> >
> > Similar for *r alternatives in some vector patterns.
> >
> > It helps testcase in the PR, also RA now makes better decisions for
> > gcc.target/i386/extract-insert-combining.c
> >
> > movd %esi, %xmm0
> > movd %edi, %xmm1
> > - movl %esi, -12(%rsp)
> > paddd %xmm0, %xmm1
> > pinsrd $0, %esi, %xmm0
> > paddd %xmm1, %xmm0
> >
> > The patch has no big impact on SPEC2017 for both O2 and Ofast
> > march=native run.
> >
> > And I noticed there's some changes in SPEC2017 from code like
> >
> > mov mem, %eax
> > vmovd %eax, %xmm0
> > ..
> > mov %eax, 64(%rsp)
> >
> > to
> >
> > vmovd mem, %xmm0
> > ..
> > vmovd %xmm0, 64(%rsp)
> >
> > Which should be exactly what we want?
> >
> > gcc/ChangeLog:
> >
> > PR target/105513
> > PR target/105504
> > * config/i386/i386.md (*movsi_internal): Change alternative
> > from *v to ?v.
> > (*movdi_internal): Ditto.
> > * config/i386/sse.md (vec_set<mode>_0): Change alternative *r
> > to ?r.
> > (*vec_extractv4sf_mem): Ditto.
> > (*vec_extracthf): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr105513-1.c: New test.
> > * gcc.target/i386/extract-insert-combining.c: Add new
> > scan-assembler-not for spill.
>
> Did this fix it?
Yes.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
2022-05-06 9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
` (7 preceding siblings ...)
2023-08-07 1:55 ` crazylht at gmail dot com
@ 2023-08-12 13:50 ` egallager at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: egallager at gcc dot gnu.org @ 2023-08-12 13:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
Eric Gallager <egallager at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |FIXED
--- Comment #9 from Eric Gallager <egallager at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Eric Gallager from comment #7)
> >
> > Did this fix it?
>
> Yes.
OK, closing, then
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-08-12 13:50 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-06 9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
2022-05-07 3:02 ` [Bug target/105504] " crazylht at gmail dot com
2022-05-07 3:16 ` crazylht at gmail dot com
2022-05-07 3:36 ` crazylht at gmail dot com
2022-05-07 4:04 ` crazylht at gmail dot com
2022-05-07 8:28 ` amonakov at gcc dot gnu.org
2022-06-08 3:24 ` cvs-commit at gcc dot gnu.org
2023-08-05 15:05 ` egallager at gcc dot gnu.org
2023-08-07 1:55 ` crazylht at gmail dot com
2023-08-12 13:50 ` egallager at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).