public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem
@ 2022-05-06  9:12 amonakov at gcc dot gnu.org
  2022-05-07  3:02 ` [Bug target/105504] " crazylht at gmail dot com
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-05-06  9:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

            Bug ID: 105504
           Summary: Fails to break dependency for vcvtss2sd xmm, xmm, mem
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

Created attachment 52933
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52933&action=edit
testcase

Hit by core-math team at
https://gcc.gnu.org/pipermail/gcc-help/2022-May/141480.html

Compile the attached testcase with -O2 -march=haswell (other AVX-capable Intel
families except Alderlake are affected too) and observe that the big basic
block begins with

.L6:
        vcvtss2sd       xmm1, xmm1, DWORD PTR [rsp-4]

This creates a false dependency on the previous assignment into xmm1, resulting
in wildly varying (and suboptimal) throughput figures depending on how long the
CPU stalls waiting for the previous assignment to complete.

GCC has code to emit such instructions in a manner that avoids false
dependencies (see e.g. PR89071), but here it doesn't seem to work.


Also there's a potentially related issue that GCC copies the initial xmm0 value
to eax via stack in the beginning of the function:

cr_exp10f:
        vmovss  DWORD PTR [rsp-4], xmm0
        mov     eax, DWORD PTR [rsp-4]

This seems wrong since xmm-reg moves on Haswell are 1 cycle afaict.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
  2022-05-06  9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
@ 2022-05-07  3:02 ` crazylht at gmail dot com
  2022-05-07  3:16 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2022-05-07  3:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
Pass remove_partial_avx_dependency is before RA, which we have

(insn 128 127 129 22 (set (reg/v:DF 99 [ z ])
        (float_extend:DF (reg/v:SF 117 [ x ]))) "test.c":43:10 163
{*extendsfdf2}

and attr avx_partial_xmm_update is false, since we don't need to handle

vcvtss2sd %xmm, %xmm, %xmm

But after RA, 117 is allocate as mem which causes partial xmm dependency.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
  2022-05-06  9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
  2022-05-07  3:02 ` [Bug target/105504] " crazylht at gmail dot com
@ 2022-05-07  3:16 ` crazylht at gmail dot com
  2022-05-07  3:36 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2022-05-07  3:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
After set remove_partial_avx_dependency to true for register alternative, we
get

       vxorps  %xmm3, %xmm3, %xmm3
        vmovsd  .LC16(%rip), %xmm6
        vmovsd  .LC14(%rip), %xmm5
        vcvtss2sd       %xmm0, %xmm3, %xmm0
        vmulsd  .LC12(%rip), %xmm0, %xmm1
        vroundsd        $9, %xmm1, %xmm3, %xmm2
        vcvttsd2siq     %xmm2, %rax
        vsubsd  %xmm2, %xmm1, %xmm1
        vfmadd231sd     .LC13(%rip), %xmm0, %xmm1
        vfmadd213sd     .LC17(%rip), %xmm1, %xmm6
        vmovsd  .LC18(%rip), %xmm0
        vfmadd213sd     .LC19(%rip), %xmm1, %xmm0
        vfmadd213sd     .LC15(%rip), %xmm1, %xmm5
        movq    %rax, %rdx
        sarq    $4, %rax
        vmulsd  %xmm1, %xmm1, %xmm4
        addq    $1023, %rax
        andl    $15, %edx
        salq    $52, %rax
        vmovq   %rax, %xmm7
        vmulsd  tb.1(,%rdx,8), %xmm7, %xmm2
        vfmadd132sd     %xmm4, %xmm6, %xmm0
        vmulsd  %xmm2, %xmm1, %xmm1
        vfmadd132sd     %xmm4, %xmm5, %xmm0
        vfmadd132sd     %xmm1, %xmm2, %xmm0
        vcvtsd2ss       %xmm0, %xmm3, %xmm0

Also  
>Also there's a potentially related issue that GCC copies the initial xmm0 value >to eax via stack in the beginning of the function:

This issue disappears(should still be latent after fix).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
  2022-05-06  9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
  2022-05-07  3:02 ` [Bug target/105504] " crazylht at gmail dot com
  2022-05-07  3:16 ` crazylht at gmail dot com
@ 2022-05-07  3:36 ` crazylht at gmail dot com
  2022-05-07  4:04 ` crazylht at gmail dot com
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2022-05-07  3:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
The overhead for setting attr remove_partial_avx_dependency to true for
register alternative is:

For simple 
double
foo (float a)
{
        return a;
}


Now we generates:

foo:
.LFB0:
        .cfi_startproc
        vmovaps %xmm0, %xmm1
        vxorps  %xmm0, %xmm0, %xmm0
        vcvtss2sd       %xmm1, %xmm0, %xmm0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
  2022-05-06  9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2022-05-07  3:36 ` crazylht at gmail dot com
@ 2022-05-07  4:04 ` crazylht at gmail dot com
  2022-05-07  8:28 ` amonakov at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2022-05-07  4:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
Another possible solution is add a little bit dislike for "m" alternative(like
?m) to avoid potential spill.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
  2022-05-06  9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2022-05-07  4:04 ` crazylht at gmail dot com
@ 2022-05-07  8:28 ` amonakov at gcc dot gnu.org
  2022-06-08  3:24 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-05-07  8:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

--- Comment #5 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
The strange xmm0 spill issue may affect more code, so I reported an isolated
testcase: PR 105513 (regression vs. gcc-8, the complete testcase in this PR
also does not spill with gcc-8).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
  2022-05-06  9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2022-05-07  8:28 ` amonakov at gcc dot gnu.org
@ 2022-06-08  3:24 ` cvs-commit at gcc dot gnu.org
  2023-08-05 15:05 ` egallager at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-06-08  3:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5

commit r13-1009-g5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon May 30 15:30:51 2022 +0800

    Disparages SSE_REGS alternatives sligntly with ?v instead of *v in
*mov{si,di}_internal.

    So alternative v won't be igored in record_reg_classess.

    Similar for *r alternatives in some vector patterns.

    It helps testcase in the PR, also RA now makes better decisions for
    gcc.target/i386/extract-insert-combining.c

            movd    %esi, %xmm0
            movd    %edi, %xmm1
    -       movl    %esi, -12(%rsp)
            paddd   %xmm0, %xmm1
            pinsrd  $0, %esi, %xmm0
            paddd   %xmm1, %xmm0

    The patch has no big impact on SPEC2017 for both O2 and Ofast
    march=native run.

    And I noticed there's some changes in SPEC2017 from code like

    mov mem, %eax
    vmovd %eax, %xmm0
    ..
    mov %eax, 64(%rsp)

    to

    vmovd mem, %xmm0
    ..
    vmovd %xmm0, 64(%rsp)

    Which should be exactly what we want?

    gcc/ChangeLog:

            PR target/105513
            PR target/105504
            * config/i386/i386.md (*movsi_internal): Change alternative
            from *v to ?v.
            (*movdi_internal): Ditto.
            * config/i386/sse.md (vec_set<mode>_0): Change alternative *r
            to ?r.
            (*vec_extractv4sf_mem): Ditto.
            (*vec_extracthf): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr105513-1.c: New test.
            * gcc.target/i386/extract-insert-combining.c: Add new
            scan-assembler-not for spill.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
  2022-05-06  9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2022-06-08  3:24 ` cvs-commit at gcc dot gnu.org
@ 2023-08-05 15:05 ` egallager at gcc dot gnu.org
  2023-08-07  1:55 ` crazylht at gmail dot com
  2023-08-12 13:50 ` egallager at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: egallager at gcc dot gnu.org @ 2023-08-05 15:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

--- Comment #7 from Eric Gallager <egallager at gcc dot gnu.org> ---
(In reply to CVS Commits from comment #6)
> The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
> 
> https://gcc.gnu.org/g:5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
> 
> commit r13-1009-g5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
> Author: liuhongt <hongtao.liu@intel.com>
> Date:   Mon May 30 15:30:51 2022 +0800
> 
>     Disparages SSE_REGS alternatives sligntly with ?v instead of *v in
> *mov{si,di}_internal.
>     
>     So alternative v won't be igored in record_reg_classess.
>     
>     Similar for *r alternatives in some vector patterns.
>     
>     It helps testcase in the PR, also RA now makes better decisions for
>     gcc.target/i386/extract-insert-combining.c
>     
>             movd    %esi, %xmm0
>             movd    %edi, %xmm1
>     -       movl    %esi, -12(%rsp)
>             paddd   %xmm0, %xmm1
>             pinsrd  $0, %esi, %xmm0
>             paddd   %xmm1, %xmm0
>     
>     The patch has no big impact on SPEC2017 for both O2 and Ofast
>     march=native run.
>     
>     And I noticed there's some changes in SPEC2017 from code like
>     
>     mov mem, %eax
>     vmovd %eax, %xmm0
>     ..
>     mov %eax, 64(%rsp)
>     
>     to
>     
>     vmovd mem, %xmm0
>     ..
>     vmovd %xmm0, 64(%rsp)
>     
>     Which should be exactly what we want?
>     
>     gcc/ChangeLog:
>     
>             PR target/105513
>             PR target/105504
>             * config/i386/i386.md (*movsi_internal): Change alternative
>             from *v to ?v.
>             (*movdi_internal): Ditto.
>             * config/i386/sse.md (vec_set<mode>_0): Change alternative *r
>             to ?r.
>             (*vec_extractv4sf_mem): Ditto.
>             (*vec_extracthf): Ditto.
>     
>     gcc/testsuite/ChangeLog:
>     
>             * gcc.target/i386/pr105513-1.c: New test.
>             * gcc.target/i386/extract-insert-combining.c: Add new
>             scan-assembler-not for spill.

Did this fix it?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
  2022-05-06  9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-08-05 15:05 ` egallager at gcc dot gnu.org
@ 2023-08-07  1:55 ` crazylht at gmail dot com
  2023-08-12 13:50 ` egallager at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: crazylht at gmail dot com @ 2023-08-07  1:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Eric Gallager from comment #7)
> (In reply to CVS Commits from comment #6)
> > The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
> > 
> > https://gcc.gnu.org/g:5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
> > 
> > commit r13-1009-g5e005393d4ff0a428c5f55b9ba7f65d6078a7cf5
> > Author: liuhongt <hongtao.liu@intel.com>
> > Date:   Mon May 30 15:30:51 2022 +0800
> > 
> >     Disparages SSE_REGS alternatives sligntly with ?v instead of *v in
> > *mov{si,di}_internal.
> >     
> >     So alternative v won't be igored in record_reg_classess.
> >     
> >     Similar for *r alternatives in some vector patterns.
> >     
> >     It helps testcase in the PR, also RA now makes better decisions for
> >     gcc.target/i386/extract-insert-combining.c
> >     
> >             movd    %esi, %xmm0
> >             movd    %edi, %xmm1
> >     -       movl    %esi, -12(%rsp)
> >             paddd   %xmm0, %xmm1
> >             pinsrd  $0, %esi, %xmm0
> >             paddd   %xmm1, %xmm0
> >     
> >     The patch has no big impact on SPEC2017 for both O2 and Ofast
> >     march=native run.
> >     
> >     And I noticed there's some changes in SPEC2017 from code like
> >     
> >     mov mem, %eax
> >     vmovd %eax, %xmm0
> >     ..
> >     mov %eax, 64(%rsp)
> >     
> >     to
> >     
> >     vmovd mem, %xmm0
> >     ..
> >     vmovd %xmm0, 64(%rsp)
> >     
> >     Which should be exactly what we want?
> >     
> >     gcc/ChangeLog:
> >     
> >             PR target/105513
> >             PR target/105504
> >             * config/i386/i386.md (*movsi_internal): Change alternative
> >             from *v to ?v.
> >             (*movdi_internal): Ditto.
> >             * config/i386/sse.md (vec_set<mode>_0): Change alternative *r
> >             to ?r.
> >             (*vec_extractv4sf_mem): Ditto.
> >             (*vec_extracthf): Ditto.
> >     
> >     gcc/testsuite/ChangeLog:
> >     
> >             * gcc.target/i386/pr105513-1.c: New test.
> >             * gcc.target/i386/extract-insert-combining.c: Add new
> >             scan-assembler-not for spill.
> 
> Did this fix it?

Yes.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug target/105504] Fails to break dependency for vcvtss2sd xmm, xmm, mem
  2022-05-06  9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-08-07  1:55 ` crazylht at gmail dot com
@ 2023-08-12 13:50 ` egallager at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: egallager at gcc dot gnu.org @ 2023-08-12 13:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

Eric Gallager <egallager at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Eric Gallager <egallager at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Eric Gallager from comment #7)
> > 
> > Did this fix it?
> 
> Yes.

OK, closing, then

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-08-12 13:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-06  9:12 [Bug target/105504] New: Fails to break dependency for vcvtss2sd xmm, xmm, mem amonakov at gcc dot gnu.org
2022-05-07  3:02 ` [Bug target/105504] " crazylht at gmail dot com
2022-05-07  3:16 ` crazylht at gmail dot com
2022-05-07  3:36 ` crazylht at gmail dot com
2022-05-07  4:04 ` crazylht at gmail dot com
2022-05-07  8:28 ` amonakov at gcc dot gnu.org
2022-06-08  3:24 ` cvs-commit at gcc dot gnu.org
2023-08-05 15:05 ` egallager at gcc dot gnu.org
2023-08-07  1:55 ` crazylht at gmail dot com
2023-08-12 13:50 ` egallager at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).