public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c
@ 2023-08-21 12:24 Richard Biener
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2023-08-21 12:24 UTC (permalink / raw)
  To: gcc-patches; +Cc: hongtao.liu, hjl.tools

The following fixes the gcc.target/i386/pr87007-5.c testcase which
changed code generation again after the recent sinking improvements.
We now have

        vxorps  %xmm0, %xmm0, %xmm0
        vsqrtsd d2(%rip), %xmm0, %xmm0

and an unnecessary xor again in one case, the other vsqrtsd has
a register source and a properly zeroing load:

        vmovsd  d3(%rip), %xmm0
        testl   %esi, %esi
        jg      .L11
.L3:
        vsqrtsd %xmm0, %xmm0, %xmm0

the following patch XFAILs the scan.  I'm not sure what's at
fault here, there are no loops in the CFG, but somehow
r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
doesn't.  I guess I don't really understand what
remove_partial_avx_dependency is supposed to do so can't
really assess whether the pxor is necessary or not.

OK?

	* gcc.target/i386/pr87007-5.c: Update comment, XFAIL
	subtest.
---
 gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index a6cdf11522e..5902616d1f1 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -1,6 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
-/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence.  */
+/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
+   are sunk out of the loop and the loop is elided.  One vsqrtsd with
+   memory operand will need a xor to avoid partial dependence.  */
 
 #include<math.h>
 
@@ -17,4 +19,4 @@ foo (int n, int k)
 
 /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
 /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
-- 
2.35.3

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c
  2023-08-21 12:59   ` Richard Biener
@ 2023-08-21 13:08     ` Hongtao Liu
  0 siblings, 0 replies; 5+ messages in thread
From: Hongtao Liu @ 2023-08-21 13:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, hongtao.liu, hjl.tools

On Mon, Aug 21, 2023 at 8:59 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Mon, 21 Aug 2023, Hongtao Liu wrote:
>
> > On Mon, Aug 21, 2023 at 8:25?PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > The following fixes the gcc.target/i386/pr87007-5.c testcase which
> > > changed code generation again after the recent sinking improvements.
> > > We now have
> > >
> > >         vxorps  %xmm0, %xmm0, %xmm0
> > >         vsqrtsd d2(%rip), %xmm0, %xmm0
> > >
> > > and an unnecessary xor again in one case, the other vsqrtsd has
> > > a register source and a properly zeroing load:
> > >
> > >         vmovsd  d3(%rip), %xmm0
> > >         testl   %esi, %esi
> > >         jg      .L11
> > > .L3:
> > >         vsqrtsd %xmm0, %xmm0, %xmm0
> > >
> > > the following patch XFAILs the scan.  I'm not sure what's at
> > > fault here, there are no loops in the CFG, but somehow
> > > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
> > > doesn't.  I guess I don't really understand what
> > > remove_partial_avx_dependency is supposed to do so can't
> > > really assess whether the pxor is necessary or not.
> > There's a false dependency on xmm0 when the source operand in the
> > pattern is memory, the pattern only takes xmm0 as dest, but the output
> > instruction takes xmm0 also as input(the second source operand),
> > that's why we need an pxor here.
>
> OK, so XFAIL is wrong, we should instead scan for one xorps then
> (like it was in the past).
>
> > When the source operand in the pattern is register_operand, we can
> > reuse the register_operand for the second source operand. The
> > instructions here are not very obvious, the more representative one
> > should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0.
> > >
> > > OK?
> > Can we add -fno-XXX to disable the optimization to make the assembly
> > more stable?
>
> Not sure, we could feed GIMPLE IR to RTL expansion instead of
> feeding a complex testcase through the pipeline, but I'm not sure
> what we were originally supposed to test (the PR trail is a bit
> large).
>
> > Or current codegen should be optimal(for the sinking), then Ok for the patch.
>
> So like the following (I've just adjusted the comments to reflect the
> pxor is necessary).
>
> OK?
OK.
>
> Richard.
>
> From 7bed9399ae736c20a677ccf7e7fc4d2751a32327 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Mon, 21 Aug 2023 14:09:48 +0200
> Subject: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c
> To: gcc-patches@gcc.gnu.org
>
> The following fixes the gcc.target/i386/pr87007-5.c testcase which
> changed code generation again after the recent sinking improvements.
> We now have
>
>         vxorps  %xmm0, %xmm0, %xmm0
>         vsqrtsd d2(%rip), %xmm0, %xmm0
>
> and a necessary xor again in one case, the other vsqrtsd has
> a register source and a properly zeroing load:
>
>         vmovsd  d3(%rip), %xmm0
>         testl   %esi, %esi
>         jg      .L11
> .L3:
>         vsqrtsd %xmm0, %xmm0, %xmm0
>
> the following patch adjusts the scan.
>
>         * gcc.target/i386/pr87007-5.c: Update comment, adjust subtest.
> ---
>  gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> index a6cdf11522e..8f2dc947f6c 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> @@ -1,6 +1,8 @@
>  /* { dg-do compile } */
>  /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
> -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence.  */
> +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
> +   are sunk out of the loop and the loop is elided.  One vsqrtsd with
> +   memory operand needs a xor to avoid partial dependence.  */
>
>  #include<math.h>
>
> @@ -17,4 +19,4 @@ foo (int n, int k)
>
>  /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
>  /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> --
> 2.35.3
>


-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c
  2023-08-21 12:40 ` Hongtao Liu
  2023-08-21 12:41   ` Hongtao Liu
@ 2023-08-21 12:59   ` Richard Biener
  2023-08-21 13:08     ` Hongtao Liu
  1 sibling, 1 reply; 5+ messages in thread
From: Richard Biener @ 2023-08-21 12:59 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: gcc-patches, hongtao.liu, hjl.tools

On Mon, 21 Aug 2023, Hongtao Liu wrote:

> On Mon, Aug 21, 2023 at 8:25?PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > The following fixes the gcc.target/i386/pr87007-5.c testcase which
> > changed code generation again after the recent sinking improvements.
> > We now have
> >
> >         vxorps  %xmm0, %xmm0, %xmm0
> >         vsqrtsd d2(%rip), %xmm0, %xmm0
> >
> > and an unnecessary xor again in one case, the other vsqrtsd has
> > a register source and a properly zeroing load:
> >
> >         vmovsd  d3(%rip), %xmm0
> >         testl   %esi, %esi
> >         jg      .L11
> > .L3:
> >         vsqrtsd %xmm0, %xmm0, %xmm0
> >
> > the following patch XFAILs the scan.  I'm not sure what's at
> > fault here, there are no loops in the CFG, but somehow
> > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
> > doesn't.  I guess I don't really understand what
> > remove_partial_avx_dependency is supposed to do so can't
> > really assess whether the pxor is necessary or not.
> There's a false dependency on xmm0 when the source operand in the
> pattern is memory, the pattern only takes xmm0 as dest, but the output
> instruction takes xmm0 also as input(the second source operand),
> that's why we need an pxor here.

OK, so XFAIL is wrong, we should instead scan for one xorps then
(like it was in the past).

> When the source operand in the pattern is register_operand, we can
> reuse the register_operand for the second source operand. The
> instructions here are not very obvious, the more representative one
> should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0.
> >
> > OK?
> Can we add -fno-XXX to disable the optimization to make the assembly
> more stable?

Not sure, we could feed GIMPLE IR to RTL expansion instead of
feeding a complex testcase through the pipeline, but I'm not sure
what we were originally supposed to test (the PR trail is a bit
large).

> Or current codegen should be optimal(for the sinking), then Ok for the patch.

So like the following (I've just adjusted the comments to reflect the
pxor is necessary).

OK?

Richard.

From 7bed9399ae736c20a677ccf7e7fc4d2751a32327 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Mon, 21 Aug 2023 14:09:48 +0200
Subject: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c
To: gcc-patches@gcc.gnu.org

The following fixes the gcc.target/i386/pr87007-5.c testcase which
changed code generation again after the recent sinking improvements.
We now have

        vxorps  %xmm0, %xmm0, %xmm0
        vsqrtsd d2(%rip), %xmm0, %xmm0

and a necessary xor again in one case, the other vsqrtsd has
a register source and a properly zeroing load:

        vmovsd  d3(%rip), %xmm0
        testl   %esi, %esi
        jg      .L11
.L3:
        vsqrtsd %xmm0, %xmm0, %xmm0

the following patch adjusts the scan.

	* gcc.target/i386/pr87007-5.c: Update comment, adjust subtest.
---
 gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index a6cdf11522e..8f2dc947f6c 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -1,6 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
-/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence.  */
+/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
+   are sunk out of the loop and the loop is elided.  One vsqrtsd with
+   memory operand needs a xor to avoid partial dependence.  */
 
 #include<math.h>
 
@@ -17,4 +19,4 @@ foo (int n, int k)
 
 /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
 /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
-- 
2.35.3


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c
  2023-08-21 12:40 ` Hongtao Liu
@ 2023-08-21 12:41   ` Hongtao Liu
  2023-08-21 12:59   ` Richard Biener
  1 sibling, 0 replies; 5+ messages in thread
From: Hongtao Liu @ 2023-08-21 12:41 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, hongtao.liu, hjl.tools

On Mon, Aug 21, 2023 at 8:40 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Aug 21, 2023 at 8:25 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > The following fixes the gcc.target/i386/pr87007-5.c testcase which
> > changed code generation again after the recent sinking improvements.
> > We now have
> >
> >         vxorps  %xmm0, %xmm0, %xmm0
> >         vsqrtsd d2(%rip), %xmm0, %xmm0
> >
> > and an unnecessary xor again in one case, the other vsqrtsd has
> > a register source and a properly zeroing load:
> >
> >         vmovsd  d3(%rip), %xmm0
> >         testl   %esi, %esi
> >         jg      .L11
> > .L3:
> >         vsqrtsd %xmm0, %xmm0, %xmm0
> >
> > the following patch XFAILs the scan.  I'm not sure what's at
> > fault here, there are no loops in the CFG, but somehow
> > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
> > doesn't.  I guess I don't really understand what
> > remove_partial_avx_dependency is supposed to do so can't
> > really assess whether the pxor is necessary or not.
> There's a false dependency on xmm0 when the source operand in the
> pattern is memory, the pattern only takes xmm0 as dest, but the output
> instruction takes xmm0 also as input(the second source operand),
> that's why we need an pxor here.
> When the source operand in the pattern is register_operand, we can
> reuse the register_operand for the second source operand. The
> instructions here are not very obvious, the more representative one
> should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0.
And there's no false dependence here.
> >
> > OK?
> Can we add -fno-XXX to disable the optimization to make the assembly
> more stable?
> Or current codegen should be optimal(for the sinking), then Ok for the patch.
>
> >
> >         * gcc.target/i386/pr87007-5.c: Update comment, XFAIL
> >         subtest.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> > index a6cdf11522e..5902616d1f1 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> > @@ -1,6 +1,8 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
> > -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence.  */
> > +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
> > +   are sunk out of the loop and the loop is elided.  One vsqrtsd with
> > +   memory operand will need a xor to avoid partial dependence.  */
> >
> >  #include<math.h>
> >
> > @@ -17,4 +19,4 @@ foo (int n, int k)
> >
> >  /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
> >  /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
> > -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> > +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> > --
> > 2.35.3
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c
       [not found] <20230821122515.5A627385697D@sourceware.org>
@ 2023-08-21 12:40 ` Hongtao Liu
  2023-08-21 12:41   ` Hongtao Liu
  2023-08-21 12:59   ` Richard Biener
  0 siblings, 2 replies; 5+ messages in thread
From: Hongtao Liu @ 2023-08-21 12:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, hongtao.liu, hjl.tools

On Mon, Aug 21, 2023 at 8:25 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> The following fixes the gcc.target/i386/pr87007-5.c testcase which
> changed code generation again after the recent sinking improvements.
> We now have
>
>         vxorps  %xmm0, %xmm0, %xmm0
>         vsqrtsd d2(%rip), %xmm0, %xmm0
>
> and an unnecessary xor again in one case, the other vsqrtsd has
> a register source and a properly zeroing load:
>
>         vmovsd  d3(%rip), %xmm0
>         testl   %esi, %esi
>         jg      .L11
> .L3:
>         vsqrtsd %xmm0, %xmm0, %xmm0
>
> the following patch XFAILs the scan.  I'm not sure what's at
> fault here, there are no loops in the CFG, but somehow
> r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
> doesn't.  I guess I don't really understand what
> remove_partial_avx_dependency is supposed to do so can't
> really assess whether the pxor is necessary or not.
There's a false dependency on xmm0 when the source operand in the
pattern is memory, the pattern only takes xmm0 as dest, but the output
instruction takes xmm0 also as input(the second source operand),
that's why we need an pxor here.
When the source operand in the pattern is register_operand, we can
reuse the register_operand for the second source operand. The
instructions here are not very obvious, the more representative one
should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0.
>
> OK?
Can we add -fno-XXX to disable the optimization to make the assembly
more stable?
Or current codegen should be optimal(for the sinking), then Ok for the patch.

>
>         * gcc.target/i386/pr87007-5.c: Update comment, XFAIL
>         subtest.
> ---
>  gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> index a6cdf11522e..5902616d1f1 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> @@ -1,6 +1,8 @@
>  /* { dg-do compile } */
>  /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
> -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence.  */
> +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
> +   are sunk out of the loop and the loop is elided.  One vsqrtsd with
> +   memory operand will need a xor to avoid partial dependence.  */
>
>  #include<math.h>
>
> @@ -17,4 +19,4 @@ foo (int n, int k)
>
>  /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
>  /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> --
> 2.35.3



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-08-21 13:09 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-21 12:24 [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c Richard Biener
     [not found] <20230821122515.5A627385697D@sourceware.org>
2023-08-21 12:40 ` Hongtao Liu
2023-08-21 12:41   ` Hongtao Liu
2023-08-21 12:59   ` Richard Biener
2023-08-21 13:08     ` Hongtao Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).