* [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c @ 2023-08-21 12:24 Richard Biener 0 siblings, 0 replies; 5+ messages in thread From: Richard Biener @ 2023-08-21 12:24 UTC (permalink / raw) To: gcc-patches; +Cc: hongtao.liu, hjl.tools The following fixes the gcc.target/i386/pr87007-5.c testcase which changed code generation again after the recent sinking improvements. We now have vxorps %xmm0, %xmm0, %xmm0 vsqrtsd d2(%rip), %xmm0, %xmm0 and an unnecessary xor again in one case, the other vsqrtsd has a register source and a properly zeroing load: vmovsd d3(%rip), %xmm0 testl %esi, %esi jg .L11 .L3: vsqrtsd %xmm0, %xmm0, %xmm0 the following patch XFAILs the scan. I'm not sure what's at fault here, there are no loops in the CFG, but somehow r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF) doesn't. I guess I don't really understand what remove_partial_avx_dependency is supposed to do so can't really assess whether the pxor is necessary or not. OK? * gcc.target/i386/pr87007-5.c: Update comment, XFAIL subtest. --- gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c index a6cdf11522e..5902616d1f1 100644 --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c @@ -1,6 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */ -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence. */ +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt + are sunk out of the loop and the loop is elided. One vsqrtsd with + memory operand will need a xor to avoid partial dependence. */ #include<math.h> @@ -17,4 +19,4 @@ foo (int n, int k) /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */ /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */ -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ -- 2.35.3 ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <20230821122515.5A627385697D@sourceware.org>]
* Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c [not found] <20230821122515.5A627385697D@sourceware.org> @ 2023-08-21 12:40 ` Hongtao Liu 2023-08-21 12:41 ` Hongtao Liu 2023-08-21 12:59 ` Richard Biener 0 siblings, 2 replies; 5+ messages in thread From: Hongtao Liu @ 2023-08-21 12:40 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, hongtao.liu, hjl.tools On Mon, Aug 21, 2023 at 8:25 PM Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > The following fixes the gcc.target/i386/pr87007-5.c testcase which > changed code generation again after the recent sinking improvements. > We now have > > vxorps %xmm0, %xmm0, %xmm0 > vsqrtsd d2(%rip), %xmm0, %xmm0 > > and an unnecessary xor again in one case, the other vsqrtsd has > a register source and a properly zeroing load: > > vmovsd d3(%rip), %xmm0 > testl %esi, %esi > jg .L11 > .L3: > vsqrtsd %xmm0, %xmm0, %xmm0 > > the following patch XFAILs the scan. I'm not sure what's at > fault here, there are no loops in the CFG, but somehow > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF) > doesn't. I guess I don't really understand what > remove_partial_avx_dependency is supposed to do so can't > really assess whether the pxor is necessary or not. There's a false dependency on xmm0 when the source operand in the pattern is memory, the pattern only takes xmm0 as dest, but the output instruction takes xmm0 also as input(the second source operand), that's why we need an pxor here. When the source operand in the pattern is register_operand, we can reuse the register_operand for the second source operand. The instructions here are not very obvious, the more representative one should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0. > > OK? Can we add -fno-XXX to disable the optimization to make the assembly more stable? Or current codegen should be optimal(for the sinking), then Ok for the patch. > > * gcc.target/i386/pr87007-5.c: Update comment, XFAIL > subtest. > --- > gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c > index a6cdf11522e..5902616d1f1 100644 > --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c > +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c > @@ -1,6 +1,8 @@ > /* { dg-do compile } */ > /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */ > -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence. */ > +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt > + are sunk out of the loop and the loop is elided. One vsqrtsd with > + memory operand will need a xor to avoid partial dependence. */ > > #include<math.h> > > @@ -17,4 +19,4 @@ foo (int n, int k) > > /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */ > /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */ > -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ > +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ > -- > 2.35.3 -- BR, Hongtao ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c 2023-08-21 12:40 ` Hongtao Liu @ 2023-08-21 12:41 ` Hongtao Liu 2023-08-21 12:59 ` Richard Biener 1 sibling, 0 replies; 5+ messages in thread From: Hongtao Liu @ 2023-08-21 12:41 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, hongtao.liu, hjl.tools On Mon, Aug 21, 2023 at 8:40 PM Hongtao Liu <crazylht@gmail.com> wrote: > > On Mon, Aug 21, 2023 at 8:25 PM Richard Biener via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > The following fixes the gcc.target/i386/pr87007-5.c testcase which > > changed code generation again after the recent sinking improvements. > > We now have > > > > vxorps %xmm0, %xmm0, %xmm0 > > vsqrtsd d2(%rip), %xmm0, %xmm0 > > > > and an unnecessary xor again in one case, the other vsqrtsd has > > a register source and a properly zeroing load: > > > > vmovsd d3(%rip), %xmm0 > > testl %esi, %esi > > jg .L11 > > .L3: > > vsqrtsd %xmm0, %xmm0, %xmm0 > > > > the following patch XFAILs the scan. I'm not sure what's at > > fault here, there are no loops in the CFG, but somehow > > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF) > > doesn't. I guess I don't really understand what > > remove_partial_avx_dependency is supposed to do so can't > > really assess whether the pxor is necessary or not. > There's a false dependency on xmm0 when the source operand in the > pattern is memory, the pattern only takes xmm0 as dest, but the output > instruction takes xmm0 also as input(the second source operand), > that's why we need an pxor here. > When the source operand in the pattern is register_operand, we can > reuse the register_operand for the second source operand. The > instructions here are not very obvious, the more representative one > should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0. And there's no false dependence here. > > > > OK? > Can we add -fno-XXX to disable the optimization to make the assembly > more stable? > Or current codegen should be optimal(for the sinking), then Ok for the patch. > > > > > * gcc.target/i386/pr87007-5.c: Update comment, XFAIL > > subtest. > > --- > > gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c > > index a6cdf11522e..5902616d1f1 100644 > > --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c > > +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c > > @@ -1,6 +1,8 @@ > > /* { dg-do compile } */ > > /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */ > > -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence. */ > > +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt > > + are sunk out of the loop and the loop is elided. One vsqrtsd with > > + memory operand will need a xor to avoid partial dependence. */ > > > > #include<math.h> > > > > @@ -17,4 +19,4 @@ foo (int n, int k) > > > > /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */ > > /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */ > > -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ > > +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ > > -- > > 2.35.3 > > > > -- > BR, > Hongtao -- BR, Hongtao ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c 2023-08-21 12:40 ` Hongtao Liu 2023-08-21 12:41 ` Hongtao Liu @ 2023-08-21 12:59 ` Richard Biener 2023-08-21 13:08 ` Hongtao Liu 1 sibling, 1 reply; 5+ messages in thread From: Richard Biener @ 2023-08-21 12:59 UTC (permalink / raw) To: Hongtao Liu; +Cc: gcc-patches, hongtao.liu, hjl.tools On Mon, 21 Aug 2023, Hongtao Liu wrote: > On Mon, Aug 21, 2023 at 8:25?PM Richard Biener via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > The following fixes the gcc.target/i386/pr87007-5.c testcase which > > changed code generation again after the recent sinking improvements. > > We now have > > > > vxorps %xmm0, %xmm0, %xmm0 > > vsqrtsd d2(%rip), %xmm0, %xmm0 > > > > and an unnecessary xor again in one case, the other vsqrtsd has > > a register source and a properly zeroing load: > > > > vmovsd d3(%rip), %xmm0 > > testl %esi, %esi > > jg .L11 > > .L3: > > vsqrtsd %xmm0, %xmm0, %xmm0 > > > > the following patch XFAILs the scan. I'm not sure what's at > > fault here, there are no loops in the CFG, but somehow > > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF) > > doesn't. I guess I don't really understand what > > remove_partial_avx_dependency is supposed to do so can't > > really assess whether the pxor is necessary or not. > There's a false dependency on xmm0 when the source operand in the > pattern is memory, the pattern only takes xmm0 as dest, but the output > instruction takes xmm0 also as input(the second source operand), > that's why we need an pxor here. OK, so XFAIL is wrong, we should instead scan for one xorps then (like it was in the past). > When the source operand in the pattern is register_operand, we can > reuse the register_operand for the second source operand. The > instructions here are not very obvious, the more representative one > should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0. > > > > OK? > Can we add -fno-XXX to disable the optimization to make the assembly > more stable? Not sure, we could feed GIMPLE IR to RTL expansion instead of feeding a complex testcase through the pipeline, but I'm not sure what we were originally supposed to test (the PR trail is a bit large). > Or current codegen should be optimal(for the sinking), then Ok for the patch. So like the following (I've just adjusted the comments to reflect the pxor is necessary). OK? Richard. From 7bed9399ae736c20a677ccf7e7fc4d2751a32327 Mon Sep 17 00:00:00 2001 From: Richard Biener <rguenther@suse.de> Date: Mon, 21 Aug 2023 14:09:48 +0200 Subject: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c To: gcc-patches@gcc.gnu.org The following fixes the gcc.target/i386/pr87007-5.c testcase which changed code generation again after the recent sinking improvements. We now have vxorps %xmm0, %xmm0, %xmm0 vsqrtsd d2(%rip), %xmm0, %xmm0 and a necessary xor again in one case, the other vsqrtsd has a register source and a properly zeroing load: vmovsd d3(%rip), %xmm0 testl %esi, %esi jg .L11 .L3: vsqrtsd %xmm0, %xmm0, %xmm0 the following patch adjusts the scan. * gcc.target/i386/pr87007-5.c: Update comment, adjust subtest. --- gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c index a6cdf11522e..8f2dc947f6c 100644 --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c @@ -1,6 +1,8 @@ /* { dg-do compile } */ /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */ -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence. */ +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt + are sunk out of the loop and the loop is elided. One vsqrtsd with + memory operand needs a xor to avoid partial dependence. */ #include<math.h> @@ -17,4 +19,4 @@ foo (int n, int k) /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */ /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */ -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ -- 2.35.3 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c 2023-08-21 12:59 ` Richard Biener @ 2023-08-21 13:08 ` Hongtao Liu 0 siblings, 0 replies; 5+ messages in thread From: Hongtao Liu @ 2023-08-21 13:08 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, hongtao.liu, hjl.tools On Mon, Aug 21, 2023 at 8:59 PM Richard Biener <rguenther@suse.de> wrote: > > On Mon, 21 Aug 2023, Hongtao Liu wrote: > > > On Mon, Aug 21, 2023 at 8:25?PM Richard Biener via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > The following fixes the gcc.target/i386/pr87007-5.c testcase which > > > changed code generation again after the recent sinking improvements. > > > We now have > > > > > > vxorps %xmm0, %xmm0, %xmm0 > > > vsqrtsd d2(%rip), %xmm0, %xmm0 > > > > > > and an unnecessary xor again in one case, the other vsqrtsd has > > > a register source and a properly zeroing load: > > > > > > vmovsd d3(%rip), %xmm0 > > > testl %esi, %esi > > > jg .L11 > > > .L3: > > > vsqrtsd %xmm0, %xmm0, %xmm0 > > > > > > the following patch XFAILs the scan. I'm not sure what's at > > > fault here, there are no loops in the CFG, but somehow > > > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF) > > > doesn't. I guess I don't really understand what > > > remove_partial_avx_dependency is supposed to do so can't > > > really assess whether the pxor is necessary or not. > > There's a false dependency on xmm0 when the source operand in the > > pattern is memory, the pattern only takes xmm0 as dest, but the output > > instruction takes xmm0 also as input(the second source operand), > > that's why we need an pxor here. > > OK, so XFAIL is wrong, we should instead scan for one xorps then > (like it was in the past). > > > When the source operand in the pattern is register_operand, we can > > reuse the register_operand for the second source operand. The > > instructions here are not very obvious, the more representative one > > should be vsqrtsd %xmm1, %xmm1(rused one), %xmm0. > > > > > > OK? > > Can we add -fno-XXX to disable the optimization to make the assembly > > more stable? > > Not sure, we could feed GIMPLE IR to RTL expansion instead of > feeding a complex testcase through the pipeline, but I'm not sure > what we were originally supposed to test (the PR trail is a bit > large). > > > Or current codegen should be optimal(for the sinking), then Ok for the patch. > > So like the following (I've just adjusted the comments to reflect the > pxor is necessary). > > OK? OK. > > Richard. > > From 7bed9399ae736c20a677ccf7e7fc4d2751a32327 Mon Sep 17 00:00:00 2001 > From: Richard Biener <rguenther@suse.de> > Date: Mon, 21 Aug 2023 14:09:48 +0200 > Subject: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c > To: gcc-patches@gcc.gnu.org > > The following fixes the gcc.target/i386/pr87007-5.c testcase which > changed code generation again after the recent sinking improvements. > We now have > > vxorps %xmm0, %xmm0, %xmm0 > vsqrtsd d2(%rip), %xmm0, %xmm0 > > and a necessary xor again in one case, the other vsqrtsd has > a register source and a properly zeroing load: > > vmovsd d3(%rip), %xmm0 > testl %esi, %esi > jg .L11 > .L3: > vsqrtsd %xmm0, %xmm0, %xmm0 > > the following patch adjusts the scan. > > * gcc.target/i386/pr87007-5.c: Update comment, adjust subtest. > --- > gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c > index a6cdf11522e..8f2dc947f6c 100644 > --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c > +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c > @@ -1,6 +1,8 @@ > /* { dg-do compile } */ > /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */ > -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence. */ > +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt > + are sunk out of the loop and the loop is elided. One vsqrtsd with > + memory operand needs a xor to avoid partial dependence. */ > > #include<math.h> > > @@ -17,4 +19,4 @@ foo (int n, int k) > > /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */ > /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */ > -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ > +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ > -- > 2.35.3 > -- BR, Hongtao ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-08-21 13:09 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-08-21 12:24 [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c Richard Biener [not found] <20230821122515.5A627385697D@sourceware.org> 2023-08-21 12:40 ` Hongtao Liu 2023-08-21 12:41 ` Hongtao Liu 2023-08-21 12:59 ` Richard Biener 2023-08-21 13:08 ` Hongtao Liu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).