* [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale
@ 2019-10-23 2:25 Hongtao Liu
2019-10-23 6:44 ` Hongtao Liu
0 siblings, 1 reply; 8+ messages in thread
From: Hongtao Liu @ 2019-10-23 2:25 UTC (permalink / raw)
To: Uros Bizjak, GCC Patches; +Cc: H. J. Lu, yanguan.wang
[-- Attachment #1: Type: text/plain, Size: 1560 bytes --]
Hi uros:
This patch fixes false dependence of scalar operations
vrcp/vsqrt/vrsqrt/vrndscale.
Bootstrap ok, regression test on i386/x86 ok.
It does something like this:
-----
For scalar instructions with both xmm operands:
op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
for scalar instructions with one mem or gpr operand:
op mem/gpr, %xmmQ, %xmmQ
---> using pass rpad ---->
xorps %xmmN, %xmmN, %xxN
op mem/gpr, %xmmN, %xmmQ
Performance influence of SPEC2017 fprate which is tested on SKX
503.bwaves_r -0.03%
507.cactuBSSN_r -0.22%
508.namd_r -0.02%
510.parest_r 0.37%
511.povray_r 0.74%
519.lbm_r 0.24%
521.wrf_r 2.35%
526.blender_r 0.71%
527.cam4_r 0.65%
538.imagick_r 0.95%
544.nab_r -0.37
549.fotonik3d_r 0.24%
554.roms_r 0.90%
fprate geomean 0.50%
-----
Changelog
gcc/
* config/i386/i386.md (*rcpsf2_sse): Add
avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
(*rsqrtsf2_sse): Ditto.
(*sqrt<mode>2_sse): Ditto.
(sse4_1_round<mode>2): separate constraint vm, add
avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
* config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
by pass rpad.
(*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
Ditto.
(*sse_vmrsqrtv4sf2): Ditto.
(*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
(*sse4_1_round<ssescalarmodesuffix>): Ditto.
gcc/testsuite
* gcc.target/i386/pr87007-4.c: New test.
* gcc.target/i386/pr87007-5.c: Ditto.
--
BR,
Hongtao
[-- Attachment #2: 0001-Fix-false-dependence-of-scalar-operation-vrcp-vsqrt-.patch --]
[-- Type: text/x-patch, Size: 11436 bytes --]
From 2db08bff3fb9e2720c6c57a52e6f51c990d1a57f Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 9 Oct 2019 11:21:25 +0800
Subject: [PATCH] Fix false dependence of scalar operation
vrcp/vsqrt/vrsqrt/vrndscale
For instructions with xmm operand:
op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
for instruction with mem operand or gpr operand:
op mem/gpr, %xmmQ, %xmmQ
---> using pass rpad ---->
xorps %xmmN, %xmmN, %xxN
op mem/gpr, %xmmN, %xmmQ
Performance influence of SPEC2017 fprate which is tested on SKX
----
503.bwaves_r -0.03%
507.cactuBSSN_r -0.22%
508.namd_r -0.02%
510.parest_r 0.37%
511.povray_r 0.74%
519.lbm_r 0.24%
521.wrf_r 2.35%
526.blender_r 0.71%
527.cam4_r 0.65%
538.imagick_r 0.95%
544.nab_r -0.37
549.fotonik3d_r 0.24%
554.roms_r 0.90%
fprate geomean 0.50%
-----
Changelog
gcc/
* config/i386/i386.md (*rcpsf2_sse): Add
avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
(*rsqrtsf2_sse): Ditto.
(*sqrt<mode>2_sse): Ditto.
(sse4_1_round<mode>2): separate constraint vm, add
avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
* config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
by pass rpad.
(*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
Ditto.
(*sse_vmrsqrtv4sf2): Ditto.
(*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
(*sse4_1_round<ssescalarmodesuffix>): Ditto.
gcc/testsuite
* gcc.target/i386/pr87007-4.c: New test.
* gcc.target/i386/pr87007-5.c: Ditto.
---
gcc/config/i386/i386.md | 27 ++++---
gcc/config/i386/sse.md | 95 +++++++++++++++++++++++
gcc/testsuite/gcc.target/i386/pr87007-4.c | 18 +++++
gcc/testsuite/gcc.target/i386/pr87007-5.c | 18 +++++
4 files changed, 147 insertions(+), 11 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/pr87007-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87007-5.c
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5e0795953d8..ab785d3d6d7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14843,11 +14843,12 @@
(set_attr "btver2_sse_attr" "rcp")
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SF")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
(eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
]
(symbol_ref "true")))])
@@ -15089,11 +15090,12 @@
(set_attr "btver2_sse_attr" "rcp")
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SF")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
(eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
]
(symbol_ref "true")))])
@@ -15120,12 +15122,13 @@
(set_attr "atom_sse_attr" "sqrt")
(set_attr "btver2_sse_attr" "sqrt")
(set_attr "prefix" "maybe_vex")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set_attr "mode" "<MODE>")
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
(eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
]
(symbol_ref "true")))])
@@ -16261,28 +16264,30 @@
\f
(define_insn "sse4_1_round<mode>2"
- [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v")
+ [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v,v")
(unspec:MODEF
- [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,vm")
- (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n")]
+ [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,v,m")
+ (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n,n")]
UNSPEC_ROUND))]
"TARGET_SSE4_1"
"@
%vround<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
%vround<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
%vround<ssemodesuffix>\t{%2, %1, %d0|%d0, %1, %2}
+ vrndscale<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
vrndscale<ssemodesuffix>\t{%2, %1, %d0|%d0, %1, %2}"
[(set_attr "type" "ssecvt")
- (set_attr "prefix_extra" "1,1,1,*")
- (set_attr "length_immediate" "*,*,*,1")
- (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex")
- (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f")
+ (set_attr "prefix_extra" "1,1,1,*,*")
+ (set_attr "length_immediate" "*,*,*,1,1")
+ (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
+ (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
+ (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
(set_attr "mode" "<MODE>")
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
(eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
]
(symbol_ref "true")))])
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 403e91d4b17..d2e157df81f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2035,6 +2035,25 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "SF")])
+(define_insn "*sse_vmrcpv4sf2"
+ [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+ (vec_merge:V4SF
+ (vec_duplicate:V4SF
+ (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+ UNSPEC_RCP))
+ (match_operand:V4SF 2 "register_operand" "0,x")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ rcpss\t{%1, %0|%0, %k1}
+ vrcpss\t{%1, %2, %0|%0, %2, %k1}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "atom_sse_attr" "rcp")
+ (set_attr "btver2_sse_attr" "rcp")
+ (set_attr "prefix" "orig,vex")
+ (set_attr "mode" "SF")])
+
(define_insn "<mask_codefor>rcp14<mode><mask_name>"
[(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
(unspec:VF_AVX512VL
@@ -2130,6 +2149,25 @@
(set_attr "btver2_sse_attr" "sqrt")
(set_attr "mode" "<ssescalarmode>")])
+(define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
+ [(set (match_operand:VF_128 0 "register_operand" "=x,v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (sqrt:<ssescalarmode>
+ (match_operand:<ssescalarmode> 1 "vector_operand" "xBm,<round_scalar_constraint>")))
+ (match_operand:VF_128 2 "register_operand" "0,v")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
+ vsqrt<ssescalarmodesuffix>\t{<round_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %<iptr>1<round_scalar_mask_op3>}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "atom_sse_attr" "sqrt")
+ (set_attr "prefix" "<round_scalar_prefix>")
+ (set_attr "btver2_sse_attr" "sqrt")
+ (set_attr "mode" "<ssescalarmode>")])
+
(define_expand "rsqrt<mode>2"
[(set (match_operand:VF1_128_256 0 "register_operand")
(unspec:VF1_128_256
@@ -2219,6 +2257,23 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "SF")])
+(define_insn "*sse_vmrsqrtv4sf2"
+ [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+ (vec_merge:V4SF
+ (vec_duplicate:V4SF
+ (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+ UNSPEC_RSQRT))
+ (match_operand:V4SF 2 "register_operand" "0,x")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ rsqrtss\t{%1, %0|%0, %k1}
+ vrsqrtss\t{%1, %2, %0|%0, %2, %k1}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "prefix" "orig,vex")
+ (set_attr "mode" "SF")])
+
(define_expand "<code><mode>3<mask_name><round_saeonly_name>"
[(set (match_operand:VF 0 "register_operand")
(smaxmin:VF
@@ -9709,6 +9764,22 @@
(set_attr "prefix" "evex")
(set_attr "mode" "<MODE>")])
+(define_insn "*avx512f_rndscale<mode><round_saeonly_name>"
+ [(set (match_operand:VF_128 0 "register_operand" "=v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (unspec:<ssescalarmode>
+ [(match_operand:<ssescalarmode> 2 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+ (match_operand:SI 3 "const_0_to_255_operand")]
+ UNSPEC_ROUND))
+ (match_operand:VF_128 1 "register_operand" "v")
+ (const_int 1)))]
+ "TARGET_AVX512F"
+ "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"
+ [(set_attr "length_immediate" "1")
+ (set_attr "prefix" "evex")
+ (set_attr "mode" "<MODE>")])
+
;; One bit in mask selects 2 elements.
(define_insn "avx512f_shufps512_1<mask_name>"
[(set (match_operand:V16SF 0 "register_operand" "=v")
@@ -17973,6 +18044,30 @@
(set_attr "prefix" "orig,orig,vex,evex")
(set_attr "mode" "<MODE>")])
+(define_insn "*sse4_1_round<ssescalarmodesuffix>"
+ [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (unspec:<ssescalarmode>
+ [(match_operand:<ssescalarmode> 2 "register_operand" "Yr,*x,x,v")
+ (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
+ UNSPEC_ROUND))
+ (match_operand:VF_128 1 "register_operand" "0,0,x,v")
+ (const_int 1)))]
+ "TARGET_SSE4_1"
+ "@
+ round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
+ round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
+ vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
+ vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+ [(set_attr "isa" "noavx,noavx,avx,avx512f")
+ (set_attr "type" "ssecvt")
+ (set_attr "length_immediate" "1")
+ (set_attr "prefix_data16" "1,1,*,*")
+ (set_attr "prefix_extra" "1")
+ (set_attr "prefix" "orig,orig,vex,evex")
+ (set_attr "mode" "<MODE>")])
+
(define_expand "round<mode>2"
[(set (match_dup 3)
(plus:VF
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
new file mode 100644
index 00000000000..e91bdcbac44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
+
+
+#include<math.h>
+
+extern double d1, d2, d3;
+void
+foo (int n, int k)
+{
+ for (int i = 0; i != n; i++)
+ if(i < k)
+ d1 = floor (d2);
+ else
+ d1 = ceil (d3);
+}
+
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
new file mode 100644
index 00000000000..20d13cf650b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
+
+
+#include<math.h>
+
+extern double d1, d2, d3;
+void
+foo (int n, int k)
+{
+ for (int i = 0; i != n; i++)
+ if(i < k)
+ d1 = sqrt (d2);
+ else
+ d1 = sqrt (d3);
+}
+
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
--
2.19.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale
2019-10-23 2:25 [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale Hongtao Liu
@ 2019-10-23 6:44 ` Hongtao Liu
2019-10-24 19:12 ` Uros Bizjak
0 siblings, 1 reply; 8+ messages in thread
From: Hongtao Liu @ 2019-10-23 6:44 UTC (permalink / raw)
To: Uros Bizjak, GCC Patches
[-- Attachment #1: Type: text/plain, Size: 2195 bytes --]
Update patch:
Add m constraint to define_insn (sse_1_round<ssescalaemodesuffix,
*sse_1_round<ssescalaemodesuffix) to fix some unrecognized insn error
when under sse4 but not avx512f.
Changelog:
gcc/
* config/i386/sse.md: (sse4_1_round<ssescalarmodesuffix>):
Change constraint x to xm
since vround support memory operand.
* (*sse4_1_round<ssescalarmodesuffix>): Ditto.
Bootstrap and regression test ok.
On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Hi uros:
> This patch fixes false dependence of scalar operations
> vrcp/vsqrt/vrsqrt/vrndscale.
> Bootstrap ok, regression test on i386/x86 ok.
>
> It does something like this:
> -----
> For scalar instructions with both xmm operands:
>
> op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
>
> for scalar instructions with one mem or gpr operand:
>
> op mem/gpr, %xmmQ, %xmmQ
>
> ---> using pass rpad ---->
>
> xorps %xmmN, %xmmN, %xxN
> op mem/gpr, %xmmN, %xmmQ
>
> Performance influence of SPEC2017 fprate which is tested on SKX
>
> 503.bwaves_r -0.03%
> 507.cactuBSSN_r -0.22%
> 508.namd_r -0.02%
> 510.parest_r 0.37%
> 511.povray_r 0.74%
> 519.lbm_r 0.24%
> 521.wrf_r 2.35%
> 526.blender_r 0.71%
> 527.cam4_r 0.65%
> 538.imagick_r 0.95%
> 544.nab_r -0.37
> 549.fotonik3d_r 0.24%
> 554.roms_r 0.90%
> fprate geomean 0.50%
> -----
>
> Changelog
> gcc/
> * config/i386/i386.md (*rcpsf2_sse): Add
> avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> (*rsqrtsf2_sse): Ditto.
> (*sqrt<mode>2_sse): Ditto.
> (sse4_1_round<mode>2): separate constraint vm, add
> avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
> * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
> by pass rpad.
> (*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
> Ditto.
> (*sse_vmrsqrtv4sf2): Ditto.
> (*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
> (*sse4_1_round<ssescalarmodesuffix>): Ditto.
>
> gcc/testsuite
> * gcc.target/i386/pr87007-4.c: New test.
> * gcc.target/i386/pr87007-5.c: Ditto.
>
>
> --
> BR,
> Hongtao
--
BR,
Hongtao
[-- Attachment #2: 0001-Fix-false-dependence-of-scalar-operation-vrcp-vsqrt-V2.patch --]
[-- Type: text/x-patch, Size: 11940 bytes --]
From 23c11846a2dd3eeed32174a7334eb888eb576353 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 9 Oct 2019 11:21:25 +0800
Subject: [PATCH] Fix false dependence of scalar operation
vrcp/vsqrt/vrsqrt/vrndscale
For instructions with xmm operand:
op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
for instruction with mem operand or gpr operand:
op mem/gpr, %xmmQ, %xmmQ
---> using pass rpad ---->
xorps %xmmN, %xmmN, %xxN
op mem/gpr, %xmmN, %xmmQ
Performance influence of SPEC2017 fprate which is tested on SKX
----
503.bwaves_r -0.03%
507.cactuBSSN_r -0.22%
508.namd_r -0.02%
510.parest_r 0.37%
511.povray_r 0.74%
519.lbm_r 0.24%
521.wrf_r 2.35%
526.blender_r 0.71%
527.cam4_r 0.65%
538.imagick_r 0.95%
544.nab_r -0.37
549.fotonik3d_r 0.24%
554.roms_r 0.90%
fprate geomean 0.50%
-----
Changelog
gcc/
* config/i386/i386.md (*rcpsf2_sse): Add
avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
(*rsqrtsf2_sse): Ditto.
(*sqrt<mode>2_sse): Ditto.
(sse4_1_round<mode>2): separate constraint vm, add
avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
* config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
by pass rpad.
(*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
Ditto.
(*sse_vmrsqrtv4sf2): Ditto.
(*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
(*sse4_1_round<ssescalarmodesuffix>): Ditto.
(sse4_1_round<ssescalarmodesuffix>): Change constraint x to xm
since vround support memory operand.
gcc/testsuite
* gcc.target/i386/pr87007-4.c: New test.
* gcc.target/i386/pr87007-5.c: Ditto.
---
gcc/config/i386/i386.md | 27 ++++---
gcc/config/i386/sse.md | 97 ++++++++++++++++++++++-
gcc/testsuite/gcc.target/i386/pr87007-4.c | 18 +++++
gcc/testsuite/gcc.target/i386/pr87007-5.c | 18 +++++
4 files changed, 148 insertions(+), 12 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/pr87007-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87007-5.c
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5e0795953d8..ab785d3d6d7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14843,11 +14843,12 @@
(set_attr "btver2_sse_attr" "rcp")
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SF")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
(eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
]
(symbol_ref "true")))])
@@ -15089,11 +15090,12 @@
(set_attr "btver2_sse_attr" "rcp")
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SF")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
(eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
]
(symbol_ref "true")))])
@@ -15120,12 +15122,13 @@
(set_attr "atom_sse_attr" "sqrt")
(set_attr "btver2_sse_attr" "sqrt")
(set_attr "prefix" "maybe_vex")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set_attr "mode" "<MODE>")
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
(eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
]
(symbol_ref "true")))])
@@ -16261,28 +16264,30 @@
\f
(define_insn "sse4_1_round<mode>2"
- [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v")
+ [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v,v")
(unspec:MODEF
- [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,vm")
- (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n")]
+ [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,v,m")
+ (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n,n")]
UNSPEC_ROUND))]
"TARGET_SSE4_1"
"@
%vround<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
%vround<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
%vround<ssemodesuffix>\t{%2, %1, %d0|%d0, %1, %2}
+ vrndscale<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
vrndscale<ssemodesuffix>\t{%2, %1, %d0|%d0, %1, %2}"
[(set_attr "type" "ssecvt")
- (set_attr "prefix_extra" "1,1,1,*")
- (set_attr "length_immediate" "*,*,*,1")
- (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex")
- (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f")
+ (set_attr "prefix_extra" "1,1,1,*,*")
+ (set_attr "length_immediate" "*,*,*,1,1")
+ (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
+ (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
+ (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
(set_attr "mode" "<MODE>")
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
(eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
]
(symbol_ref "true")))])
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 403e91d4b17..4a04dc3362b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2035,6 +2035,25 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "SF")])
+(define_insn "*sse_vmrcpv4sf2"
+ [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+ (vec_merge:V4SF
+ (vec_duplicate:V4SF
+ (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+ UNSPEC_RCP))
+ (match_operand:V4SF 2 "register_operand" "0,x")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ rcpss\t{%1, %0|%0, %k1}
+ vrcpss\t{%1, %2, %0|%0, %2, %k1}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "atom_sse_attr" "rcp")
+ (set_attr "btver2_sse_attr" "rcp")
+ (set_attr "prefix" "orig,vex")
+ (set_attr "mode" "SF")])
+
(define_insn "<mask_codefor>rcp14<mode><mask_name>"
[(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
(unspec:VF_AVX512VL
@@ -2130,6 +2149,25 @@
(set_attr "btver2_sse_attr" "sqrt")
(set_attr "mode" "<ssescalarmode>")])
+(define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
+ [(set (match_operand:VF_128 0 "register_operand" "=x,v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (sqrt:<ssescalarmode>
+ (match_operand:<ssescalarmode> 1 "vector_operand" "xBm,<round_scalar_constraint>")))
+ (match_operand:VF_128 2 "register_operand" "0,v")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
+ vsqrt<ssescalarmodesuffix>\t{<round_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %<iptr>1<round_scalar_mask_op3>}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "atom_sse_attr" "sqrt")
+ (set_attr "prefix" "<round_scalar_prefix>")
+ (set_attr "btver2_sse_attr" "sqrt")
+ (set_attr "mode" "<ssescalarmode>")])
+
(define_expand "rsqrt<mode>2"
[(set (match_operand:VF1_128_256 0 "register_operand")
(unspec:VF1_128_256
@@ -2219,6 +2257,23 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "SF")])
+(define_insn "*sse_vmrsqrtv4sf2"
+ [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+ (vec_merge:V4SF
+ (vec_duplicate:V4SF
+ (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+ UNSPEC_RSQRT))
+ (match_operand:V4SF 2 "register_operand" "0,x")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ rsqrtss\t{%1, %0|%0, %k1}
+ vrsqrtss\t{%1, %2, %0|%0, %2, %k1}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "prefix" "orig,vex")
+ (set_attr "mode" "SF")])
+
(define_expand "<code><mode>3<mask_name><round_saeonly_name>"
[(set (match_operand:VF 0 "register_operand")
(smaxmin:VF
@@ -9709,6 +9764,22 @@
(set_attr "prefix" "evex")
(set_attr "mode" "<MODE>")])
+(define_insn "*avx512f_rndscale<mode><round_saeonly_name>"
+ [(set (match_operand:VF_128 0 "register_operand" "=v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (unspec:<ssescalarmode>
+ [(match_operand:<ssescalarmode> 2 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+ (match_operand:SI 3 "const_0_to_255_operand")]
+ UNSPEC_ROUND))
+ (match_operand:VF_128 1 "register_operand" "v")
+ (const_int 1)))]
+ "TARGET_AVX512F"
+ "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"
+ [(set_attr "length_immediate" "1")
+ (set_attr "prefix" "evex")
+ (set_attr "mode" "<MODE>")])
+
;; One bit in mask selects 2 elements.
(define_insn "avx512f_shufps512_1<mask_name>"
[(set (match_operand:V16SF 0 "register_operand" "=v")
@@ -17954,7 +18025,7 @@
[(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
(vec_merge:VF_128
(unspec:VF_128
- [(match_operand:VF_128 2 "register_operand" "Yr,*x,x,v")
+ [(match_operand:VF_128 2 "register_operand" "Yr,*x,xm,v")
(match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
UNSPEC_ROUND)
(match_operand:VF_128 1 "register_operand" "0,0,x,v")
@@ -17973,6 +18044,30 @@
(set_attr "prefix" "orig,orig,vex,evex")
(set_attr "mode" "<MODE>")])
+(define_insn "*sse4_1_round<ssescalarmodesuffix>"
+ [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (unspec:<ssescalarmode>
+ [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yr,*x,xm,v")
+ (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
+ UNSPEC_ROUND))
+ (match_operand:VF_128 1 "register_operand" "0,0,x,v")
+ (const_int 1)))]
+ "TARGET_SSE4_1"
+ "@
+ round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
+ round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
+ vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
+ vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+ [(set_attr "isa" "noavx,noavx,avx,avx512f")
+ (set_attr "type" "ssecvt")
+ (set_attr "length_immediate" "1")
+ (set_attr "prefix_data16" "1,1,*,*")
+ (set_attr "prefix_extra" "1")
+ (set_attr "prefix" "orig,orig,vex,evex")
+ (set_attr "mode" "<MODE>")])
+
(define_expand "round<mode>2"
[(set (match_dup 3)
(plus:VF
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
new file mode 100644
index 00000000000..e91bdcbac44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
+
+
+#include<math.h>
+
+extern double d1, d2, d3;
+void
+foo (int n, int k)
+{
+ for (int i = 0; i != n; i++)
+ if(i < k)
+ d1 = floor (d2);
+ else
+ d1 = ceil (d3);
+}
+
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
new file mode 100644
index 00000000000..20d13cf650b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
+
+
+#include<math.h>
+
+extern double d1, d2, d3;
+void
+foo (int n, int k)
+{
+ for (int i = 0; i != n; i++)
+ if(i < k)
+ d1 = sqrt (d2);
+ else
+ d1 = sqrt (d3);
+}
+
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
--
2.19.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale
2019-10-23 6:44 ` Hongtao Liu
@ 2019-10-24 19:12 ` Uros Bizjak
2019-10-25 5:20 ` Hongtao Liu
0 siblings, 1 reply; 8+ messages in thread
From: Uros Bizjak @ 2019-10-24 19:12 UTC (permalink / raw)
To: Hongtao Liu; +Cc: GCC Patches
On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Update patch:
> Add m constraint to define_insn (sse_1_round<ssescalaemodesuffix,
> *sse_1_round<ssescalaemodesuffix) to fix some unrecognized insn error
> when under sse4 but not avx512f.
It looks to me that the original insn is incompletely defined. It
should use nonimmediate_operand, "m" constraint and <iptr> pointer
size modifier. Something like:
(define_insn "sse4_1_round<ssescalarmodesuffix>"
[(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
(vec_merge:VF_128
(unspec:VF_128
[(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
(match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
UNSPEC_ROUND)
(match_operand:VF_128 1 "register_operand" "0,0,x,v")
(const_int 1)))]
"TARGET_SSE4_1"
"@
round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
>
> Changelog:
> gcc/
> * config/i386/sse.md: (sse4_1_round<ssescalarmodesuffix>):
> Change constraint x to xm
> since vround support memory operand.
> * (*sse4_1_round<ssescalarmodesuffix>): Ditto.
>
> Bootstrap and regression test ok.
>
> On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > Hi uros:
> > This patch fixes false dependence of scalar operations
> > vrcp/vsqrt/vrsqrt/vrndscale.
> > Bootstrap ok, regression test on i386/x86 ok.
> >
> > It does something like this:
> > -----
> > For scalar instructions with both xmm operands:
> >
> > op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
> >
> > for scalar instructions with one mem or gpr operand:
> >
> > op mem/gpr, %xmmQ, %xmmQ
> >
> > ---> using pass rpad ---->
> >
> > xorps %xmmN, %xmmN, %xxN
> > op mem/gpr, %xmmN, %xmmQ
> >
> > Performance influence of SPEC2017 fprate which is tested on SKX
> >
> > 503.bwaves_r -0.03%
> > 507.cactuBSSN_r -0.22%
> > 508.namd_r -0.02%
> > 510.parest_r 0.37%
> > 511.povray_r 0.74%
> > 519.lbm_r 0.24%
> > 521.wrf_r 2.35%
> > 526.blender_r 0.71%
> > 527.cam4_r 0.65%
> > 538.imagick_r 0.95%
> > 544.nab_r -0.37
> > 549.fotonik3d_r 0.24%
> > 554.roms_r 0.90%
> > fprate geomean 0.50%
> > -----
> >
> > Changelog
> > gcc/
> > * config/i386/i386.md (*rcpsf2_sse): Add
> > avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > (*rsqrtsf2_sse): Ditto.
> > (*sqrt<mode>2_sse): Ditto.
> > (sse4_1_round<mode>2): separate constraint vm, add
> > avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
> > * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
> > by pass rpad.
> > (*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
> > Ditto.
> > (*sse_vmrsqrtv4sf2): Ditto.
> > (*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
> > (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> >
> > gcc/testsuite
> > * gcc.target/i386/pr87007-4.c: New test.
> > * gcc.target/i386/pr87007-5.c: Ditto.
> >
> >
> > --
> > BR,
> > Hongtao
(set (attr "preferred_for_speed")
(cond [(eq_attr "alternative" "1")
(symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
(eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
]
(symbol_ref "true")))])
This can be written as:
(set (attr "preferred_for_speed")
(cond [(match_test "TARGET_AVX")
(symbol_ref "true")
(eq_attr "alternative" "1,2")
(symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
]
(symbol_ref "true")))])
Uros.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale
2019-10-24 19:12 ` Uros Bizjak
@ 2019-10-25 5:20 ` Hongtao Liu
2019-10-25 6:04 ` Hongtao Liu
0 siblings, 1 reply; 8+ messages in thread
From: Hongtao Liu @ 2019-10-25 5:20 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > Update patch:
> > Add m constraint to define_insn (sse_1_round<ssescalaemodesuffix,
> > *sse_1_round<ssescalaemodesuffix) to fix some unrecognized insn error
> > when under sse4 but not avx512f.
>
> It looks to me that the original insn is incompletely defined. It
> should use nonimmediate_operand, "m" constraint and <iptr> pointer
> size modifier. Something like:
>
> (define_insn "sse4_1_round<ssescalarmodesuffix>"
> [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> (vec_merge:VF_128
> (unspec:VF_128
> [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
> UNSPEC_ROUND)
> (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> (const_int 1)))]
> "TARGET_SSE4_1"
> "@
> round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
>
> >
> > Changelog:
> > gcc/
> > * config/i386/sse.md: (sse4_1_round<ssescalarmodesuffix>):
> > Change constraint x to xm
> > since vround support memory operand.
> > * (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> >
> > Bootstrap and regression test ok.
> >
> > On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > Hi uros:
> > > This patch fixes false dependence of scalar operations
> > > vrcp/vsqrt/vrsqrt/vrndscale.
> > > Bootstrap ok, regression test on i386/x86 ok.
> > >
> > > It does something like this:
> > > -----
> > > For scalar instructions with both xmm operands:
> > >
> > > op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
> > >
> > > for scalar instructions with one mem or gpr operand:
> > >
> > > op mem/gpr, %xmmQ, %xmmQ
> > >
> > > ---> using pass rpad ---->
> > >
> > > xorps %xmmN, %xmmN, %xxN
> > > op mem/gpr, %xmmN, %xmmQ
> > >
> > > Performance influence of SPEC2017 fprate which is tested on SKX
> > >
> > > 503.bwaves_r -0.03%
> > > 507.cactuBSSN_r -0.22%
> > > 508.namd_r -0.02%
> > > 510.parest_r 0.37%
> > > 511.povray_r 0.74%
> > > 519.lbm_r 0.24%
> > > 521.wrf_r 2.35%
> > > 526.blender_r 0.71%
> > > 527.cam4_r 0.65%
> > > 538.imagick_r 0.95%
> > > 544.nab_r -0.37
> > > 549.fotonik3d_r 0.24%
> > > 554.roms_r 0.90%
> > > fprate geomean 0.50%
> > > -----
> > >
> > > Changelog
> > > gcc/
> > > * config/i386/i386.md (*rcpsf2_sse): Add
> > > avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > > (*rsqrtsf2_sse): Ditto.
> > > (*sqrt<mode>2_sse): Ditto.
> > > (sse4_1_round<mode>2): separate constraint vm, add
> > > avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
> > > * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
> > > by pass rpad.
> > > (*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
> > > Ditto.
> > > (*sse_vmrsqrtv4sf2): Ditto.
> > > (*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
> > > (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > >
> > > gcc/testsuite
> > > * gcc.target/i386/pr87007-4.c: New test.
> > > * gcc.target/i386/pr87007-5.c: Ditto.
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
> (set (attr "preferred_for_speed")
> (cond [(eq_attr "alternative" "1")
> (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> (eq_attr "alternative" "2")
> - (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> + (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> ]
> (symbol_ref "true")))])
>
> This can be written as:
>
> (set (attr "preferred_for_speed")
> (cond [(match_test "TARGET_AVX")
> (symbol_ref "true")
> (eq_attr "alternative" "1,2")
> (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> ]
> (symbol_ref "true")))])
>
> Uros.
Yes, after these fixed, i'll upstream to trunk, ok?
--
BR,
Hongtao
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale
2019-10-25 5:20 ` Hongtao Liu
@ 2019-10-25 6:04 ` Hongtao Liu
2019-10-25 8:03 ` Uros Bizjak
0 siblings, 1 reply; 8+ messages in thread
From: Hongtao Liu @ 2019-10-25 6:04 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 4651 bytes --]
On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > Update patch:
> > > Add m constraint to define_insn (sse_1_round<ssescalaemodesuffix,
> > > *sse_1_round<ssescalaemodesuffix) to fix some unrecognized insn error
> > > when under sse4 but not avx512f.
> >
> > It looks to me that the original insn is incompletely defined. It
> > should use nonimmediate_operand, "m" constraint and <iptr> pointer
> > size modifier. Something like:
> >
> > (define_insn "sse4_1_round<ssescalarmodesuffix>"
> > [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > (vec_merge:VF_128
> > (unspec:VF_128
> > [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
> > UNSPEC_ROUND)
> > (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > (const_int 1)))]
> > "TARGET_SSE4_1"
> > "@
> > round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> > vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> >
> > >
> > > Changelog:
> > > gcc/
> > > * config/i386/sse.md: (sse4_1_round<ssescalarmodesuffix>):
> > > Change constraint x to xm
> > > since vround support memory operand.
> > > * (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > >
> > > Bootstrap and regression test ok.
> > >
> > > On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > Hi uros:
> > > > This patch fixes false dependence of scalar operations
> > > > vrcp/vsqrt/vrsqrt/vrndscale.
> > > > Bootstrap ok, regression test on i386/x86 ok.
> > > >
> > > > It does something like this:
> > > > -----
> > > > For scalar instructions with both xmm operands:
> > > >
> > > > op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
> > > >
> > > > for scalar instructions with one mem or gpr operand:
> > > >
> > > > op mem/gpr, %xmmQ, %xmmQ
> > > >
> > > > ---> using pass rpad ---->
> > > >
> > > > xorps %xmmN, %xmmN, %xxN
> > > > op mem/gpr, %xmmN, %xmmQ
> > > >
> > > > Performance influence of SPEC2017 fprate which is tested on SKX
> > > >
> > > > 503.bwaves_r -0.03%
> > > > 507.cactuBSSN_r -0.22%
> > > > 508.namd_r -0.02%
> > > > 510.parest_r 0.37%
> > > > 511.povray_r 0.74%
> > > > 519.lbm_r 0.24%
> > > > 521.wrf_r 2.35%
> > > > 526.blender_r 0.71%
> > > > 527.cam4_r 0.65%
> > > > 538.imagick_r 0.95%
> > > > 544.nab_r -0.37
> > > > 549.fotonik3d_r 0.24%
> > > > 554.roms_r 0.90%
> > > > fprate geomean 0.50%
> > > > -----
> > > >
> > > > Changelog
> > > > gcc/
> > > > * config/i386/i386.md (*rcpsf2_sse): Add
> > > > avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > > > (*rsqrtsf2_sse): Ditto.
> > > > (*sqrt<mode>2_sse): Ditto.
> > > > (sse4_1_round<mode>2): separate constraint vm, add
> > > > avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
> > > > * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
> > > > by pass rpad.
> > > > (*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
> > > > Ditto.
> > > > (*sse_vmrsqrtv4sf2): Ditto.
> > > > (*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
> > > > (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > > >
> > > > gcc/testsuite
> > > > * gcc.target/i386/pr87007-4.c: New test.
> > > > * gcc.target/i386/pr87007-5.c: Ditto.
> > > >
> > > >
> > > > --
> > > > BR,
> > > > Hongtao
> >
> > (set (attr "preferred_for_speed")
> > (cond [(eq_attr "alternative" "1")
> > (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > (eq_attr "alternative" "2")
> > - (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > + (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > ]
> > (symbol_ref "true")))])
> >
> > This can be written as:
> >
> > (set (attr "preferred_for_speed")
> > (cond [(match_test "TARGET_AVX")
> > (symbol_ref "true")
> > (eq_attr "alternative" "1,2")
> > (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > ]
> > (symbol_ref "true")))])
> >
> > Uros.
>
> Yes, after these fixed, i'll upstream to trunk, ok?
Update patch.
> --
> BR,
> Hongtao
--
BR,
Hongtao
[-- Attachment #2: 0001-Fix-false-dependence-of-scalar-operation-vrcp-vsqrt-V3.patch --]
[-- Type: text/x-patch, Size: 13363 bytes --]
From 1892f7b52ea0c5b59d3d0c9e50330f70712fc9cc Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 9 Oct 2019 11:21:25 +0800
Subject: [PATCH] Fix false dependence of scalar operation
vrcp/vsqrt/vrsqrt/vrndscale
For instructions with xmm operand:
op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
for instruction with mem operand or gpr operand:
op mem/gpr, %xmmQ, %xmmQ
---> using pass rpad ---->
xorps %xmmN, %xmmN, %xxN
op mem/gpr, %xmmN, %xmmQ
Performance influence of SPEC2017 fprate which is tested on SKX
----
503.bwaves_r -0.03%
507.cactuBSSN_r -0.22%
508.namd_r -0.02%
510.parest_r 0.37%
511.povray_r 0.74%
519.lbm_r 0.24%
521.wrf_r 2.35%
526.blender_r 0.71%
527.cam4_r 0.65%
538.imagick_r 0.95%
544.nab_r -0.37
549.fotonik3d_r 0.24%
554.roms_r 0.90%
fprate geomean 0.50%
-----
Changelog
gcc/
* config/i386/i386.md (*rcpsf2_sse): Add
avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
(*rsqrtsf2_sse): Ditto.
(*sqrt<mode>2_sse): Ditto.
(sse4_1_round<mode>2): separate constraint vm, add
avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
* config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
by pass rpad.
(*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
Ditto.
(*sse_vmrsqrtv4sf2): Ditto.
(*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
(*sse4_1_round<ssescalarmodesuffix>): Ditto.
(sse4_1_round<ssescalarmodesuffix>): Add m constraint and
<iptr> pointer size modifier since vround support memory operand.
gcc/testsuite
* gcc.target/i386/pr87007-4.c: New test.
* gcc.target/i386/pr87007-5.c: Ditto.
---
gcc/config/i386/i386.md | 67 +++++++-------
gcc/config/i386/sse.md | 105 ++++++++++++++++++++--
gcc/testsuite/gcc.target/i386/pr87007-4.c | 18 ++++
gcc/testsuite/gcc.target/i386/pr87007-5.c | 18 ++++
4 files changed, 172 insertions(+), 36 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/pr87007-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87007-5.c
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5e0795953d8..fb2235a5e34 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14843,13 +14843,14 @@
(set_attr "btver2_sse_attr" "rcp")
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SF")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set (attr "preferred_for_speed")
- (cond [(eq_attr "alternative" "1")
- (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- (eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- ]
- (symbol_ref "true")))])
+ (cond [(match_test "TARGET_AVX")
+ (symbol_ref "true")
+ (eq_attr "alternative" "1,2")
+ (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ ]
+ (symbol_ref "true")))])
(define_insn "*fop_xf_1_i387"
[(set (match_operand:XF 0 "register_operand" "=f,f")
@@ -15089,13 +15090,14 @@
(set_attr "btver2_sse_attr" "rcp")
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SF")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set (attr "preferred_for_speed")
- (cond [(eq_attr "alternative" "1")
- (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- (eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- ]
- (symbol_ref "true")))])
+ (cond [(match_test "TARGET_AVX")
+ (symbol_ref "true")
+ (eq_attr "alternative" "1,2")
+ (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ ]
+ (symbol_ref "true")))])
(define_expand "rsqrtsf2"
[(set (match_operand:SF 0 "register_operand")
@@ -15120,14 +15122,15 @@
(set_attr "atom_sse_attr" "sqrt")
(set_attr "btver2_sse_attr" "sqrt")
(set_attr "prefix" "maybe_vex")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set_attr "mode" "<MODE>")
(set (attr "preferred_for_speed")
- (cond [(eq_attr "alternative" "1")
- (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- (eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- ]
- (symbol_ref "true")))])
+ (cond [(match_test "TARGET_AVX")
+ (symbol_ref "true")
+ (eq_attr "alternative" "1,2")
+ (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ ]
+ (symbol_ref "true")))])
(define_expand "sqrt<mode>2"
[(set (match_operand:MODEF 0 "register_operand")
@@ -16261,30 +16264,32 @@
\f
(define_insn "sse4_1_round<mode>2"
- [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v")
+ [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v,v")
(unspec:MODEF
- [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,vm")
- (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n")]
+ [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,v,m")
+ (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n,n")]
UNSPEC_ROUND))]
"TARGET_SSE4_1"
"@
%vround<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
%vround<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
%vround<ssemodesuffix>\t{%2, %1, %d0|%d0, %1, %2}
+ vrndscale<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
vrndscale<ssemodesuffix>\t{%2, %1, %d0|%d0, %1, %2}"
[(set_attr "type" "ssecvt")
- (set_attr "prefix_extra" "1,1,1,*")
- (set_attr "length_immediate" "*,*,*,1")
- (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex")
- (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f")
+ (set_attr "prefix_extra" "1,1,1,*,*")
+ (set_attr "length_immediate" "*,*,*,1,1")
+ (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
+ (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
+ (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
(set_attr "mode" "<MODE>")
(set (attr "preferred_for_speed")
- (cond [(eq_attr "alternative" "1")
- (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- (eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- ]
- (symbol_ref "true")))])
+ (cond [(match_test "TARGET_AVX")
+ (symbol_ref "true")
+ (eq_attr "alternative" "1,2")
+ (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ ]
+ (symbol_ref "true")))])
(define_insn "rintxf2"
[(set (match_operand:XF 0 "register_operand" "=f")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 403e91d4b17..a0d611c5128 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2035,6 +2035,25 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "SF")])
+(define_insn "*sse_vmrcpv4sf2"
+ [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+ (vec_merge:V4SF
+ (vec_duplicate:V4SF
+ (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+ UNSPEC_RCP))
+ (match_operand:V4SF 2 "register_operand" "0,x")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ rcpss\t{%1, %0|%0, %k1}
+ vrcpss\t{%1, %2, %0|%0, %2, %k1}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "atom_sse_attr" "rcp")
+ (set_attr "btver2_sse_attr" "rcp")
+ (set_attr "prefix" "orig,vex")
+ (set_attr "mode" "SF")])
+
(define_insn "<mask_codefor>rcp14<mode><mask_name>"
[(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
(unspec:VF_AVX512VL
@@ -2130,6 +2149,25 @@
(set_attr "btver2_sse_attr" "sqrt")
(set_attr "mode" "<ssescalarmode>")])
+(define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
+ [(set (match_operand:VF_128 0 "register_operand" "=x,v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (sqrt:<ssescalarmode>
+ (match_operand:<ssescalarmode> 1 "vector_operand" "xBm,<round_scalar_constraint>")))
+ (match_operand:VF_128 2 "register_operand" "0,v")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
+ vsqrt<ssescalarmodesuffix>\t{<round_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %<iptr>1<round_scalar_mask_op3>}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "atom_sse_attr" "sqrt")
+ (set_attr "prefix" "<round_scalar_prefix>")
+ (set_attr "btver2_sse_attr" "sqrt")
+ (set_attr "mode" "<ssescalarmode>")])
+
(define_expand "rsqrt<mode>2"
[(set (match_operand:VF1_128_256 0 "register_operand")
(unspec:VF1_128_256
@@ -2219,6 +2257,23 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "SF")])
+(define_insn "*sse_vmrsqrtv4sf2"
+ [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+ (vec_merge:V4SF
+ (vec_duplicate:V4SF
+ (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+ UNSPEC_RSQRT))
+ (match_operand:V4SF 2 "register_operand" "0,x")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ rsqrtss\t{%1, %0|%0, %k1}
+ vrsqrtss\t{%1, %2, %0|%0, %2, %k1}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "prefix" "orig,vex")
+ (set_attr "mode" "SF")])
+
(define_expand "<code><mode>3<mask_name><round_saeonly_name>"
[(set (match_operand:VF 0 "register_operand")
(smaxmin:VF
@@ -9709,6 +9764,22 @@
(set_attr "prefix" "evex")
(set_attr "mode" "<MODE>")])
+(define_insn "*avx512f_rndscale<mode><round_saeonly_name>"
+ [(set (match_operand:VF_128 0 "register_operand" "=v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (unspec:<ssescalarmode>
+ [(match_operand:<ssescalarmode> 2 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+ (match_operand:SI 3 "const_0_to_255_operand")]
+ UNSPEC_ROUND))
+ (match_operand:VF_128 1 "register_operand" "v")
+ (const_int 1)))]
+ "TARGET_AVX512F"
+ "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"
+ [(set_attr "length_immediate" "1")
+ (set_attr "prefix" "evex")
+ (set_attr "mode" "<MODE>")])
+
;; One bit in mask selects 2 elements.
(define_insn "avx512f_shufps512_1<mask_name>"
[(set (match_operand:V16SF 0 "register_operand" "=v")
@@ -17954,17 +18025,41 @@
[(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
(vec_merge:VF_128
(unspec:VF_128
- [(match_operand:VF_128 2 "register_operand" "Yr,*x,x,v")
+ [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
(match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
UNSPEC_ROUND)
(match_operand:VF_128 1 "register_operand" "0,0,x,v")
(const_int 1)))]
"TARGET_SSE4_1"
"@
- round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
- round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
- vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
- vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+ round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
+ round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
+ vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
+ vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
+ [(set_attr "isa" "noavx,noavx,avx,avx512f")
+ (set_attr "type" "ssecvt")
+ (set_attr "length_immediate" "1")
+ (set_attr "prefix_data16" "1,1,*,*")
+ (set_attr "prefix_extra" "1")
+ (set_attr "prefix" "orig,orig,vex,evex")
+ (set_attr "mode" "<MODE>")])
+
+(define_insn "*sse4_1_round<ssescalarmodesuffix>"
+ [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (unspec:<ssescalarmode>
+ [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
+ (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
+ UNSPEC_ROUND))
+ (match_operand:VF_128 1 "register_operand" "0,0,x,v")
+ (const_int 1)))]
+ "TARGET_SSE4_1"
+ "@
+ round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
+ round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
+ vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
+ vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
[(set_attr "isa" "noavx,noavx,avx,avx512f")
(set_attr "type" "ssecvt")
(set_attr "length_immediate" "1")
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
new file mode 100644
index 00000000000..e91bdcbac44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
+
+
+#include<math.h>
+
+extern double d1, d2, d3;
+void
+foo (int n, int k)
+{
+ for (int i = 0; i != n; i++)
+ if(i < k)
+ d1 = floor (d2);
+ else
+ d1 = ceil (d3);
+}
+
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
new file mode 100644
index 00000000000..20d13cf650b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
+
+
+#include<math.h>
+
+extern double d1, d2, d3;
+void
+foo (int n, int k)
+{
+ for (int i = 0; i != n; i++)
+ if(i < k)
+ d1 = sqrt (d2);
+ else
+ d1 = sqrt (d3);
+}
+
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
--
2.19.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale
2019-10-25 6:04 ` Hongtao Liu
@ 2019-10-25 8:03 ` Uros Bizjak
2019-10-25 19:21 ` Hongtao Liu
0 siblings, 1 reply; 8+ messages in thread
From: Uros Bizjak @ 2019-10-25 8:03 UTC (permalink / raw)
To: Hongtao Liu; +Cc: GCC Patches
On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > Update patch:
> > > > Add m constraint to define_insn (sse_1_round<ssescalaemodesuffix,
> > > > *sse_1_round<ssescalaemodesuffix) to fix some unrecognized insn error
> > > > when under sse4 but not avx512f.
> > >
> > > It looks to me that the original insn is incompletely defined. It
> > > should use nonimmediate_operand, "m" constraint and <iptr> pointer
> > > size modifier. Something like:
> > >
> > > (define_insn "sse4_1_round<ssescalarmodesuffix>"
> > > [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > > (vec_merge:VF_128
> > > (unspec:VF_128
> > > [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
> > > UNSPEC_ROUND)
> > > (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > > (const_int 1)))]
> > > "TARGET_SSE4_1"
> > > "@
> > > round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> > > vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> > >
> > > >
> > > > Changelog:
> > > > gcc/
> > > > * config/i386/sse.md: (sse4_1_round<ssescalarmodesuffix>):
> > > > Change constraint x to xm
> > > > since vround support memory operand.
> > > > * (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > > >
> > > > Bootstrap and regression test ok.
> > > >
> > > > On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > Hi uros:
> > > > > This patch fixes false dependence of scalar operations
> > > > > vrcp/vsqrt/vrsqrt/vrndscale.
> > > > > Bootstrap ok, regression test on i386/x86 ok.
> > > > >
> > > > > It does something like this:
> > > > > -----
> > > > > For scalar instructions with both xmm operands:
> > > > >
> > > > > op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
> > > > >
> > > > > for scalar instructions with one mem or gpr operand:
> > > > >
> > > > > op mem/gpr, %xmmQ, %xmmQ
> > > > >
> > > > > ---> using pass rpad ---->
> > > > >
> > > > > xorps %xmmN, %xmmN, %xxN
> > > > > op mem/gpr, %xmmN, %xmmQ
> > > > >
> > > > > Performance influence of SPEC2017 fprate which is tested on SKX
> > > > >
> > > > > 503.bwaves_r -0.03%
> > > > > 507.cactuBSSN_r -0.22%
> > > > > 508.namd_r -0.02%
> > > > > 510.parest_r 0.37%
> > > > > 511.povray_r 0.74%
> > > > > 519.lbm_r 0.24%
> > > > > 521.wrf_r 2.35%
> > > > > 526.blender_r 0.71%
> > > > > 527.cam4_r 0.65%
> > > > > 538.imagick_r 0.95%
> > > > > 544.nab_r -0.37
> > > > > 549.fotonik3d_r 0.24%
> > > > > 554.roms_r 0.90%
> > > > > fprate geomean 0.50%
> > > > > -----
> > > > >
> > > > > Changelog
> > > > > gcc/
> > > > > * config/i386/i386.md (*rcpsf2_sse): Add
> > > > > avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > > > > (*rsqrtsf2_sse): Ditto.
> > > > > (*sqrt<mode>2_sse): Ditto.
> > > > > (sse4_1_round<mode>2): separate constraint vm, add
> > > > > avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
> > > > > * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
> > > > > by pass rpad.
> > > > > (*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
> > > > > Ditto.
> > > > > (*sse_vmrsqrtv4sf2): Ditto.
> > > > > (*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
> > > > > (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > > > >
> > > > > gcc/testsuite
> > > > > * gcc.target/i386/pr87007-4.c: New test.
> > > > > * gcc.target/i386/pr87007-5.c: Ditto.
> > > > >
> > > > >
> > > > > --
> > > > > BR,
> > > > > Hongtao
> > >
> > > (set (attr "preferred_for_speed")
> > > (cond [(eq_attr "alternative" "1")
> > > (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > (eq_attr "alternative" "2")
> > > - (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > + (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > ]
> > > (symbol_ref "true")))])
> > >
> > > This can be written as:
> > >
> > > (set (attr "preferred_for_speed")
> > > (cond [(match_test "TARGET_AVX")
> > > (symbol_ref "true")
> > > (eq_attr "alternative" "1,2")
> > > (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > ]
> > > (symbol_ref "true")))])
> > >
> > > Uros.
> >
> > Yes, after these fixed, i'll upstream to trunk, ok?
> Update patch.
+ (sqrt:<ssescalarmode>
+ (match_operand:<ssescalarmode> 1 "vector_operand"
"xBm,<round_scalar_constraint>")))
+ (match_operand:VF_128 2 "register_operand" "0,v")
+ (const_int 1)))]
vector_operand and Bm are needed for vector mode operands. This is in
effect scalar operand, so nonimmediate_operand and simple "xm" should
be used here.
+(define_insn "*sse_vmrsqrtv4sf2"
+ [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+ (vec_merge:V4SF
+ (vec_duplicate:V4SF
+ (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+ UNSPEC_RSQRT))
+ (match_operand:V4SF 2 "register_operand" "0,x")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ rsqrtss\t{%1, %0|%0, %k1}
+ vrsqrtss\t{%1, %2, %0|%0, %2, %k1}"
No need for %k modifier. We already have scalar size 4 SFmode operand
that will genereate DWORD PTR.
+(define_insn "*avx512f_rndscale<mode><round_saeonly_name>"
+ [(set (match_operand:VF_128 0 "register_operand" "=v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (unspec:<ssescalarmode>
+ [(match_operand:<ssescalarmode> 2
"<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+ (match_operand:SI 3 "const_0_to_255_operand")]
+ UNSPEC_ROUND))
+ (match_operand:VF_128 1 "register_operand" "v")
+ (const_int 1)))]
+ "TARGET_AVX512F"
+ "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1,
%0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"
There is no need for <iptr> override for scalar mode operands in the
above and other new patterns,
Looking into sse.md, there is a lot of inconsistencies in existing *vm
patterns w.r.t. operand constraints. Unfortunately, these were copied
into proposed patterns. One example is existing
(define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
[(set (match_operand:VF_128 0 "register_operand" "=x,v")
(vec_merge:VF_128
(sqrt:VF_128
(match_operand:VF_128 1 "vector_operand"
"xBm,<round_scalar_constraint>"))
(match_operand:VF_128 2 "register_operand" "0,v")
(const_int 1)))]
"TARGET_SSE"
"@
sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
Due to combine benefits, *vm operands to be merged is described in
vector mode. Since the insn operates in scalar mode, there is no need
for "vector_operand" and Bm constraint that impose more strict
alignment requirements. However, iptr modifier is needed here to
override VF_128 vector mode (e.g. V4SFmode) to generate scalar
(SFmode, DWORD PTR) memory access prefix.
Someone should fix these existing inconsistencies in a follow-up patch.
Uros.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale
2019-10-25 8:03 ` Uros Bizjak
@ 2019-10-25 19:21 ` Hongtao Liu
2019-10-25 21:08 ` Uros Bizjak
0 siblings, 1 reply; 8+ messages in thread
From: Hongtao Liu @ 2019-10-25 19:21 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
[-- Attachment #1: Type: text/plain, Size: 8089 bytes --]
Update patch.
On Fri, Oct 25, 2019 at 4:01 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > >
> > > > > Update patch:
> > > > > Add m constraint to define_insn (sse_1_round<ssescalaemodesuffix,
> > > > > *sse_1_round<ssescalaemodesuffix) to fix some unrecognized insn error
> > > > > when under sse4 but not avx512f.
> > > >
> > > > It looks to me that the original insn is incompletely defined. It
> > > > should use nonimmediate_operand, "m" constraint and <iptr> pointer
> > > > size modifier. Something like:
> > > >
> > > > (define_insn "sse4_1_round<ssescalarmodesuffix>"
> > > > [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > > > (vec_merge:VF_128
> > > > (unspec:VF_128
> > > > [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > > (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
> > > > UNSPEC_ROUND)
> > > > (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > > > (const_int 1)))]
> > > > "TARGET_SSE4_1"
> > > > "@
> > > > round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > > round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > > vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> > > > vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> > > >
> > > > >
> > > > > Changelog:
> > > > > gcc/
> > > > > * config/i386/sse.md: (sse4_1_round<ssescalarmodesuffix>):
> > > > > Change constraint x to xm
> > > > > since vround support memory operand.
> > > > > * (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > > > >
> > > > > Bootstrap and regression test ok.
> > > > >
> > > > > On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > > >
> > > > > > Hi uros:
> > > > > > This patch fixes false dependence of scalar operations
> > > > > > vrcp/vsqrt/vrsqrt/vrndscale.
> > > > > > Bootstrap ok, regression test on i386/x86 ok.
> > > > > >
> > > > > > It does something like this:
> > > > > > -----
> > > > > > For scalar instructions with both xmm operands:
> > > > > >
> > > > > > op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
> > > > > >
> > > > > > for scalar instructions with one mem or gpr operand:
> > > > > >
> > > > > > op mem/gpr, %xmmQ, %xmmQ
> > > > > >
> > > > > > ---> using pass rpad ---->
> > > > > >
> > > > > > xorps %xmmN, %xmmN, %xxN
> > > > > > op mem/gpr, %xmmN, %xmmQ
> > > > > >
> > > > > > Performance influence of SPEC2017 fprate which is tested on SKX
> > > > > >
> > > > > > 503.bwaves_r -0.03%
> > > > > > 507.cactuBSSN_r -0.22%
> > > > > > 508.namd_r -0.02%
> > > > > > 510.parest_r 0.37%
> > > > > > 511.povray_r 0.74%
> > > > > > 519.lbm_r 0.24%
> > > > > > 521.wrf_r 2.35%
> > > > > > 526.blender_r 0.71%
> > > > > > 527.cam4_r 0.65%
> > > > > > 538.imagick_r 0.95%
> > > > > > 544.nab_r -0.37
> > > > > > 549.fotonik3d_r 0.24%
> > > > > > 554.roms_r 0.90%
> > > > > > fprate geomean 0.50%
> > > > > > -----
> > > > > >
> > > > > > Changelog
> > > > > > gcc/
> > > > > > * config/i386/i386.md (*rcpsf2_sse): Add
> > > > > > avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > > > > > (*rsqrtsf2_sse): Ditto.
> > > > > > (*sqrt<mode>2_sse): Ditto.
> > > > > > (sse4_1_round<mode>2): separate constraint vm, add
> > > > > > avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
> > > > > > * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
> > > > > > by pass rpad.
> > > > > > (*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
> > > > > > Ditto.
> > > > > > (*sse_vmrsqrtv4sf2): Ditto.
> > > > > > (*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
> > > > > > (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > > > > >
> > > > > > gcc/testsuite
> > > > > > * gcc.target/i386/pr87007-4.c: New test.
> > > > > > * gcc.target/i386/pr87007-5.c: Ditto.
> > > > > >
> > > > > >
> > > > > > --
> > > > > > BR,
> > > > > > Hongtao
> > > >
> > > > (set (attr "preferred_for_speed")
> > > > (cond [(eq_attr "alternative" "1")
> > > > (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > (eq_attr "alternative" "2")
> > > > - (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > + (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > ]
> > > > (symbol_ref "true")))])
> > > >
> > > > This can be written as:
> > > >
> > > > (set (attr "preferred_for_speed")
> > > > (cond [(match_test "TARGET_AVX")
> > > > (symbol_ref "true")
> > > > (eq_attr "alternative" "1,2")
> > > > (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > ]
> > > > (symbol_ref "true")))])
> > > >
> > > > Uros.
> > >
> > > Yes, after these fixed, i'll upstream to trunk, ok?
> > Update patch.
>
> + (sqrt:<ssescalarmode>
> + (match_operand:<ssescalarmode> 1 "vector_operand"
> "xBm,<round_scalar_constraint>")))
> + (match_operand:VF_128 2 "register_operand" "0,v")
> + (const_int 1)))]
>
> vector_operand and Bm are needed for vector mode operands. This is in
> effect scalar operand, so nonimmediate_operand and simple "xm" should
> be used here.
>
> +(define_insn "*sse_vmrsqrtv4sf2"
> + [(set (match_operand:V4SF 0 "register_operand" "=x,x")
> + (vec_merge:V4SF
> + (vec_duplicate:V4SF
> + (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
> + UNSPEC_RSQRT))
> + (match_operand:V4SF 2 "register_operand" "0,x")
> + (const_int 1)))]
> + "TARGET_SSE"
> + "@
> + rsqrtss\t{%1, %0|%0, %k1}
> + vrsqrtss\t{%1, %2, %0|%0, %2, %k1}"
>
> No need for %k modifier. We already have scalar size 4 SFmode operand
> that will genereate DWORD PTR.
>
> +(define_insn "*avx512f_rndscale<mode><round_saeonly_name>"
> + [(set (match_operand:VF_128 0 "register_operand" "=v")
> + (vec_merge:VF_128
> + (vec_duplicate:VF_128
> + (unspec:<ssescalarmode>
> + [(match_operand:<ssescalarmode> 2
> "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
> + (match_operand:SI 3 "const_0_to_255_operand")]
> + UNSPEC_ROUND))
> + (match_operand:VF_128 1 "register_operand" "v")
> + (const_int 1)))]
> + "TARGET_AVX512F"
> + "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1,
> %0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"
>
> There is no need for <iptr> override for scalar mode operands in the
> above and other new patterns,
>
> Looking into sse.md, there is a lot of inconsistencies in existing *vm
> patterns w.r.t. operand constraints. Unfortunately, these were copied
> into proposed patterns. One example is existing
>
> (define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
> [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> (vec_merge:VF_128
> (sqrt:VF_128
> (match_operand:VF_128 1 "vector_operand"
> "xBm,<round_scalar_constraint>"))
> (match_operand:VF_128 2 "register_operand" "0,v")
> (const_int 1)))]
> "TARGET_SSE"
> "@
> sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
>
> Due to combine benefits, *vm operands to be merged is described in
> vector mode. Since the insn operates in scalar mode, there is no need
> for "vector_operand" and Bm constraint that impose more strict
> alignment requirements. However, iptr modifier is needed here to
> override VF_128 vector mode (e.g. V4SFmode) to generate scalar
> (SFmode, DWORD PTR) memory access prefix.
>
> Someone should fix these existing inconsistencies in a follow-up patch.
>
> Uros.
--
BR,
Hongtao
[-- Attachment #2: 0001-Fix-false-dependence-of-scalar-operation-vrcp-vsqrt-V4.patch --]
[-- Type: application/octet-stream, Size: 12827 bytes --]
From 8c2709054430ec173a5bb0956d58342836fe5380 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Wed, 9 Oct 2019 11:21:25 +0800
Subject: [PATCH] Fix false dependence of scalar operation
vrcp/vsqrt/vrsqrt/vrndscale
For instructions with xmm operand:
op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
for instruction with mem operand or gpr operand:
op mem/gpr, %xmmQ, %xmmQ
---> using pass rpad ---->
xorps %xmmN, %xmmN, %xxN
op mem/gpr, %xmmN, %xmmQ
Performance influence of SPEC2017 fprate which is tested on SKX
----
503.bwaves_r -0.03%
507.cactuBSSN_r -0.22%
508.namd_r -0.02%
510.parest_r 0.37%
511.povray_r 0.74%
519.lbm_r 0.24%
521.wrf_r 2.35%
526.blender_r 0.71%
527.cam4_r 0.65%
538.imagick_r 0.95%
544.nab_r -0.37
549.fotonik3d_r 0.24%
554.roms_r 0.90%
fprate geomean 0.50%
-----
Changelog
gcc/
* config/i386/i386.md (*rcpsf2_sse): Add
avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
(*rsqrtsf2_sse): Ditto.
(*sqrt<mode>2_sse): Ditto.
(sse4_1_round<mode>2): separate constraint vm, add
avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
* config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
by pass rpad.
(*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
Ditto.
(*sse_vmrsqrtv4sf2): Ditto.
(*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
(*sse4_1_round<ssescalarmodesuffix>): Ditto.
(sse4_1_round<ssescalarmodesuffix>): Add m constraint and
<iptr> pointer size modifier since vround support memory operand.
gcc/testsuite
* gcc.target/i386/pr87007-4.c: New test.
* gcc.target/i386/pr87007-5.c: Ditto.
---
gcc/config/i386/i386.md | 67 ++++++++--------
gcc/config/i386/sse.md | 97 ++++++++++++++++++++++-
gcc/testsuite/gcc.target/i386/pr87007-4.c | 18 +++++
gcc/testsuite/gcc.target/i386/pr87007-5.c | 18 +++++
4 files changed, 168 insertions(+), 32 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/pr87007-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr87007-5.c
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5e0795953d8..fb2235a5e34 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14843,13 +14843,14 @@
(set_attr "btver2_sse_attr" "rcp")
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SF")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set (attr "preferred_for_speed")
- (cond [(eq_attr "alternative" "1")
- (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- (eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- ]
- (symbol_ref "true")))])
+ (cond [(match_test "TARGET_AVX")
+ (symbol_ref "true")
+ (eq_attr "alternative" "1,2")
+ (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ ]
+ (symbol_ref "true")))])
(define_insn "*fop_xf_1_i387"
[(set (match_operand:XF 0 "register_operand" "=f,f")
@@ -15089,13 +15090,14 @@
(set_attr "btver2_sse_attr" "rcp")
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SF")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set (attr "preferred_for_speed")
- (cond [(eq_attr "alternative" "1")
- (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- (eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- ]
- (symbol_ref "true")))])
+ (cond [(match_test "TARGET_AVX")
+ (symbol_ref "true")
+ (eq_attr "alternative" "1,2")
+ (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ ]
+ (symbol_ref "true")))])
(define_expand "rsqrtsf2"
[(set (match_operand:SF 0 "register_operand")
@@ -15120,14 +15122,15 @@
(set_attr "atom_sse_attr" "sqrt")
(set_attr "btver2_sse_attr" "sqrt")
(set_attr "prefix" "maybe_vex")
+ (set_attr "avx_partial_xmm_update" "false,false,true")
(set_attr "mode" "<MODE>")
(set (attr "preferred_for_speed")
- (cond [(eq_attr "alternative" "1")
- (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- (eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- ]
- (symbol_ref "true")))])
+ (cond [(match_test "TARGET_AVX")
+ (symbol_ref "true")
+ (eq_attr "alternative" "1,2")
+ (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ ]
+ (symbol_ref "true")))])
(define_expand "sqrt<mode>2"
[(set (match_operand:MODEF 0 "register_operand")
@@ -16261,30 +16264,32 @@
\f
(define_insn "sse4_1_round<mode>2"
- [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v")
+ [(set (match_operand:MODEF 0 "register_operand" "=x,x,x,v,v")
(unspec:MODEF
- [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,vm")
- (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n")]
+ [(match_operand:MODEF 1 "nonimmediate_operand" "0,x,m,v,m")
+ (match_operand:SI 2 "const_0_to_15_operand" "n,n,n,n,n")]
UNSPEC_ROUND))]
"TARGET_SSE4_1"
"@
%vround<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
%vround<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
%vround<ssemodesuffix>\t{%2, %1, %d0|%d0, %1, %2}
+ vrndscale<ssemodesuffix>\t{%2, %d1, %0|%0, %d1, %2}
vrndscale<ssemodesuffix>\t{%2, %1, %d0|%d0, %1, %2}"
[(set_attr "type" "ssecvt")
- (set_attr "prefix_extra" "1,1,1,*")
- (set_attr "length_immediate" "*,*,*,1")
- (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex")
- (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f")
+ (set_attr "prefix_extra" "1,1,1,*,*")
+ (set_attr "length_immediate" "*,*,*,1,1")
+ (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
+ (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
+ (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
(set_attr "mode" "<MODE>")
(set (attr "preferred_for_speed")
- (cond [(eq_attr "alternative" "1")
- (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- (eq_attr "alternative" "2")
- (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
- ]
- (symbol_ref "true")))])
+ (cond [(match_test "TARGET_AVX")
+ (symbol_ref "true")
+ (eq_attr "alternative" "1,2")
+ (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
+ ]
+ (symbol_ref "true")))])
(define_insn "rintxf2"
[(set (match_operand:XF 0 "register_operand" "=f")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 403e91d4b17..ce0dccf3e08 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2035,6 +2035,25 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "SF")])
+(define_insn "*sse_vmrcpv4sf2"
+ [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+ (vec_merge:V4SF
+ (vec_duplicate:V4SF
+ (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+ UNSPEC_RCP))
+ (match_operand:V4SF 2 "register_operand" "0,x")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ rcpss\t{%1, %0|%0, %1}
+ vrcpss\t{%1, %2, %0|%0, %2, %1}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "atom_sse_attr" "rcp")
+ (set_attr "btver2_sse_attr" "rcp")
+ (set_attr "prefix" "orig,vex")
+ (set_attr "mode" "SF")])
+
(define_insn "<mask_codefor>rcp14<mode><mask_name>"
[(set (match_operand:VF_AVX512VL 0 "register_operand" "=v")
(unspec:VF_AVX512VL
@@ -2130,6 +2149,25 @@
(set_attr "btver2_sse_attr" "sqrt")
(set_attr "mode" "<ssescalarmode>")])
+(define_insn "*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
+ [(set (match_operand:VF_128 0 "register_operand" "=x,v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (sqrt:<ssescalarmode>
+ (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "xm,<round_scalar_constraint>")))
+ (match_operand:VF_128 2 "register_operand" "0,v")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %1}
+ vsqrt<ssescalarmodesuffix>\t{<round_scalar_mask_op3>%1, %2, %0<mask_scalar_operand3>|%0<mask_scalar_operand3>, %2, %1<round_scalar_mask_op3>}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "atom_sse_attr" "sqrt")
+ (set_attr "prefix" "<round_scalar_prefix>")
+ (set_attr "btver2_sse_attr" "sqrt")
+ (set_attr "mode" "<ssescalarmode>")])
+
(define_expand "rsqrt<mode>2"
[(set (match_operand:VF1_128_256 0 "register_operand")
(unspec:VF1_128_256
@@ -2219,6 +2257,23 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "SF")])
+(define_insn "*sse_vmrsqrtv4sf2"
+ [(set (match_operand:V4SF 0 "register_operand" "=x,x")
+ (vec_merge:V4SF
+ (vec_duplicate:V4SF
+ (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
+ UNSPEC_RSQRT))
+ (match_operand:V4SF 2 "register_operand" "0,x")
+ (const_int 1)))]
+ "TARGET_SSE"
+ "@
+ rsqrtss\t{%1, %0|%0, %1}
+ vrsqrtss\t{%1, %2, %0|%0, %2, %1}"
+ [(set_attr "isa" "noavx,avx")
+ (set_attr "type" "sse")
+ (set_attr "prefix" "orig,vex")
+ (set_attr "mode" "SF")])
+
(define_expand "<code><mode>3<mask_name><round_saeonly_name>"
[(set (match_operand:VF 0 "register_operand")
(smaxmin:VF
@@ -9709,6 +9764,22 @@
(set_attr "prefix" "evex")
(set_attr "mode" "<MODE>")])
+(define_insn "*avx512f_rndscale<mode><round_saeonly_name>"
+ [(set (match_operand:VF_128 0 "register_operand" "=v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (unspec:<ssescalarmode>
+ [(match_operand:<ssescalarmode> 2 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+ (match_operand:SI 3 "const_0_to_255_operand")]
+ UNSPEC_ROUND))
+ (match_operand:VF_128 1 "register_operand" "v")
+ (const_int 1)))]
+ "TARGET_AVX512F"
+ "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %2<round_saeonly_op4>, %3}"
+ [(set_attr "length_immediate" "1")
+ (set_attr "prefix" "evex")
+ (set_attr "mode" "<MODE>")])
+
;; One bit in mask selects 2 elements.
(define_insn "avx512f_shufps512_1<mask_name>"
[(set (match_operand:V16SF 0 "register_operand" "=v")
@@ -17954,12 +18025,36 @@
[(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
(vec_merge:VF_128
(unspec:VF_128
- [(match_operand:VF_128 2 "register_operand" "Yr,*x,x,v")
+ [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
(match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
UNSPEC_ROUND)
(match_operand:VF_128 1 "register_operand" "0,0,x,v")
(const_int 1)))]
"TARGET_SSE4_1"
+ "@
+ round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
+ round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
+ vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
+ vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
+ [(set_attr "isa" "noavx,noavx,avx,avx512f")
+ (set_attr "type" "ssecvt")
+ (set_attr "length_immediate" "1")
+ (set_attr "prefix_data16" "1,1,*,*")
+ (set_attr "prefix_extra" "1")
+ (set_attr "prefix" "orig,orig,vex,evex")
+ (set_attr "mode" "<MODE>")])
+
+(define_insn "*sse4_1_round<ssescalarmodesuffix>"
+ [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
+ (vec_merge:VF_128
+ (vec_duplicate:VF_128
+ (unspec:<ssescalarmode>
+ [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
+ (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
+ UNSPEC_ROUND))
+ (match_operand:VF_128 1 "register_operand" "0,0,x,v")
+ (const_int 1)))]
+ "TARGET_SSE4_1"
"@
round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
new file mode 100644
index 00000000000..e91bdcbac44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
+
+
+#include<math.h>
+
+extern double d1, d2, d3;
+void
+foo (int n, int k)
+{
+ for (int i = 0; i != n; i++)
+ if(i < k)
+ d1 = floor (d2);
+ else
+ d1 = ceil (d3);
+}
+
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
new file mode 100644
index 00000000000..20d13cf650b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse" } */
+
+
+#include<math.h>
+
+extern double d1, d2, d3;
+void
+foo (int n, int k)
+{
+ for (int i = 0; i != n; i++)
+ if(i < k)
+ d1 = sqrt (d2);
+ else
+ d1 = sqrt (d3);
+}
+
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
--
2.19.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale
2019-10-25 19:21 ` Hongtao Liu
@ 2019-10-25 21:08 ` Uros Bizjak
0 siblings, 0 replies; 8+ messages in thread
From: Uros Bizjak @ 2019-10-25 21:08 UTC (permalink / raw)
To: Hongtao Liu; +Cc: GCC Patches
On Fri, Oct 25, 2019 at 9:13 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> Update patch.
>
> On Fri, Oct 25, 2019 at 4:01 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu <crazylht@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > > >
> > > > > > Update patch:
> > > > > > Add m constraint to define_insn (sse_1_round<ssescalaemodesuffix,
> > > > > > *sse_1_round<ssescalaemodesuffix) to fix some unrecognized insn error
> > > > > > when under sse4 but not avx512f.
> > > > >
> > > > > It looks to me that the original insn is incompletely defined. It
> > > > > should use nonimmediate_operand, "m" constraint and <iptr> pointer
> > > > > size modifier. Something like:
> > > > >
> > > > > (define_insn "sse4_1_round<ssescalarmodesuffix>"
> > > > > [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > > > > (vec_merge:VF_128
> > > > > (unspec:VF_128
> > > > > [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > > > (match_operand:SI 3 "const_0_to_15_operand" "n,n,n,n")]
> > > > > UNSPEC_ROUND)
> > > > > (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > > > > (const_int 1)))]
> > > > > "TARGET_SSE4_1"
> > > > > "@
> > > > > round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > > > round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > > > vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> > > > > vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> > > > >
> > > > > >
> > > > > > Changelog:
> > > > > > gcc/
> > > > > > * config/i386/sse.md: (sse4_1_round<ssescalarmodesuffix>):
> > > > > > Change constraint x to xm
> > > > > > since vround support memory operand.
> > > > > > * (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > > > > >
> > > > > > Bootstrap and regression test ok.
> > > > > >
> > > > > > On Wed, Oct 23, 2019 at 9:56 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi uros:
> > > > > > > This patch fixes false dependence of scalar operations
> > > > > > > vrcp/vsqrt/vrsqrt/vrndscale.
> > > > > > > Bootstrap ok, regression test on i386/x86 ok.
> > > > > > >
> > > > > > > It does something like this:
> > > > > > > -----
> > > > > > > For scalar instructions with both xmm operands:
> > > > > > >
> > > > > > > op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
> > > > > > >
> > > > > > > for scalar instructions with one mem or gpr operand:
> > > > > > >
> > > > > > > op mem/gpr, %xmmQ, %xmmQ
> > > > > > >
> > > > > > > ---> using pass rpad ---->
> > > > > > >
> > > > > > > xorps %xmmN, %xmmN, %xxN
> > > > > > > op mem/gpr, %xmmN, %xmmQ
> > > > > > >
> > > > > > > Performance influence of SPEC2017 fprate which is tested on SKX
> > > > > > >
> > > > > > > 503.bwaves_r -0.03%
> > > > > > > 507.cactuBSSN_r -0.22%
> > > > > > > 508.namd_r -0.02%
> > > > > > > 510.parest_r 0.37%
> > > > > > > 511.povray_r 0.74%
> > > > > > > 519.lbm_r 0.24%
> > > > > > > 521.wrf_r 2.35%
> > > > > > > 526.blender_r 0.71%
> > > > > > > 527.cam4_r 0.65%
> > > > > > > 538.imagick_r 0.95%
> > > > > > > 544.nab_r -0.37
> > > > > > > 549.fotonik3d_r 0.24%
> > > > > > > 554.roms_r 0.90%
> > > > > > > fprate geomean 0.50%
> > > > > > > -----
> > > > > > >
> > > > > > > Changelog
> > > > > > > gcc/
> > > > > > > * config/i386/i386.md (*rcpsf2_sse): Add
> > > > > > > avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
> > > > > > > (*rsqrtsf2_sse): Ditto.
> > > > > > > (*sqrt<mode>2_sse): Ditto.
> > > > > > > (sse4_1_round<mode>2): separate constraint vm, add
> > > > > > > avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
> > > > > > > * config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
> > > > > > > by pass rpad.
> > > > > > > (*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
> > > > > > > Ditto.
> > > > > > > (*sse_vmrsqrtv4sf2): Ditto.
> > > > > > > (*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
> > > > > > > (*sse4_1_round<ssescalarmodesuffix>): Ditto.
> > > > > > >
> > > > > > > gcc/testsuite
> > > > > > > * gcc.target/i386/pr87007-4.c: New test.
> > > > > > > * gcc.target/i386/pr87007-5.c: Ditto.
OK.
Thanks,
Uros.
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > BR,
> > > > > > > Hongtao
> > > > >
> > > > > (set (attr "preferred_for_speed")
> > > > > (cond [(eq_attr "alternative" "1")
> > > > > (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > > (eq_attr "alternative" "2")
> > > > > - (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > > + (symbol_ref "TARGET_AVX || !TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > > ]
> > > > > (symbol_ref "true")))])
> > > > >
> > > > > This can be written as:
> > > > >
> > > > > (set (attr "preferred_for_speed")
> > > > > (cond [(match_test "TARGET_AVX")
> > > > > (symbol_ref "true")
> > > > > (eq_attr "alternative" "1,2")
> > > > > (symbol_ref "!TARGET_SSE_PARTIAL_REG_DEPENDENCY")
> > > > > ]
> > > > > (symbol_ref "true")))])
> > > > >
> > > > > Uros.
> > > >
> > > > Yes, after these fixed, i'll upstream to trunk, ok?
> > > Update patch.
> >
> > + (sqrt:<ssescalarmode>
> > + (match_operand:<ssescalarmode> 1 "vector_operand"
> > "xBm,<round_scalar_constraint>")))
> > + (match_operand:VF_128 2 "register_operand" "0,v")
> > + (const_int 1)))]
> >
> > vector_operand and Bm are needed for vector mode operands. This is in
> > effect scalar operand, so nonimmediate_operand and simple "xm" should
> > be used here.
> >
> > +(define_insn "*sse_vmrsqrtv4sf2"
> > + [(set (match_operand:V4SF 0 "register_operand" "=x,x")
> > + (vec_merge:V4SF
> > + (vec_duplicate:V4SF
> > + (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm,xm")]
> > + UNSPEC_RSQRT))
> > + (match_operand:V4SF 2 "register_operand" "0,x")
> > + (const_int 1)))]
> > + "TARGET_SSE"
> > + "@
> > + rsqrtss\t{%1, %0|%0, %k1}
> > + vrsqrtss\t{%1, %2, %0|%0, %2, %k1}"
> >
> > No need for %k modifier. We already have scalar size 4 SFmode operand
> > that will genereate DWORD PTR.
> >
> > +(define_insn "*avx512f_rndscale<mode><round_saeonly_name>"
> > + [(set (match_operand:VF_128 0 "register_operand" "=v")
> > + (vec_merge:VF_128
> > + (vec_duplicate:VF_128
> > + (unspec:<ssescalarmode>
> > + [(match_operand:<ssescalarmode> 2
> > "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
> > + (match_operand:SI 3 "const_0_to_255_operand")]
> > + UNSPEC_ROUND))
> > + (match_operand:VF_128 1 "register_operand" "v")
> > + (const_int 1)))]
> > + "TARGET_AVX512F"
> > + "vrndscale<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1,
> > %0|%0, %1, %<iptr>2<round_saeonly_op4>, %3}"
> >
> > There is no need for <iptr> override for scalar mode operands in the
> > above and other new patterns,
> >
> > Looking into sse.md, there is a lot of inconsistencies in existing *vm
> > patterns w.r.t. operand constraints. Unfortunately, these were copied
> > into proposed patterns. One example is existing
> >
> > (define_insn "<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>"
> > [(set (match_operand:VF_128 0 "register_operand" "=x,v")
> > (vec_merge:VF_128
> > (sqrt:VF_128
> > (match_operand:VF_128 1 "vector_operand"
> > "xBm,<round_scalar_constraint>"))
> > (match_operand:VF_128 2 "register_operand" "0,v")
> > (const_int 1)))]
> > "TARGET_SSE"
> > "@
> > sqrt<ssescalarmodesuffix>\t{%1, %0|%0, %<iptr>1}
> >
> > Due to combine benefits, *vm operands to be merged is described in
> > vector mode. Since the insn operates in scalar mode, there is no need
> > for "vector_operand" and Bm constraint that impose more strict
> > alignment requirements. However, iptr modifier is needed here to
> > override VF_128 vector mode (e.g. V4SFmode) to generate scalar
> > (SFmode, DWORD PTR) memory access prefix.
> >
> > Someone should fix these existing inconsistencies in a follow-up patch.
> >
> > Uros.
>
>
>
> --
> BR,
> Hongtao
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-10-25 20:56 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-23 2:25 [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale Hongtao Liu
2019-10-23 6:44 ` Hongtao Liu
2019-10-24 19:12 ` Uros Bizjak
2019-10-25 5:20 ` Hongtao Liu
2019-10-25 6:04 ` Hongtao Liu
2019-10-25 8:03 ` Uros Bizjak
2019-10-25 19:21 ` Hongtao Liu
2019-10-25 21:08 ` Uros Bizjak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).