public inbox for gcc-bugs@sourceware.org
* [Bug target/102811] New: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
@ 2021-10-18 10:58 ubizjak at gmail dot com
2021-10-18 11:17 ` [Bug target/102811] " ubizjak at gmail dot com
` (26 more replies)
0 siblings, 27 replies; 28+ messages in thread
From: ubizjak at gmail dot com @ 2021-10-18 10:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
Bug ID: 102811
Summary: vcvtph2ps and vcvtps2ph should be used to convert
_Float16 to SFmode with -mf16c
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
The following testcase:
_Float16 test (_Float16 a, _Float16 b)
{
return a + b;
}
compiles with -O2 -mf16c to:
--cut here--
subq $24, %rsp
pextrw $0, %xmm1, 14(%rsp)
call __extendhfsf2
pinsrw $0, 14(%rsp), %xmm1
vmovss %xmm0, 8(%rsp)
vmovss %xmm1, %xmm1, %xmm0
call __extendhfsf2
vaddss 8(%rsp), %xmm0, %xmm0
call __truncsfhf2
addq $24, %rsp
ret
--cut here--
Instead of calling __extendhfsf2 and __truncsfhf2, we should use vcvtph2ps and
vcvtps2ph (with zeroed elements 1..3) for -mf16c targets.
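For readers unfamiliar with the widening step, its value-level effect can be sketched in portable C. This is only an illustration of what a binary16-to-binary32 extension computes, assuming _Float16 uses the IEEE 754 binary16 layout. The name half_bits_to_float is hypothetical; this is neither libgcc's __extendhfsf2 nor what the compiler emits, and NaN payload bits are passed through unchanged, which may differ from the hardware instruction.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen an IEEE 754 binary16 bit pattern to float. */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1fu;   /* 5-bit exponent, bias 15 */
    uint32_t man  = h & 0x3ffu;          /* 10-bit fraction */
    uint32_t bits;

    if (exp == 0x1fu) {
        /* Infinity or NaN: all-ones exponent, fraction moved up by 13 bits. */
        bits = sign | 0x7f800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127 (add 112). */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: shift until the implicit leading one appears. */
        int shift = 0;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            shift++;
        }
        man &= 0x3ffu;
        bits = sign | ((uint32_t)(113 - shift) << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits without aliasing issues */
    return f;
}
```

Every binary16 value is exactly representable as a float, so no rounding is involved in this direction.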
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-10-18 11:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
Something like (argument and result in %xmm0):
vpxor %xmm1, %xmm1, %xmm1
vpblendw %xmm1, %xmm1, %xmm0, $1
vcvtph2ps %xmm0, %xmm1
instead of __extendhfsf2 and:
vxorps %xmm1, %xmm1, %xmm1
vblendps %xmm1, %xmm1, %xmm0, $1
vcvtps2ph %xmm0, %xmm1
instead of __truncsfhf2.
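The zero-then-blend idea can be modelled in plain C. The sketch below treats an XMM register as eight 16-bit lanes and mimics a word blend with immediate 1, so that lane 0 keeps the scalar and the remaining lanes are zero; presumably the point is that the packed conversion then sees no stray data in the other lanes. The type xmm_words and both function names are hypothetical, and this is a model of the lane selection only, not of the instructions' encodings.

```c
#include <stdint.h>

/* Toy model of a 128-bit XMM register viewed as eight 16-bit lanes. */
typedef struct { uint16_t lane[8]; } xmm_words;

/* Word blend in the style of VPBLENDW: lane i is taken from src2 when
   bit i of imm is set, otherwise from src1. */
xmm_words blend_words(xmm_words src1, xmm_words src2, uint8_t imm)
{
    xmm_words out;
    for (int i = 0; i < 8; i++)
        out.lane[i] = ((imm >> i) & 1) ? src2.lane[i] : src1.lane[i];
    return out;
}

/* Keep lane 0 of v and zero lanes 1..7 (hypothetical helper name). */
xmm_words isolate_lane0(xmm_words v)
{
    xmm_words zero = { {0} };
    return blend_words(zero, v, 1);
}
```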
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-10-18 11:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #1)
> vxorps %xmm1, %xmm1, %xmm1
> vblendps %xmm1, %xmm1, %xmm0, $1
> vcvtps2ph %xmm0, %xmm1
vmovss %xmm1, %xmm1, %xmm0
instead of vblendps would also do the trick.
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: pinskia at gcc dot gnu.org @ 2021-10-18 21:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Severity|normal |enhancement
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: cvs-commit at gcc dot gnu.org @ 2021-11-26 1:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:90cb088ece8d8cc1019d25629d1585e5b0234179
commit r12-5536-g90cb088ece8d8cc1019d25629d1585e5b0234179
Author: konglin1 <lingling.kong@intel.com>
Date: Wed Nov 10 09:37:32 2021 +0800
i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode
with -mf16c [PR 102811]
Add define_insn extendhfsf2 and truncsfhf2 for target_f16c.
gcc/ChangeLog:
PR target/102811
* config/i386/i386.c (ix86_can_change_mode_class): Allow 16 bit
data in XMM register for TARGET_SSE2.
* config/i386/i386.md (extendhfsf2): Add extendhfsf2 for
TARGET_F16C.
(extendhfdf2): Restrict extendhfdf for TARGET_AVX512FP16 only.
(*extendhf<mode>2): Rename from extendhf<mode>2.
(truncsfhf2): Likewise.
(truncdfhf2): Likewise.
(*trunc<mode>2): Likewise.
gcc/testsuite/ChangeLog:
PR target/102811
* gcc.target/i386/pr90773-21.c: Allow pextrw instead of movw.
* gcc.target/i386/pr90773-23.c: Ditto.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 1:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC12.
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: pinskia at gcc dot gnu.org @ 2021-11-26 1:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Target Milestone|--- |12.0
Resolution|--- |FIXED
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 10:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 51879
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51879&action=edit
Improve HI/HFmode scalar insert
The attached patch further improves HFmode -> SFmode conversion. HFmode values
are passed in XMM registers, but the PINSRW insn inserts only from memory or a GPR.
The patch introduces a *vec_set<V8_128:mode>_0 insn pattern that also adds a
PBLENDW alternative, which handles an insert to element 0 from an XMM source.
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 10:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
The improvement with patch from comment #6:
The testcase:
_Float16 test (_Float16 a, _Float16 b)
{
return a + b;
}
compiles with unpatched gcc -O2 -mf16c to:
vmovss %xmm0, %xmm0, %xmm2 # 27 [c=4 l=4] *movhf_internal/3
pextrw $0, %xmm1, -4(%rsp) # 28 [c=4 l=6] *movhf_internal/5
vpxor %xmm0, %xmm0, %xmm0 # 7 [c=4 l=4] movv8hf_internal/0
vpxor %xmm1, %xmm1, %xmm1 # 11 [c=4 l=4] movv8hf_internal/0
pextrw $0, %xmm2, -2(%rsp) # 30 [c=4 l=6] *movhf_internal/5
vpinsrw $0, -4(%rsp), %xmm1, %xmm1 # 12 [c=4 l=8] sse4_1_pinsrph/3
vpinsrw $0, -2(%rsp), %xmm0, %xmm0 # 8 [c=4 l=8] sse4_1_pinsrph/3
vcvtph2ps %xmm1, %xmm1 # 13 [c=4 l=4] vcvtph2ps
vcvtph2ps %xmm0, %xmm0 # 9 [c=4 l=4] vcvtph2ps
vaddss %xmm1, %xmm0, %xmm0 # 15 [c=12 l=4] *fop_sf_comm/2
vinsertps $0xe, %xmm0, %xmm0, %xmm0 # 17 [c=4 l=4] vec_setv4sf_0/2
vcvtps2ph $4, %xmm0, %xmm0 # 18 [c=4 l=4] *vcvtps2ph
ret # 35 [c=0 l=1] simple_return_internal
with unpatched gcc -O2 -mf16c -mavx2:
vpbroadcastw %xmm0, %xmm0 # 8 [c=4 l=5] *vec_dupv8hf/1
vpxor %xmm2, %xmm2, %xmm2 # 7 [c=4 l=4] movv8hf_internal/0
vpbroadcastw %xmm1, %xmm1 # 13 [c=4 l=5] *vec_dupv8hf/1
vpblendw $1, %xmm0, %xmm2, %xmm2 # 9 [c=4 l=6] sse4_1_pblendph/2
vpxor %xmm0, %xmm0, %xmm0 # 12 [c=4 l=4] movv8hf_internal/0
vpblendw $1, %xmm1, %xmm0, %xmm0 # 14 [c=4 l=6] sse4_1_pblendph/2
vcvtph2ps %xmm2, %xmm2 # 10 [c=4 l=4] vcvtph2ps
vcvtph2ps %xmm0, %xmm0 # 15 [c=4 l=4] vcvtph2ps
vaddss %xmm0, %xmm2, %xmm0 # 17 [c=12 l=4] *fop_sf_comm/2
vinsertps $0xe, %xmm0, %xmm0, %xmm0 # 19 [c=4 l=4] vec_setv4sf_0/2
vcvtps2ph $4, %xmm0, %xmm0 # 20 [c=4 l=4] *vcvtps2ph
ret # 36 [c=0 l=1] simple_return_internal
And with patched gcc -O2 -mf16c:
vpxor %xmm2, %xmm2, %xmm2 # 32 [c=4 l=4] movv8hf_internal/0
vpblendw $1, %xmm0, %xmm2, %xmm0 # 9 [c=4 l=6] *vec_setv8hf_0/8
vpblendw $1, %xmm1, %xmm2, %xmm1 # 14 [c=4 l=6] *vec_setv8hf_0/8
vcvtph2ps %xmm1, %xmm1 # 15 [c=4 l=4] vcvtph2ps
vcvtph2ps %xmm0, %xmm0 # 10 [c=4 l=4] vcvtph2ps
vaddss %xmm1, %xmm0, %xmm0 # 17 [c=12 l=4] *fop_sf_comm/2
vinsertps $0xe, %xmm0, %xmm0, %xmm0 # 19 [c=4 l=4] vec_setv4sf_0/2
vcvtps2ph $4, %xmm0, %xmm0 # 20 [c=4 l=4] *vcvtps2ph
ret # 40 [c=0 l=1] simple_return_internal
The above dumps show an inconsistency for PEXTRW (it should be VPEXTRW) and also
raise the question of why unpatched gcc prefers a memory temp instead of a GPR
temp for PEXTRW/PINSRW.
The patch improves HI/HFmode inserts to element 0 in general.
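As background for the final vcvtps2ph $4 in each dump: the $4 immediate selects the rounding mode from MXCSR rather than from the immediate itself. The narrowing step can be sketched in portable C under the assumption that MXCSR holds its default round-to-nearest-even setting. The name float_to_half_bits is hypothetical; this is an illustration of the arithmetic, not libgcc's __truncsfhf2, and NaN inputs are simply mapped to a quiet NaN with the top payload bits kept.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: narrow a float to an IEEE 754 binary16 bit pattern,
   rounding to nearest with ties to even. */
uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);

    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp  = (x >> 23) & 0xffu;
    uint32_t man  = x & 0x7fffffu;

    if (exp == 0xffu) {
        /* Infinity stays infinity; NaN becomes a quiet NaN. */
        uint32_t payload = man ? (0x200u | (man >> 13)) : 0u;
        return (uint16_t)(sign | 0x7c00u | payload);
    }

    int e = (int)exp - 127 + 15;          /* rebias from 127 to 15 */
    if (e >= 31)
        return (uint16_t)(sign | 0x7c00u); /* too large: overflow to infinity */

    if (e <= 0) {
        /* Result is subnormal or zero in binary16. */
        if (e < -10)
            return (uint16_t)sign;        /* below half of the smallest subnormal */
        man |= 0x800000u;                 /* make the implicit one explicit */
        int shift = 14 - e;               /* 14..24 */
        uint32_t q = man >> shift;
        uint32_t rem = man & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (q & 1u)))
            q++;                          /* may round up into the smallest normal */
        return (uint16_t)(sign | q);
    }

    /* Normal result: drop 13 fraction bits with round-to-nearest-even.
       A carry out of the fraction correctly bumps the exponent. */
    uint32_t h = ((uint32_t)e << 10) | (man >> 13);
    uint32_t rem = man & 0x1fffu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++;
    return (uint16_t)(sign | h);
}
```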
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 11:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 68606e57e60..a2ebaa5ac63 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2528,12 +2528,12 @@
case TYPE_SSELOG:
if (SSE_REG_P (operands[0]))
return MEM_P (operands[1])
- ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
- : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+ ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"
+ : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}";
else
return MEM_P (operands[1])
- ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
- : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+ ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
+ : "%vpextrw\t{$0, %1, %k0|%k0, %k1, 0}";
case TYPE_MSKLOG:
if (operands[1] == const0_rtx)
@@ -3788,12 +3788,12 @@
case TYPE_SSELOG:
if (SSE_REG_P (operands[0]))
return MEM_P (operands[1])
- ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
- : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+ ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"
+ : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}";
else
return MEM_P (operands[1])
- ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
- : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+ ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
+ : "%vpextrw\t{$0, %1, %k0|%k0, %k1, 0}";
default:
gcc_unreachable ();
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 12:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #8)
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 68606e57e60..a2ebaa5ac63 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2528,12 +2528,12 @@
> case TYPE_SSELOG:
> if (SSE_REG_P (operands[0]))
> return MEM_P (operands[1])
> - ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> - : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> + ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"
> + : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}";
> else
> return MEM_P (operands[1])
> - ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> - : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> + ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
> + : "%vpextrw\t{$0, %1, %k0|%k0, %k1, 0}";
>
> case TYPE_MSKLOG:
> if (operands[1] == const0_rtx)
> @@ -3788,12 +3788,12 @@
> case TYPE_SSELOG:
> if (SSE_REG_P (operands[0]))
> return MEM_P (operands[1])
> - ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> - : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> + ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"
> + : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}";
> else
> return MEM_P (operands[1])
> - ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> - : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> + ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
> + : "%vpextrw\t{$0, %1, %k0|%k0, %k1, 0}";
>
> default:
> gcc_unreachable ();
Yes, I'm testing
modified gcc/config/i386/i386.c
@@ -19240,7 +19240,7 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
}
/* Require movement to gpr, and then store to memory. */
- if (mode == HFmode
+ if ((mode == HFmode || mode == HImode)
&& !TARGET_SSE4_1
&& SSE_CLASS_P (rclass)
&& !in_p && MEM_P (x))
modified gcc/config/i386/i386.md
@@ -2528,12 +2528,12 @@ (define_insn "*movhi_internal"
case TYPE_SSELOG:
if (SSE_REG_P (operands[0]))
return MEM_P (operands[1])
- ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
- : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+ ? "%vpinsrw\t{$0, %1, %0|%0, %1, 0}"
+ : "%vpinsrw\t{$0, %k1, %0|%0, %k1, 0}";
else
- return MEM_P (operands[1])
- ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
- : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+ return MEM_P (operands[0])
+ ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
+ : "%vpextrw\t{$0, %1, %k0|%k0, %k1, 0}";
case TYPE_MSKLOG:
if (operands[1] == const0_rtx)
@@ -2557,12 +2557,14 @@ (define_insn "*movhi_internal"
]
(const_string "*")))
(set (attr "type")
- (cond [(eq_attr "alternative" "9,10,11,12,13")
+ (cond [(eq_attr "alternative" "9,10,12,13")
(if_then_else (match_test "TARGET_AVX512FP16")
(const_string "ssemov")
(const_string "sselog"))
(eq_attr "alternative" "4,5,6,7")
(const_string "mskmov")
+ (eq_attr "alternative" "11")
+ (const_string "ssemov")
(eq_attr "alternative" "8")
(const_string "msklog")
(match_test "optimize_function_for_size_p (cfun)")
@@ -2579,15 +2581,33 @@ (define_insn "*movhi_internal"
(const_string "imovx")
]
(const_string "imov")))
+ (set (attr "memory")
+ (cond [(eq_attr "alternative" "9,10")
+ (const_string "none")
+ (eq_attr "alternative" "12")
+ (const_string "load")
+ (eq_attr "alternative" "13")
+ (const_string "store")
+ ]
+ (const_string "*")))
(set (attr "prefix")
- (if_then_else (eq_attr "alternative" "4,5,6,7,8")
- (const_string "vex")
- (const_string "orig")))
+ (cond [(eq_attr "alternative" "9,10,11,12,13")
+ (const_string "maybe_evex")
+ (eq_attr "alternative" "4,5,6,7,8")
+ (const_string "vex")
+ ]
+ (const_string "orig")))
(set (attr "mode")
(cond [(eq_attr "type" "imovx")
(const_string "SI")
+ (eq_attr "alternative" "9,10,12,13")
+ (if_then_else (match_test "TARGET_AVX512FP16")
+ (const_string "HI")
+ (const_string "TI"))
(eq_attr "alternative" "11")
- (const_string "HF")
+ (if_then_else (match_test "TARGET_AVX512FP16")
+ (const_string "HF")
+ (const_string "SF"))
(and (eq_attr "alternative" "1,2")
(match_operand:HI 1 "aligned_operand"))
(const_string "SI")
@@ -3791,7 +3811,7 @@ (define_insn "*movhf_internal"
? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
: "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
else
- return MEM_P (operands[1])
+ return MEM_P (operands[0])
? "pextrw\t{$0, %1, %0|%0, %1, 0}"
: "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
modified gcc/config/i386/sse.md
@@ -11230,9 +11230,9 @@ (define_insn "*vec_extracthf"
switch (which_alternative)
{
case 0:
- return "vpextrw\t{%2, %1, %k0|%k0, %1, %2}";
+ return "%vpextrw\t{%2, %1, %k0|%k0, %1, %2}";
case 1:
- return "vpextrw\t{%2, %1, %0|%0, %1, %2}";
+ return "%vpextrw\t{%2, %1, %0|%0, %1, %2}";
case 2:
operands[2] = GEN_INT (INTVAL (operands[2]) * 2);
@@ -11245,7 +11245,7 @@ (define_insn "*vec_extracthf"
gcc_unreachable ();
}
}
- [(set_attr "isa" "*,*,noavx,avx")
+ [(set_attr "isa" "*,sse4,noavx,avx")
(set_attr "type" "sselog1,sselog1,sseishft1,sseishft1")
(set_attr "prefix" "maybe_evex")
(set_attr "mode" "TI")])
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 13:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
> compiles with unpatched gcc -O2 -mf16c to:
>
> vmovss %xmm0, %xmm0, %xmm2 # 27 [c=4 l=4] *movhf_internal/3
> pextrw $0, %xmm1, -4(%rsp) # 28 [c=4 l=6] *movhf_internal/5
> vpxor %xmm0, %xmm0, %xmm0 # 7 [c=4 l=4] movv8hf_internal/0
> vpxor %xmm1, %xmm1, %xmm1 # 11 [c=4 l=4] movv8hf_internal/0
> pextrw $0, %xmm2, -2(%rsp) # 30 [c=4 l=6] *movhf_internal/5
> vpinsrw $0, -4(%rsp), %xmm1, %xmm1 # 12 [c=4 l=8] sse4_1_pinsrph/3
> vpinsrw $0, -2(%rsp), %xmm0, %xmm0 # 8 [c=4 l=8] sse4_1_pinsrph/3
> vcvtph2ps %xmm1, %xmm1 # 13 [c=4 l=4] vcvtph2ps
> vcvtph2ps %xmm0, %xmm0 # 9 [c=4 l=4] vcvtph2ps
> vaddss %xmm1, %xmm0, %xmm0 # 15 [c=12 l=4] *fop_sf_comm/2
> vinsertps $0xe, %xmm0, %xmm0, %xmm0 # 17 [c=4 l=4] vec_setv4sf_0/2
> vcvtps2ph $4, %xmm0, %xmm0 # 18 [c=4 l=4] *vcvtps2ph
> ret # 35 [c=0 l=1] simple_return_internal
Just noticed that for some reason two VPXORs are emitted. One should be enough
for both VPINSRW insns.
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 13:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---
> The above dumps show an inconsistency for PEXTRW (it should be VPEXTRW) and
> also raise the question of why unpatched gcc prefers a memory temp instead of
> a GPR temp for PEXTRW/PINSRW.
>
Because RA thinks memory is needed to move between GPR and SSE registers.
modified gcc/config/i386/i386.c
@@ -19438,7 +19438,7 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
return true;
/* In addition to SImode moves, AVX512FP16 also enables HImode moves. */
- int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode);
+ int minsize = GET_MODE_SIZE (TARGET_SSE2 ? HImode : SImode);
if (msize < minsize)
> The patch improves HI/HFmode inserts to element 0 in general.
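The effect of the minsize change quoted in this comment can be modelled with a small C sketch. It assumes that the statement following the `if (msize < minsize)` check (cut off in the hunk) reports that secondary memory is needed, which is consistent with the explanation that RA thought memory was required. The function name, its boolean parameter, and the size constants are hypothetical stand-ins, not GCC code.

```c
#include <stdbool.h>

/* Sizes in bytes of the two modes named in the hunk (model constants). */
enum { HI_SIZE = 2, SI_SIZE = 4 };

/* Simplified model of the size check only: a GPR<->SSE move narrower than
   the smallest directly supported move is assumed to go through memory.
   himode_moves_ok stands in for the target test in the hunk
   (TARGET_AVX512FP16 before the change, TARGET_SSE2 after it). */
bool small_move_needs_memory(int msize, bool himode_moves_ok)
{
    int minsize = himode_moves_ok ? HI_SIZE : SI_SIZE;
    return msize < minsize;
}
```

In this model a 2-byte HFmode/HImode move needs memory when only 4-byte moves are allowed, and stops needing it once 2-byte moves are allowed, which matches the memory temporaries seen in the unpatched dump.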
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 13:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #12 from Hongtao.liu <crazylht at gmail dot com> ---
>
> Just noticed that for some reason two VPXORs are emitted. One should be
> enough for both VPINSRW insns.
With the new alternative in your attached patch (the vpblendw one), RA could reuse
the zero register; without that, xmm0/xmm1 need to be explicitly cleared for the
upper bits.
vpblendw $1, %xmm1, %xmm2, %xmm1 # 14 [c=4 l=6] *vec_setv8hf_0/8
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 13:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #13 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #12)
> >
> > Just noticed that for some reason two VPXORs are emitted. One should be
> > enough for both VPINSRW insns.
>
> With the new alternative in your attached patch (the vpblendw one), RA could
> reuse the zero register; without that, xmm0/xmm1 need to be explicitly cleared
> for the upper bits.
> vpblendw $1, %xmm1, %xmm2, %xmm1 # 14 [c=4 l=6] *vec_setv8hf_0/8
True, but I'd expect some post-reload(?) pass to propagate zeros and remove
redundant initializations.
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 15:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #13)
> (In reply to Hongtao.liu from comment #12)
> > >
> > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > enough for both VPINSRW insns.
> >
> > With the new alternative in your attached patch (the vpblendw one), RA could
> > reuse the zero register; without that, xmm0/xmm1 need to be explicitly cleared
> > for the upper bits.
> > vpblendw $1, %xmm1, %xmm2, %xmm1 # 14 [c=4 l=6] *vec_setv8hf_0/8
>
> True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> redundant initializations.
On the other hand, if we do not use expand_vector_set (which treats the zero
register as both input and output), but instead emit_insn(gen_sse4_1_pinsrph(...))
with a new pseudo register as the destination, the redundant initialization could
be optimized away by fwprop1.
pextrw $0, %xmm1, %eax
pextrw $0, %xmm0, %edx
vpxor %xmm1, %xmm1, %xmm1
vpinsrw $0, %edx, %xmm1, %xmm0
vpinsrw $0, %eax, %xmm1, %xmm1
vcvtph2ps %xmm1, %xmm1
vcvtph2ps %xmm0, %xmm0
vaddss %xmm1, %xmm0, %xmm0
vinsertps $0xe, %xmm0, %xmm0, %xmm0
vcvtps2ph $4, %xmm0, %xmm0
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 16:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #15 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #14)
> (In reply to Uroš Bizjak from comment #13)
> > (In reply to Hongtao.liu from comment #12)
> > > >
> > > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > > enough for both VPINSRW insns.
> > >
> > > With the new alternative in your attached patch (the vpblendw one), RA could
> > > reuse the zero register; without that, xmm0/xmm1 need to be explicitly cleared
> > > for the upper bits.
> > > vpblendw $1, %xmm1, %xmm2, %xmm1 # 14 [c=4 l=6] *vec_setv8hf_0/8
> >
> > True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> > redundant initializations.
>
> On the other hand, if we do not use expand_vector_set (which treats the zero
> register as both input and output), but instead
> emit_insn(gen_sse4_1_pinsrph(...)) with a new pseudo register as the
> destination, the redundant initialization could be optimized away by fwprop1.
>
> pextrw $0, %xmm1, %eax
> pextrw $0, %xmm0, %edx
> vpxor %xmm1, %xmm1, %xmm1
> vpinsrw $0, %edx, %xmm1, %xmm0
> vpinsrw $0, %eax, %xmm1, %xmm1
> vcvtph2ps %xmm1, %xmm1
> vcvtph2ps %xmm0, %xmm0
> vaddss %xmm1, %xmm0, %xmm0
> vinsertps $0xe, %xmm0, %xmm0, %xmm0
> vcvtps2ph $4, %xmm0, %xmm0
Then we will lose optimization in expand vector set:
case E_V8HFmode:
if (TARGET_AVX2)
{
mmode = SImode;
gen_blendm = gen_sse4_1_pblendph;
blendm_const = true;
}
else
use_vec_merge = true;
break;
Maybe we should simply copy "target" to a new pseudo here:
do_vec_merge:
tmp = gen_rtx_VEC_DUPLICATE (mode, val);
tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
GEN_INT (HOST_WIDE_INT_1U << elt));
emit_insn (gen_rtx_SET (target, tmp));
OTOH, if recycling "target" inhibits FWprop, we should perhaps copy "target" to
a new pseudo at the beginning of the expand_vector_set?
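For readers less familiar with the RTL involved, the do_vec_merge sequence quoted above can be modelled in plain C. In GCC's RTL, vec_merge takes element i from its first operand when bit i of the mask is set and from the second operand otherwise, so merging a vec_duplicate of val into target with mask 1 << elt replaces exactly one element. The type v8hi_model and the function name are hypothetical; the sketch models values only and says nothing about register allocation.

```c
#include <stdint.h>

/* Toy model of a V8HI/V8HF vector as eight 16-bit lanes. */
typedef struct { uint16_t lane[8]; } v8hi_model;

/* Model of (vec_merge (vec_duplicate val) target (1 << elt)):
   lane elt comes from the broadcast value, all other lanes from target. */
v8hi_model vec_set_model(v8hi_model target, uint16_t val, unsigned elt)
{
    unsigned mask = 1u << elt;
    v8hi_model result;
    for (unsigned i = 0; i < 8; i++)
        result.lane[i] = ((mask >> i) & 1u) ? val : target.lane[i];
    return result;
}
```

Note that the model returns a fresh value instead of overwriting its input, which loosely corresponds to the "new pseudo" variant being discussed; the quoted GCC code writes the result back into target.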
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 16:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #16 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #15)
> (In reply to Hongtao.liu from comment #14)
> > (In reply to Uroš Bizjak from comment #13)
> > > (In reply to Hongtao.liu from comment #12)
> > > > >
> > > > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > > > enough for both VPINSRW insns.
> > > >
> > > > With the new alternative in your attached patch (the vpblendw one), RA could
> > > > reuse the zero register; without that, xmm0/xmm1 need to be explicitly cleared
> > > > for the upper bits.
> > > > vpblendw $1, %xmm1, %xmm2, %xmm1 # 14 [c=4 l=6] *vec_setv8hf_0/8
> > >
> > > True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> > > redundant initializations.
> >
> > On the other hand, if we do not use expand_vector_set (which treats the zero
> > register as both input and output), but instead
> > emit_insn(gen_sse4_1_pinsrph(...)) with a new pseudo register as the
> > destination, the redundant initialization could be optimized away by fwprop1.
> >
> > pextrw $0, %xmm1, %eax
> > pextrw $0, %xmm0, %edx
> > vpxor %xmm1, %xmm1, %xmm1
> > vpinsrw $0, %edx, %xmm1, %xmm0
> > vpinsrw $0, %eax, %xmm1, %xmm1
> > vcvtph2ps %xmm1, %xmm1
> > vcvtph2ps %xmm0, %xmm0
> > vaddss %xmm1, %xmm0, %xmm0
> > vinsertps $0xe, %xmm0, %xmm0, %xmm0
> > vcvtps2ph $4, %xmm0, %xmm0
>
> Then we will lose optimization in expand vector set:
>
> case E_V8HFmode:
> if (TARGET_AVX2)
> {
> mmode = SImode;
> gen_blendm = gen_sse4_1_pblendph;
> blendm_const = true;
> }
> else
> use_vec_merge = true;
> break;
>
> Maybe we should simply copy "target" to a new pseudo here:
>
> do_vec_merge:
> tmp = gen_rtx_VEC_DUPLICATE (mode, val);
> tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
> GEN_INT (HOST_WIDE_INT_1U << elt));
> emit_insn (gen_rtx_SET (target, tmp));
>
> OTOH, if recycling "target" inhibits FWprop, we should perhaps copy "target"
> to a new pseudo at the beginning of the expand_vector_set?
ix86_expand_vector_set is mainly used by vec_set_optab, which takes target as
both input and output, so it seems we can't create a new target for that.
* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 16:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #17 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #16)
> ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> target as both input and output, it seems we can't create a new target for
> that.
OK, let's try to optimize it with gen_pinsr, as you proposed.
(It looks like the add-on patch from Comment #6 will generate VPBLEND in this
case, too.)
From: ubizjak at gmail dot com @ 2021-11-26 16:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #18 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #17)
> (In reply to Hongtao.liu from comment #16)
>
> > ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> > target as both input and output, it seems we can't create a new target for
> > that.
>
> OK, let's try to optimize it with gen_pinsr, as you proposed.
>
> (It looks that the add-on patch from Comment #6 will generate VPBLEND in
> this case, too.)
We should manually generate vinsertps from truncsfhf2, too. There is no point
in calling ix86_expand_vector_set if we already know the instruction. It will use
the vec_set<VI4F_128:mode>_0 insn pattern, which has quite a few alternatives.
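For readers following along, the effect of the $0xe immediate on vinsertps can be modelled in plain C. This is only a sketch of the documented INSERTPS semantics (register-source form), not GCC code; insertps_model and insertps_demo are made-up names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of INSERTPS with a register source (hypothetical helper,
   not part of GCC).  imm8[7:6] selects the source element, imm8[5:4]
   the destination slot, and imm8[3:0] is a mask of result elements
   that are zeroed after the insert.  */
void
insertps_model (float dst[4], const float src[4], uint8_t imm8)
{
  float val = src[(imm8 >> 6) & 3];   /* read first, dst may alias src */

  dst[(imm8 >> 4) & 3] = val;
  for (int i = 0; i < 4; i++)
    if (imm8 & (1u << i))
      dst[i] = 0.0f;
}

/* Models "vinsertps $0xe, %xmm0, %xmm0, %xmm0": element 0 is kept,
   elements 1..3 are cleared.  Returns 0 on success.  */
int
insertps_demo (void)
{
  float x[4] = { 1.5f, 2.0f, 3.0f, 4.0f };

  insertps_model (x, x, 0x0e);
  assert (x[0] == 1.5f);
  assert (x[1] == 0.0f && x[2] == 0.0f && x[3] == 0.0f);
  return 0;
}
```

With 0x0e the source and destination selectors are both 0 and the zero mask is 0b1110, so the instruction leaves the scalar in element 0 and clears the rest before vcvtps2ph consumes the register.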
From: crazylht at gmail dot com @ 2021-11-29 2:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #19 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #17)
> (In reply to Hongtao.liu from comment #16)
>
> > ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> > target as both input and output, it seems we can't create a new target for
> > that.
>
> OK, let's try to optimize it with gen_pinsr, as you proposed.
>
> (It looks that the add-on patch from Comment #6 will generate VPBLEND in
> this case, too.)
I think your attached patch is a separate optimization: the newly added
alternatives, which generate VPBLEND, extend the pattern to accept an SSE register
for the inserted value; currently we only have "rm".
From: crazylht at gmail dot com @ 2021-11-29 3:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #20 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #18)
> (In reply to Uroš Bizjak from comment #17)
> > (In reply to Hongtao.liu from comment #16)
> >
> > > ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> > > target as both input and output, it seems we can't create a new target for
> > > that.
> >
> > OK, let's try to optimize it with gen_pinsr, as you proposed.
> >
> > (It looks that the add-on patch from Comment #6 will generate VPBLEND in
> > this case, too.)
>
> We should manually generate vinsertps from truncsfhf2, too. There is no
> point to call ix86_expand_vector_set if we already know the instruction. It
> will use vec_set<VI4F_128:mode>_0 insn pattern, which has quite some
> alternatives.
For AVX2, your attached patch will optimize
vpxor %xmm2, %xmm2, %xmm2
- vpbroadcastw %xmm1, %xmm1
- vpbroadcastw %xmm0, %xmm0
vpblendw $1, %xmm0, %xmm2, %xmm0
vpblendw $1, %xmm1, %xmm2, %xmm2
vcvtph2ps %xmm2, %xmm2
since the upper bits of xmm1/xmm0 are not selected by vpblendw.
From: ubizjak at gmail dot com @ 2021-11-29 8:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #21 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #20)
> (In reply to Uroš Bizjak from comment #18)
> > (In reply to Uroš Bizjak from comment #17)
> > > (In reply to Hongtao.liu from comment #16)
> > >
> > > > ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> > > > target as both input and output, it seems we can't create a new target for
> > > > that.
> > >
> > > OK, let's try to optimize it with gen_pinsr, as you proposed.
> > >
> > > (It looks that the add-on patch from Comment #6 will generate VPBLEND in
> > > this case, too.)
> >
> > We should manually generate vinsertps from truncsfhf2, too. There is no
> > point to call ix86_expand_vector_set if we already know the instruction. It
> > will use vec_set<VI4F_128:mode>_0 insn pattern, which has quite some
> > alternatives.
>
> For AVX2, your attached patch will optimize
>
> vpxor %xmm2, %xmm2, %xmm2
> - vpbroadcastw %xmm1, %xmm1
> - vpbroadcastw %xmm0, %xmm0
> vpblendw $1, %xmm0, %xmm2, %xmm0
> vpblendw $1, %xmm1, %xmm2, %xmm2
> vcvtph2ps %xmm2, %xmm2
>
> Since upper bits of xmm1/xmm0 is not selected by vpblendw.
True, the blending of only element 0 does not need broadcast. I will prepare a
formal patch submission once your changes are committed.
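A small scalar model may help show why the broadcast is redundant when only element 0 is blended. This is a sketch under the documented VPBLENDW/VPBROADCASTW semantics for 128-bit operands; the function names are hypothetical and nothing here is taken from the GCC sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of 128-bit (V)PBLENDW: for each of the 8 words, bit i
   of imm8 selects src2[i], otherwise src1[i].  */
void
pblendw_model (uint16_t dst[8], const uint16_t src1[8],
               const uint16_t src2[8], uint8_t imm8)
{
  for (int i = 0; i < 8; i++)
    dst[i] = (imm8 & (1u << i)) ? src2[i] : src1[i];
}

/* Scalar model of VPBROADCASTW: copy word 0 into all 8 words.  */
void
pbroadcastw_model (uint16_t dst[8], const uint16_t src[8])
{
  uint16_t w = src[0];

  for (int i = 0; i < 8; i++)
    dst[i] = w;
}

/* Insert word 0 of VAL into element 0 of a zeroed vector, once with
   and once without the broadcast.  Returns 1 if the results match.  */
int
blend_elt0_same (const uint16_t val[8])
{
  uint16_t zero[8] = { 0 }, bcast[8], with[8], without[8];

  pbroadcastw_model (bcast, val);
  pblendw_model (with, zero, bcast, 1);
  pblendw_model (without, zero, val, 1);
  return memcmp (with, without, sizeof with) == 0;
}

/* Garbage in the upper words of the input does not matter for $1.  */
void
blend_elt0_demo (void)
{
  uint16_t v[8] = { 0x3c00, 0xdead, 0xbeef, 1, 2, 3, 4, 5 };

  assert (blend_elt0_same (v));
}
```

For any other element index the value still has to be moved out of word 0 first, which is where the broadcast (or a pinsrw from a GPR or memory) remains necessary.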
From: cvs-commit at gcc dot gnu.org @ 2021-11-29 9:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:9519b694afbf9a35c36cf9f14d35d1c0e9e8cacc
commit r12-5573-g9519b694afbf9a35c36cf9f14d35d1c0e9e8cacc
Author: liuhongt <hongtao.liu@intel.com>
Date: Fri Nov 26 23:24:20 2021 +0800
Fix regression introduced by r12-5536.
There are several failures:
1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)"
%vpextrw should be used in output templates.
2. ICE in get_attr_memory for movhi_internal since some alternatives
are marked as TYPE_SSELOG.
use TYPE_SSELOG1 instead.
Also this patch fixes a typo and some latent bugs which are related to
moving HImode from/to sse register w/o TARGET_AVX512FP16.
gcc/ChangeLog:
PR target/102811
PR target/103463
* config/i386/i386.c (ix86_secondary_reload): Without
TARGET_SSE4_1, General register is needed to move HImode from
sse register to memory.
* config/i386/sse.md (*vec_extracthf): Use %vpextrw instead of
pextrw in output templates.
* config/i386/i386.md (movhi_internal): Ditto, also fix typo of
MEM_P (operands[1]) and adjust mode/prefix/type attribute for
alternatives related to sse register.
From: cvs-commit at gcc dot gnu.org @ 2021-11-29 9:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #23 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:11d0a2af33910c6d243e7265fb7ea04d2bc89b25
commit r12-5574-g11d0a2af33910c6d243e7265fb7ea04d2bc89b25
Author: liuhongt <hongtao.liu@intel.com>
Date: Mon Nov 29 10:01:42 2021 +0800
Optimize _Float16 usage for non AVX512FP16.
1. No memory is needed to move HI/HFmode between GPR and SSE registers
under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o
AVX512FP16.
2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace
ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant
initialization could be eliminated.
gcc/ChangeLog:
PR target/102811
* config/i386/i386.c (inline_secondary_memory_needed): HImode
move between GPR and SSE registers is supported under
TARGET_SSE2 and above.
* config/i386/i386.md (extendhfsf2): Optimize expander.
(truncsfhf2): Ditto.
* config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to
align with V8HImode.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr102811-2.c: New test.
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new
scan-assembler-times.
From: cvs-commit at gcc dot gnu.org @ 2021-11-29 21:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #24 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:
https://gcc.gnu.org/g:ca5667e867252db3c8642ee90f55427149cd92b6
commit r12-5584-gca5667e867252db3c8642ee90f55427149cd92b6
Author: Uros Bizjak <ubizjak@gmail.com>
Date: Mon Nov 29 22:16:12 2021 +0100
i386: Fix and improve movhi_internal and movhf_internal some more.
An (*v,C) alternative can be added to movhi_internal to directly load
HImode constant 0 to xmm register. Also, V4SFmode moves can be used
for xmm->xmm moves instead of TImode moves when optimizing for size.
Fix invalid %vpinsrw insn template, which needs to duplicate %xmm
register for AVX targets.
Optimize GPR moves in movhf_internal in the same way as in movhi_internal.
Fix pinsrw and pextrw templates for AVX targets. Use sselog1
instead of sselog type. Also, handle TARGET_SSE_PARTIAL_REG_DEPENDENCY
and TARGET_SSE_SPLIT_REGS targets.
2021-11-29 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/102811
* config/i386/i386.md (*movhi_internal): Introduce (*v,C)
alternative.
Do not allocate non-GPR registers. Optimize xmm->xmm moves when
optimizing for size. Fix vpinsrw insn template.
(*movhf_internal): Fix pinsrw and pextrw insn templates for
AVX targets. Use sselog1 type instead of sselog. Optimize GPR
moves.
Optimize xmm->xmm moves for TARGET_SSE_PARTIAL_REG_DEPENDENCY
and TARGET_SSE_SPLIT_REGS targets.
From: cvs-commit at gcc dot gnu.org @ 2021-12-01 22:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #25 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:
https://gcc.gnu.org/g:7eb961d83b0eda53aeb1cfaacdc367e1952de613
commit r12-5700-g7eb961d83b0eda53aeb1cfaacdc367e1952de613
Author: Uros Bizjak <ubizjak@gmail.com>
Date: Wed Dec 1 23:01:09 2021 +0100
i386: Improve V8HI and V8HF inserts [PR102811]
Introduce vec_set_0 pattern for V8HI and V8HF modes to implement scalar
element 0 inserts from a GP register, SSE register or memory. Also
add V8HI and V8HF AVX2 (x,x,x) alternative to PINSR insn pattern, which is
split after reload to a sequence of PBROADCASTW and PBLENDW.
The V8HF inserts from memory improve from:
- vpbroadcastw 4(%esp), %xmm1
- vpblendw $16, %xmm1, %xmm0, %xmm0
+ vpinsrw $4, 4(%esp), %xmm0, %xmm0
and V8HF inserts from SSE register to element 0 improve from:
vpxor %xmm2, %xmm2, %xmm2
- vpbroadcastw %xmm0, %xmm0
vpblendw $1, %xmm0, %xmm2, %xmm0
Based on the above improvements, the register allocator is able to determine
the optimal instruction (or instruction sequence) based on the register set
of the input value, so there is no need to manually expand V8HI and V8HF
inserts to the sequence of VEC_DUPLICATE and VEC_MERGE RTXes.
2021-12-01 Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog:
PR target/102811
* config/i386/sse.md (VI2F): Remove mode iterator.
(VI2F_256_512): New mode iterator.
(vec_set<V8_128:mode>_0): New insn pattern.
(vec_set<VI2F_256_512:mode>_0): Rename from
vec_set<VI2F:mode>mode.
Use VI2F_256_512 mode iterator instead of VI2F.
(*avx512fp16_movsh): Remove.
(<sse2p4_1>_pinsr<ssemodesuffix>): Add (x,x,x) AVX2 alternative.
Do not disable V8HF mode insn on AVX2 targets.
(pinsrw -> pbroadcast + pblendw peephole2): New peephole.
(pinsrw -> pbroadcast + pblendw splitter): New post-reload
splitter.
* config/i386/i386.md (extendhfsf): Call gen_vec_setv8hf_0.
* config/i386/i386-expand.c (ix86_expand_vector_set)
<case E_V8HFmode>: Use vec_merge path for TARGET_AVX2.
gcc/testsuite/ChangeLog:
PR target/102811
* gcc.target/i386/pr102811-1.c: New test.
* gcc.target/i386/avx512fp16-1c.c (dg-final): Update
scan-assembler-times scan strings for ia32 targets.
* gcc.target/i386/pr102327-1.c (dg-final): Ditto.
* gcc.target/i386/pr102811.c: Rename from ...
* gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: ... this.
From: ubizjak at gmail dot com @ 2021-12-01 22:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811
--- Comment #26 from Uroš Bizjak <ubizjak at gmail dot com> ---
The testcase now compiles with -O2 -mf16c to:
vpxor %xmm2, %xmm2, %xmm2
vpblendw $1, %xmm0, %xmm2, %xmm0
vpblendw $1, %xmm1, %xmm2, %xmm1
vcvtph2ps %xmm1, %xmm1
vcvtph2ps %xmm0, %xmm0
vaddss %xmm1, %xmm0, %xmm0
vinsertps $0xe, %xmm0, %xmm0, %xmm0
vcvtps2ph $4, %xmm0, %xmm0
ret
for 64-bit targets and:
vpxor %xmm2, %xmm2, %xmm2
vpinsrw $0, 4(%esp), %xmm2, %xmm0
vpinsrw $0, 8(%esp), %xmm2, %xmm1
vcvtph2ps %xmm0, %xmm0
vcvtph2ps %xmm1, %xmm1
vaddss %xmm1, %xmm0, %xmm0
vinsertps $0xe, %xmm0, %xmm0, %xmm0
vcvtps2ph $4, %xmm0, %xmm0
ret
for 32-bit targets.
Fixed.
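As a reference for what each vcvtph2ps lane in the sequences above computes, here is a software model of the binary16 to binary32 widening. It is a sketch only: half_bits_to_float and half_bits_selfcheck are hypothetical names, the input is taken as a raw bit pattern rather than a _Float16, and the quieting of signaling NaNs done by the hardware is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Software model of one VCVTPH2PS lane: widen an IEEE binary16 bit
   pattern to float.  Every binary16 value is exactly representable in
   binary32, so no rounding is involved.  */
float
half_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1fu;
  uint32_t man = h & 0x3ffu;
  uint32_t bits;
  float f;

  if (exp == 0x1f)
    /* Inf or NaN: all-ones exponent, payload moved to the top bits.  */
    bits = sign | 0x7f800000u | (man << 13);
  else if (exp != 0)
    /* Normal number: rebias the exponent from 15 to 127.  */
    bits = sign | ((exp + 112) << 23) | (man << 13);
  else if (man == 0)
    /* Signed zero.  */
    bits = sign;
  else
    {
      /* Subnormal half: shift until the implicit bit appears.  */
      int e = -1;

      do
        {
          man <<= 1;
          e++;
        }
      while (!(man & 0x400u));
      bits = sign | ((uint32_t) (112 - e) << 23) | ((man & 0x3ffu) << 13);
    }

  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Quick self-check on a few well-known encodings.  */
void
half_bits_selfcheck (void)
{
  assert (half_bits_to_float (0x3c00) == 1.0f);
  assert (half_bits_to_float (0xc000) == -2.0f);
  assert (half_bits_to_float (0x0001) == 0x1p-24f);
}
```

The testcase's a + b then amounts to adding two such widened values in SFmode and narrowing the sum back with vcvtps2ph, where the $4 immediate selects the rounding mode currently set in MXCSR.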