public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/94908] New: Failure to optimally optimize certain shuffle patterns
@ 2020-05-01 19:02 gabravier at gmail dot com
2020-05-01 21:24 ` [Bug tree-optimization/94908] " glisse at gcc dot gnu.org
` (11 more replies)
0 siblings, 12 replies; 13+ messages in thread
From: gabravier at gmail dot com @ 2020-05-01 19:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
Bug ID: 94908
Summary: Failure to optimally optimize certain shuffle patterns
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
typedef float v4sf __attribute__((vector_size(16)));
v4sf g();
v4sf f(v4sf a, v4sf b)
{
return (v4sf){g()[1], a[1], a[2], a[3]};
}
With -O3, LLVM outputs this:
f(float __vector(4), float __vector(4)): # @f(float __vector(4), float __vector(4))
sub rsp, 24
movaps xmmword ptr [rsp], xmm0 # 16-byte Spill
call g()
movaps xmm1, xmmword ptr [rsp] # 16-byte Reload
shufps xmm0, xmm1, 17 # xmm0 = xmm0[1,0],xmm1[1,0]
shufps xmm0, xmm1, 232 # xmm0 = xmm0[0,2],xmm1[2,3]
add rsp, 24
ret
GCC outputs this:
f(float __vector(4), float __vector(4)):
sub rsp, 24
movaps XMMWORD PTR [rsp], xmm0
call g()
movaps xmm1, XMMWORD PTR [rsp]
add rsp, 24
shufps xmm0, xmm0, 85
movaps xmm2, xmm1
shufps xmm2, xmm1, 85
movaps xmm3, xmm2
movaps xmm2, xmm1
unpckhps xmm2, xmm1
unpcklps xmm0, xmm3
shufps xmm1, xmm1, 255
unpcklps xmm2, xmm1
movlhps xmm0, xmm2
ret
This also seems to occur on powerpc64le, so I haven't marked it as
target-specific.
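The two shufps immediates in the LLVM listing can be checked by hand. The following is a minimal sketch that emulates SHUFPS on plain float arrays and replays the sequence; the helper names `shufps` and `llvm_sequence` are illustrative only and do not come from the report.

```c
#include <stdint.h>

/* Emulates SHUFPS dst, src, imm8 (Intel operand order) on 4-float arrays:
   the two low result lanes are picked from dst, the two high ones from src.
   Each 2-bit field of imm8 is a lane index. */
static void shufps(float dst[4], const float src[4], uint8_t imm)
{
    float r[4];
    r[0] = dst[imm & 3];
    r[1] = dst[(imm >> 2) & 3];
    r[2] = src[(imm >> 4) & 3];
    r[3] = src[(imm >> 6) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}

/* Replays the two shuffles from the LLVM output.
   On entry xmm0 holds the result of g() and xmm1 holds a. */
static void llvm_sequence(float xmm0[4], const float xmm1[4])
{
    shufps(xmm0, xmm1, 17);  /* xmm0 = xmm0[1,0],xmm1[1,0] */
    shufps(xmm0, xmm1, 232); /* xmm0 = xmm0[0,2],xmm1[2,3] */
}
```

With xmm0 = g() and xmm1 = a, the sequence should leave {g()[1], a[1], a[2], a[3]} in xmm0, which is the vector the test case constructs.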
* [Bug tree-optimization/94908] Failure to optimally optimize certain shuffle patterns
From: glisse at gcc dot gnu.org @ 2020-05-01 21:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
--- Comment #1 from Marc Glisse <glisse at gcc dot gnu.org> ---
Even if we write __builtin_shuffle, the vector lowering pass turns it into the
same code (constructor of BIT_FIELD_REFs), which seems to indicate that the
target does not handle this pattern.
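For reference, the __builtin_shuffle form of the test case might look like the sketch below. The body of `g` is a hypothetical stand-in (the report leaves it opaque), `f_shuffle` and `f_ctor` are illustrative names, and the Clang branch is only there so the example builds with both compilers.

```c
typedef float v4sf __attribute__((vector_size(16)));
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical stand-in for the opaque g() of the report. */
static v4sf g(void)
{
    return (v4sf){10.0f, 11.0f, 12.0f, 13.0f};
}

/* Two-input shuffle: indices 0..3 select from the first vector,
   4..7 from the second, so {1, 5, 6, 7} means {t[1], a[1], a[2], a[3]}. */
v4sf f_shuffle(v4sf a)
{
    v4sf t = g();
#if defined(__clang__)
    return __builtin_shufflevector(t, a, 1, 5, 6, 7);
#else
    return __builtin_shuffle(t, a, (v4si){1, 5, 6, 7});
#endif
}

/* The constructor form used in the original report. */
v4sf f_ctor(v4sf a)
{
    return (v4sf){g()[1], a[1], a[2], a[3]};
}
```

Both functions compute the same vector, so any difference in generated code comes from how the permutation is lowered and expanded, not from the source form.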
* [Bug tree-optimization/94908] Failure to optimally optimize certain shuffle patterns
From: rguenth at gcc dot gnu.org @ 2020-05-04 6:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2020-05-04
Ever confirmed|0 |1
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, ideally it would be extract g()[1], insert at a[0]. But yes, we're not
trying to split an unhandled shuffle into two; we leave that for targets
to sort out ... (x86 has code for many 3-insn shuffles, for example).
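Written out at the source level, the "extract g()[1], insert at a[0]" form would be roughly the following sketch. The body of `g` is a hypothetical stand-in and `f_insert` is an illustrative name.

```c
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical stand-in for the opaque g() of the report. */
static v4sf g(void)
{
    return (v4sf){10.0f, 11.0f, 12.0f, 13.0f};
}

/* Same result as (v4sf){g()[1], a[1], a[2], a[3]}, expressed as a
   single-lane extract followed by a single-lane insert. */
v4sf f_insert(v4sf a)
{
    float e = g()[1]; /* extract lane 1 of g()'s result */
    a[0] = e;         /* insert it into lane 0 of a */
    return a;
}
```

Lanes 1 to 3 of a are never touched here, which is why a single insert instruction is enough on targets that have one.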
* [Bug tree-optimization/94908] Failure to optimally optimize certain shuffle patterns
From: gabravier at gmail dot com @ 2023-02-17 20:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
--- Comment #3 from Gabriel Ravier <gabravier at gmail dot com> ---
Looks like this gives much better output now.
* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
From: pinskia at gcc dot gnu.org @ 2023-02-17 21:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346,
| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93720
Component|tree-optimization |target
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I think this was a target issue and maybe should be split into a couple
different bugs.
For GCC 8, aarch64 produces:
dup v0.4s, v0.s[1]
ldr q1, [sp, 16]
ldp x29, x30, [sp], 32
ins v0.s[1], v1.s[1]
ins v0.s[2], v1.s[2]
ins v0.s[3], v1.s[3]
GCC 9/10 produced this (which is OK, though it could be improved, and was in
GCC 11):
adrp x0, .LC0
ldr q1, [sp, 16]
ldr q2, [x0, #:lo12:.LC0]
ldp x29, x30, [sp], 32
tbl v0.16b, {v0.16b - v1.16b}, v2.16b
For GCC 11+, aarch64 produces:
ldr q1, [sp, 16]
ins v1.s[0], v0.s[1]
mov v0.16b, v1.16b
This means that for aarch64, this was changed in GCC 10 and fully fixed in GCC 11
(by r11-2192-gc9c87e6f9c795b, aka PR 93720, which was in fact my patch).
For x86_64, the trunk produces:
movaps (%rsp), %xmm1
addq $24, %rsp
shufps $85, %xmm1, %xmm0
shufps $232, %xmm1, %xmm0
While GCC 12 produces:
movaps (%rsp), %xmm1
addq $24, %rsp
shufps $85, %xmm0, %xmm0
movaps %xmm1, %xmm2
shufps $85, %xmm1, %xmm2
movaps %xmm2, %xmm3
movaps %xmm1, %xmm2
unpckhps %xmm1, %xmm2
unpcklps %xmm3, %xmm0
shufps $255, %xmm1, %xmm1
unpcklps %xmm1, %xmm2
movlhps %xmm2, %xmm0
This was changed with r13-2843-g3db8e9c2422d92 (aka PR 53346).
For powerpc64le, it looks ok for GCC 11:
addis 9,2,.LC0@toc@ha
addi 1,1,48
addi 9,9,.LC0@toc@l
li 0,-16
lvx 0,0,9
vperm 2,31,2,0
Both the x86_64 and the PowerPC PERM implementations could be improved to
support the insertion, like the aarch64 backend does.
* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
From: ubizjak at gmail dot com @ 2023-02-18 9:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
Uroš Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |crazylht at gmail dot com
--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Andrew Pinski from comment #4)
> Both the x86_64 and the PowerPC PERM implementations could be improved to
> support the insertion, like the aarch64 backend does.
Cc Hongtao for the x86 part.
* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
From: crazylht at gmail dot com @ 2023-02-20 3:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
Yes, insertps can select any element from src and insert it into any position
of the dest. With SSE4.1, x86 can generate:
vinsertps xmm0, xmm1, xmm0, 64 # xmm0 = xmm0[1],xmm1[1,2,3]
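To see why the immediate is 64, it helps to split it into the three INSERTPS fields: bits 7:6 select the source element (register-source form), bits 5:4 select the destination lane, and bits 3:0 are a zero mask. The sketch below decodes and emulates that; the struct and function names are illustrative only.

```c
#include <stdint.h>

/* The three fields of an INSERTPS immediate (register-source form). */
struct insertps_imm {
    unsigned count_s; /* bits 7:6, source element to read */
    unsigned count_d; /* bits 5:4, destination lane to write */
    unsigned zmask;   /* bits 3:0, lanes to force to zero */
};

static struct insertps_imm decode_insertps(uint8_t imm)
{
    struct insertps_imm f;
    f.count_s = (imm >> 6) & 3;
    f.count_d = (imm >> 4) & 3;
    f.zmask = imm & 0xf;
    return f;
}

/* Emulates INSERTPS on 4-float arrays: copy one element of src into one
   lane of dst, then zero every lane whose zmask bit is set. */
static void insertps(float dst[4], const float src[4], uint8_t imm)
{
    struct insertps_imm f = decode_insertps(imm);
    dst[f.count_d] = src[f.count_s];
    for (unsigned i = 0; i < 4; i++)
        if (f.zmask & (1u << i))
            dst[i] = 0.0f;
}
```

For imm = 64 (0x40) this gives count_s = 1, count_d = 0 and an empty zero mask, i.e. element 1 of the source goes into lane 0 of the destination and nothing is cleared.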
* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
From: ubizjak at gmail dot com @ 2023-03-08 13:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 54607
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54607&action=edit
Proposed patch
Patch in testing.
Attached patch produces (-O2 -msse4.1):
f:
subq $24, %rsp
xorl %eax, %eax
vmovaps %xmm0, (%rsp)
call g
vmovaps (%rsp), %xmm1
addq $24, %rsp
vinsertps $64, %xmm0, %xmm1, %xmm0
ret
* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
From: crazylht at gmail dot com @ 2023-03-09 4:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
> Created attachment 54607 [details]
> Proposed patch
>
> Patch in testing.
>
> Attached patch produces (-O2 -msse4.1):
>
> f:
> subq $24, %rsp
> xorl %eax, %eax
> vmovaps %xmm0, (%rsp)
> call g
> vmovaps (%rsp), %xmm1
> addq $24, %rsp
> vinsertps $64, %xmm0, %xmm1, %xmm0
> ret
I'm thinking of something like the pattern below, so that it can be matched both
by expand_vselect_vconcat in ix86_expand_vec_perm_const_1 and by patterns created
by pass_combine (theoretically).
+(define_insn_and_split "*sse4_1_insertps_1"
+ [(set (match_operand:VI4F_128 0 "register_operand")
+ (vec_select:VI4F_128
+ (vec_concat:<ssedoublevecmode>
+ (match_operand:VI4F_128 1 "register_operand")
+ (match_operand:VI4F_128 2 "register_operand"))
+ (match_parallel 3 "insertps_parallel"
+ [(match_operand 4 "const_int_operand")])))]
+ "TARGET_SSE4_1 && ix86_pre_reload_split ()"
+ "#"
+ "&& 1"
* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
From: ubizjak at gmail dot com @ 2023-03-09 14:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #8)
> I'm thinking of something like the pattern below, so that it can be matched both
> by expand_vselect_vconcat in ix86_expand_vec_perm_const_1 and by patterns created
> by pass_combine (theoretically).
>
> +(define_insn_and_split "*sse4_1_insertps_1"
> + [(set (match_operand:VI4F_128 0 "register_operand")
> + (vec_select:VI4F_128
> + (vec_concat:<ssedoublevecmode>
> + (match_operand:VI4F_128 1 "register_operand")
> + (match_operand:VI4F_128 2 "register_operand"))
> + (match_parallel 3 "insertps_parallel"
> + [(match_operand 4 "const_int_operand")])))]
> + "TARGET_SSE4_1 && ix86_pre_reload_split ()"
> + "#"
> + "&& 1"
If you want to go that way, then the resulting pattern should look like a
combination of:
(define_insn "*vec_setv4sf_sse4_1"
[(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v")
(vec_merge:V4SF
(vec_duplicate:V4SF
(match_operand:SF 2 "nonimmediate_operand" "Yrm,*xm,vm"))
(match_operand:V4SF 1 "register_operand" "0,0,v")
(match_operand:SI 3 "const_0_to_3_operand")))]
"TARGET_SSE4_1
&& ((unsigned) exact_log2 (INTVAL (operands[3]))
< GET_MODE_NUNITS (V4SFmode))"
(define_insn_and_split "*sse4_1_extractps"
[(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,Yv,Yv")
(vec_select:SF
(match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
(parallel [(match_operand:SI 2 "const_0_to_3_operand")])))]
"TARGET_SSE4_1"
where the latter pattern propagates into the former in place of operand 2. This
combination is created only for a scalar insert of an extracted value, so I doubt
it is ever created...
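As a rough model of what the first pattern describes: in RTL, vec_merge takes an element from its first operand where the corresponding mask bit is set and from its second operand otherwise, and the insn condition only accepts masks with exactly one bit set among the low four. The sketch below mirrors that in plain C; `exact_log2_sketch` imitates GCC's exact_log2 (which returns -1 for a value that is not a power of two), and all names here are illustrative, not GCC code.

```c
/* Imitation of exact_log2: log2 of x if x is a power of two, else -1. */
static int exact_log2_sketch(unsigned x)
{
    if (x == 0 || (x & (x - 1)) != 0)
        return -1;
    int n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* Mirrors the insn condition: the cast turns -1 into a huge unsigned
   value, so only the one-hot masks 1, 2, 4 and 8 pass for four lanes. */
static int vec_set_mask_ok(unsigned mask)
{
    return (unsigned) exact_log2_sketch(mask) < 4u;
}

/* Models (vec_merge (vec_duplicate scalar) vec mask): lane i comes from
   the broadcast scalar if bit i of mask is set, else from vec. */
static void vec_merge_dup(float out[4], float scalar, const float vec[4],
                          unsigned mask)
{
    for (unsigned i = 0; i < 4; i++)
        out[i] = ((mask >> i) & 1u) ? scalar : vec[i];
}
```

With mask = 1 and scalar = g()[1], this yields {g()[1], a[1], a[2], a[3]}, the vector from the original report.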
* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
From: ubizjak at gmail dot com @ 2023-03-09 14:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
Uroš Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #54607|0 |1
is obsolete| |
--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 54624
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54624&action=edit
Proposed patch v2
New version with some code shamelessly stolen from aarch64.
* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
From: cvs-commit at gcc dot gnu.org @ 2023-04-18 16:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:
https://gcc.gnu.org/g:95b99e47f4f2df2d0c5680f45e3ec0a3170218ad
commit r14-47-g95b99e47f4f2df2d0c5680f45e3ec0a3170218ad
Author: Uros Bizjak <ubizjak@gmail.com>
Date: Tue Apr 18 17:50:37 2023 +0200
i386: Improve permutations with INSERTPS instruction [PR94908]
INSERTPS can select any element from src and insert it into any position
of the dest. For SSE4.1 targets, the compiler can generate e.g.
insertps $64, %xmm0, %xmm1
to insert element 1 from %xmm0 into element 0 of %xmm1.
gcc/ChangeLog:
PR target/94908
* config/i386/i386-builtin.def (__builtin_ia32_insertps128):
Use CODE_FOR_sse4_1_insertps_v4sf.
* config/i386/i386-expand.cc (expand_vec_perm_insertps): New.
(expand_vec_perm_1): Call expand_vec_perm_insertps.
* config/i386/i386.md ("unspec"): Declare UNSPEC_INSERTPS here.
* config/i386/mmx.md (mmxscalarmode): New mode attribute.
(@sse4_1_insertps_<mode>): New insn pattern.
* config/i386/sse.md (@sse4_1_insertps_<mode>): Macroize insn
pattern from sse4_1_insertps using VI4F_128 mode iterator.
gcc/testsuite/ChangeLog:
PR target/94908
* gcc.target/i386/pr94908.c: New test.
* gcc.target/i386/sse4_1-insertps-5.c: New test.
* gcc.target/i386/vperm-v4sf-2-sse4.c: New test.
* [Bug target/94908] Failure to optimally optimize certain shuffle patterns
From: ubizjak at gmail dot com @ 2023-04-18 17:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
--- Comment #12 from Uroš Bizjak <ubizjak at gmail dot com> ---
Implemented also for x86.