public inbox for gcc-bugs@sourceware.org
* [Bug target/94866] New: Failure to optimize pinsrq of 0 with index 1 into movq
@ 2020-04-30 0:21 gabravier at gmail dot com
2020-04-30 7:13 ` [Bug target/94866] " rguenth at gcc dot gnu.org
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: gabravier at gmail dot com @ 2020-04-30 0:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Bug ID: 94866
Summary: Failure to optimize pinsrq of 0 with index 1 into movq
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
#include <stdint.h>
typedef int64_t v2di __attribute__((vector_size(16)));
typedef int32_t v2si __attribute__((vector_size(8)));
// Keep the low 64-bit lane of a and zero the high lane.
v2di _mm_move_epi64(v2di a)
{
    return v2di{a[0], 0LL};
}
LLVM with `-O3 -msse4.1` compiles this to:
_mm_move_epi64(long __vector(2)): # @_mm_move_epi64(long __vector(2))
        movq xmm0, xmm0 # xmm0 = xmm0[0],zero
        ret
GCC gives:
_mm_move_epi64(long __vector(2)):
        xor eax, eax
        pinsrq xmm0, rax, 1
        ret
GCC's output needs two instructions, including an insert from a general-purpose
register, where LLVM needs a single movq. Unless I've missed some quirk of x86
chips, LLVM's version should be faster.
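For anyone who wants to reproduce this, here is a minimal sketch of the testcase with a lane check added. It assumes a compiler with GNU vector extensions (GCC or Clang). The names keep_low_zero_high, behaves_like_movq and run_checks are made up for illustration; the function is renamed so it cannot clash with the real _mm_move_epi64 intrinsic from <emmintrin.h>. It only checks the resulting lane values, not which instruction the compiler selects.

```cpp
#include <cassert>
#include <stdint.h>

typedef int64_t v2di __attribute__((vector_size(16)));

// Same body as the reported function, under a hypothetical name.
v2di keep_low_zero_high(v2di a)
{
    return v2di{a[0], 0LL};
}

// True when the result has the semantics movq xmm, xmm provides:
// lane 0 copied from the input, lane 1 cleared.
bool behaves_like_movq(int64_t lo, int64_t hi)
{
    v2di r = keep_low_zero_high(v2di{lo, hi});
    return r[0] == lo && r[1] == 0;
}

// Small runtime self-check (hypothetical helper).
void run_checks()
{
    assert(behaves_like_movq(42, -1));
    assert(behaves_like_movq(-7, 123456789));
}
```

Compiling this with something like `g++ -O3 -msse4.1 -S` and reading the code generated for keep_low_zero_high is one way to see whether movq or the xor/pinsrq pair was chosen.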
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: rguenth at gcc dot gnu.org @ 2020-04-30 7:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-04-30
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
         Depends on|                            |94864
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
We're expanding from
_3 = BIT_INSERT_EXPR <a_1(D), 0, 64 (64 bits)>;
return _3;
which ends up using
(insn 8 7 9 (set (reg:V2DI 85)
        (vec_merge:V2DI (vec_duplicate:V2DI (reg:DI 86))
            (reg:V2DI 85)
            (const_int 2 [0x2]))) "y.c":6:28 -1
so likely the vec_merge "issue" again.
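To make the mask in that insn easier to read, here is a small model of vec_merge in plain C++. It is only a sketch of the RTL semantics (a set bit i in the mask takes lane i from the first operand, a clear bit takes it from the second), not GCC code. The struct V2DI and the helper insn8_with_zero are hypothetical, and it assumes reg 86 holds the constant 0 that the BIT_INSERT_EXPR inserts.

```cpp
#include <cstdint>

// Plain-struct stand-in for a V2DI register: two 64-bit lanes.
struct V2DI { std::int64_t lane[2]; };

// Model of (vec_merge:V2DI vec1 vec2 mask): a set bit i in the mask takes
// lane i from vec1, a clear bit takes it from vec2.
constexpr V2DI vec_merge(const V2DI& vec1, const V2DI& vec2, unsigned mask)
{
    return V2DI{{ (mask & 1u) ? vec1.lane[0] : vec2.lane[0],
                  (mask & 2u) ? vec1.lane[1] : vec2.lane[1] }};
}

// Model of (vec_duplicate:V2DI x): broadcast one scalar to both lanes.
constexpr V2DI vec_duplicate(std::int64_t x)
{
    return V2DI{{ x, x }};
}

// insn 8 under the assumption that reg 86 is zero: mask 0x2 takes lane 1
// from the duplicated zero and lane 0 from the incoming vector.
constexpr V2DI insn8_with_zero(const V2DI& a)
{
    return vec_merge(vec_duplicate(0), a, 0x2);
}

static_assert(insn8_with_zero(V2DI{{7, 9}}).lane[0] == 7, "lane 0 is kept");
static_assert(insn8_with_zero(V2DI{{7, 9}}).lane[1] == 0, "lane 1 is zeroed");
```

So the insn describes "keep the low quadword, clear the high one", which is the effect movq has on an xmm register.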
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864
[Bug 94864] Failure to combine vunpckhpd+movsd into single vunpckhpd
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: rguenth at gcc dot gnu.org @ 2023-08-22 9:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Bug 94866 depends on bug 94864, which changed state.
Bug 94864 Summary: Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: ubizjak at gmail dot com @ 2023-08-22 12:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Uroš Bizjak <ubizjak at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|ubizjak at gmail dot com
--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 55776
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55776&action=edit
Proposed patch
Patch that introduces alternative MOVQ RTX definition.
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: crazylht at gmail dot com @ 2023-08-23 1:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Hongtao.liu <crazylht at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
In the x86 backend's expand_vec_perm_1, we always try vec_merge first for
!one_operand_p; expand_vselect_vconcat is only tried when vec_merge fails,
which means we'd better use vec_merge instead of vec_select:vec_concat when
available in our backend pattern match.
Also, from the point of view of AVX512 kmask instructions, using vec_merge
will help constant propagation.
  /* Try the SSE4.1 blend variable merge instructions. */
  if (expand_vec_perm_blend (d))
    return true;

  /* Try movss/movsd instructions. */
  if (expand_vec_perm_movs (d))
    return true;
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: ubizjak at gmail dot com @ 2023-08-23 11:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> in x86 backend expand_vec_perm_1, we always try vec_merge first for
> !one_operand_p, expand_vselect_vconcat is only tried when vec_merge failed
> which means we'd better use vec_merge instead of vec_select:vec_concat
> when available in our backend pattern match.
In fact, I tried to convert existing sse2_movq128 patterns to vec_merge, but
the patch regressed:
-FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler movq
-FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler-not pxor
-FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-not pxor
-FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-times (?n)(?:mov|psrldq).*%xmm[0-9] 12
So, the compiler still expects vec_concat/vec_select patterns to be present.
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: ubizjak at gmail dot com @ 2023-08-23 11:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
--- Comment #5 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 55778
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55778&action=edit
Failing patch, for reference
Patch that converts vec_concat/vec_select sse2_movq128 patterns to vec_merge.
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: crazylht at gmail dot com @ 2023-08-23 13:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #4)
> (In reply to Hongtao.liu from comment #3)
> > in x86 backend expand_vec_perm_1, we always try vec_merge first for
> > !one_operand_p, expand_vselect_vconcat is only tried when vec_merge failed
> > which means we'd better use vec_merge instead of vec_select:vec_concat
> > when available in our backend pattern match.
>
> In fact, I tried to convert existing sse2_movq128 patterns to vec_merge, but
> the patch regressed:
>
> -FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler movq
> -FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler-not pxor
> -FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-not pxor
> -FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-times (?n)(?:mov|psrldq).*%xmm[0-9] 12
>
> So, the compiler still expects vec_concat/vec_select patterns to be present.
v2df foo_v2df (v2df x)
{
  return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
}
The testcase is not a typical vec_merge case: for vec_merge, the shuffle index
should be {0, 3}. Here it happens to be a vec_merge because the second vector
is all zero. And yes, for this case we still need the vec_concat:vec_select
pattern.
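The difference between the two selectors can be sketched with a plain C++ model of a two-operand shuffle, where indices 0..1 pick from the first operand and 2..3 from the second. This is an illustration of the indexing only, not the builtin itself; V2DF, pick, shuffle2 and the sample constants are hypothetical names, and indices are assumed to lie in 0..3.

```cpp
// Stand-in for a v2df value: two double lanes.
struct V2DF { double lane[2]; };

// Element idx of the four-element concatenation {x[0], x[1], y[0], y[1]},
// which is how a two-operand shuffle numbers its inputs.
constexpr double pick(const V2DF& x, const V2DF& y, unsigned idx)
{
    return (idx & 2u) ? y.lane[idx & 1u] : x.lane[idx & 1u];
}

// Result lane 0 is selected by i0, result lane 1 by i1.
constexpr V2DF shuffle2(const V2DF& x, const V2DF& y, unsigned i0, unsigned i1)
{
    return V2DF{{ pick(x, y, i0), pick(x, y, i1) }};
}

constexpr V2DF kX{{1.5, 2.5}};
constexpr V2DF kY{{8.0, 9.0}};
constexpr V2DF kZero{{0.0, 0.0}};

// {0, 3} keeps every element in its own lane, which a vec_merge can express.
static_assert(shuffle2(kX, kY, 0, 3).lane[1] == 9.0, "lane 1 comes from y[1]");
// {0, 2} moves y[0] into lane 1, so in general it is not a lane-wise merge.
static_assert(shuffle2(kX, kY, 0, 2).lane[1] == 8.0, "lane 1 comes from y[0]");
// With an all-zero second operand both selectors give the same result.
static_assert(shuffle2(kX, kZero, 0, 2).lane[1] == shuffle2(kX, kZero, 0, 3).lane[1],
              "selectors agree when y is zero");
```

With a nonzero second operand the two selectors differ, which is why {0, 2} only coincides with a merge in the all-zero case.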
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: ubizjak at gmail dot com @ 2023-08-23 14:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #6)
> > So, the compiler still expects vec_concat/vec_select patterns to be present.
>
> v2df foo_v2df (v2df x)
> {
> return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
> }
>
> The testcase is not a typical vec_merge case: for vec_merge, the shuffle
> index should be {0, 3}. Here it happens to be a vec_merge because the
> second vector is all zero. And yes, for this case we still need the
> vec_concat:vec_select pattern.
I guess the original patch is the way to go then.
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: crazylht at gmail dot com @ 2023-08-24 1:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
> (In reply to Hongtao.liu from comment #6)
> > > So, the compiler still expects vec_concat/vec_select patterns to be present.
> >
> > v2df foo_v2df (v2df x)
> > {
> > return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 });
> > }
> >
> > The testcase is not a typical vec_merge case: for vec_merge, the shuffle
> > index should be {0, 3}. Here it happens to be a vec_merge because the
> > second vector is all zero. And yes, for this case we still need the
> > vec_concat:vec_select pattern.
>
> I guess the original patch is the way to go then.
Yes.
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: cvs-commit at gcc dot gnu.org @ 2023-08-24 20:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:
https://gcc.gnu.org/g:6dd73f0f00f454a05552b008a1d56560bd3f1d4a
commit r14-3471-g6dd73f0f00f454a05552b008a1d56560bd3f1d4a
Author: Uros Bizjak <ubizjak@gmail.com>
Date: Thu Aug 24 22:23:52 2023 +0200
i386: Optimize pinsrq of 0 with index 1 into movq [PR94866]
Add new pattern involving vec_merge RTX that is produced by combine from the
combination of sse4_1_pinsrq and *movdi_internal:
7: r86:DI=0
8: r85:V2DI=vec_merge(vec_duplicate(r86:DI),r87:V2DI,0x2)
REG_DEAD r87:V2DI
REG_DEAD r86:DI
Successfully matched this instruction:
(set (reg:V2DI 85 [ a ])
(vec_merge:V2DI (reg:V2DI 87)
(const_vector:V2DI [
(const_int 0 [0]) repeated x2
])
(const_int 1 [0x1])))
PR target/94866
gcc/ChangeLog:
* config/i386/sse.md (*sse2_movq128_<mode>_1): New insn pattern.
gcc/testsuite/ChangeLog:
* g++.target/i386/pr94866.C: New test.
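As a sanity check on the combine output quoted in this commit message, the sketch below models both forms in plain C++ and confirms they describe the same value: merging a duplicated zero into lane 1 with mask 0x2, and merging the input over an all-zero vector with mask 0x1. This is an illustration of the RTL semantics only; V2DI, same_lanes, before_combine and after_combine are hypothetical names, not identifiers from GCC.

```cpp
#include <cstdint>

// Plain-struct stand-in for a V2DI register: two 64-bit lanes.
struct V2DI { std::int64_t lane[2]; };

// Model of (vec_merge:V2DI vec1 vec2 mask): a set bit i in the mask takes
// lane i from vec1, a clear bit takes it from vec2.
constexpr V2DI vec_merge(const V2DI& vec1, const V2DI& vec2, unsigned mask)
{
    return V2DI{{ (mask & 1u) ? vec1.lane[0] : vec2.lane[0],
                  (mask & 2u) ? vec1.lane[1] : vec2.lane[1] }};
}

constexpr bool same_lanes(const V2DI& p, const V2DI& q)
{
    return p.lane[0] == q.lane[0] && p.lane[1] == q.lane[1];
}

// insns 7 and 8: r86 = 0, then vec_merge(vec_duplicate(r86), r87, 0x2).
constexpr V2DI before_combine(const V2DI& a)
{
    return vec_merge(V2DI{{0, 0}}, a, 0x2);
}

// The matched instruction: vec_merge(r87, const_vector 0, 0x1).
constexpr V2DI after_combine(const V2DI& a)
{
    return vec_merge(a, V2DI{{0, 0}}, 0x1);
}

static_assert(same_lanes(before_combine(V2DI{{-5, 11}}),
                         after_combine(V2DI{{-5, 11}})),
              "both forms keep lane 0 and zero lane 1");
static_assert(same_lanes(after_combine(V2DI{{-5, 11}}), V2DI{{-5, 0}}),
              "result is {a[0], 0}");
```

Swapping the two vec_merge operands while complementing the mask leaves the selected lanes unchanged, which is why the two forms agree.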
* [Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
From: ubizjak at gmail dot com @ 2023-08-24 20:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866
Uroš Bizjak <ubizjak at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
Implemented for gcc-14.