public inbox for gcc-bugs@sourceware.org
* [Bug target/107548] New: STV doesn't consider vec_select
@ 2022-11-07 10:25 rguenth at gcc dot gnu.org
2022-12-23 9:58 ` [Bug target/107548] " cvs-commit at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-07 10:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548
Bug ID: 107548
Summary: STV doesn't consider vec_select
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
typedef unsigned int v4si __attribute__((vector_size(16)));
unsigned f (v4si a, v4si b)
{
    a[0] += b[0];
    return a[0] + a[1];
}
gets optimized to
f:
.LFB0:
        .cfi_startproc
        vpextrd $1, %xmm0, %edx
        vmovd   %xmm0, %eax
        addl    %edx, %eax
        vmovd   %xmm1, %edx
        addl    %edx, %eax
        ret
with the znver2 arch, and similarly with others, while it seems to be
beneficial to shuffle a[1] into a'[0] and perform the adds on the vector
side, eliding two xmm->gpr moves. STV2 sees
   19: r94:V4SI=xmm0:V4SI
      REG_DEAD xmm0:V4SI
    2: r87:V4SI=r94:V4SI
      REG_DEAD r94:V4SI
   20: r95:V4SI=xmm1:V4SI
      REG_DEAD xmm1:V4SI
    3: NOTE_INSN_DELETED
    4: NOTE_INSN_FUNCTION_BEG
    7: r90:SI=vec_select(r87:V4SI,parallel)
    8: r91:SI=vec_select(r87:V4SI,parallel)
      REG_DEAD r87:V4SI
    9: {r92:SI=r90:SI+r91:SI;clobber flags:CC;}
      REG_DEAD r91:SI
      REG_DEAD r90:SI
      REG_UNUSED flags:CC
   10: r93:SI=vec_select(r95:V4SI,parallel)
      REG_DEAD r95:V4SI
   11: {r89:SI=r92:SI+r93:SI;clobber flags:CC;}
      REG_DEAD r93:SI
      REG_DEAD r92:SI
      REG_UNUSED flags:CC
   16: ax:SI=r89:SI
      REG_DEAD r89:SI
   17: use ax:SI
but it lacks vec_select support:
Created a new instruction chain #1
Building chain #1...
Adding insn 9 to chain #1
Adding insn 11 into chain's #1 queue
r90 def in insn 7 isn't convertible
Mark r90 def in insn 7 as requiring both modes in chain #1
r91 def in insn 8 isn't convertible
Mark r91 def in insn 8 as requiring both modes in chain #1
Adding insn 11 to chain #1
r89 use in insn 16 isn't convertible
Mark r89 def in insn 11 as requiring both modes in chain #1
r93 def in insn 10 isn't convertible
Mark r93 def in insn 10 as requiring both modes in chain #1
Collected chain #1...
insns: 9, 11
defs to convert: r89, r90, r91, r93
Computing gain for chain #1...
Instruction conversion gain: 0
Registers conversion cost: 24
Total gain: -24
Chain #1 conversion is not profitable
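
The arithmetic behind the last four lines of the dump can be sketched in C. This is only an illustration of how the printed numbers relate, not the code in i386-features.cc: the struct and function names are hypothetical, and the "greater than zero" threshold is an assumption based on the "not profitable" verdict for a gain of -24.

```c
/* Hypothetical summary of the two figures printed for a chain. */
struct chain_cost {
    int insn_gain;      /* "Instruction conversion gain" in the dump */
    int reg_conv_cost;  /* "Registers conversion cost" in the dump   */
};

/* "Total gain" as printed: the instruction gain minus the cost of
   moving values between scalar and vector registers. */
int total_gain(struct chain_cost c)
{
    return c.insn_gain - c.reg_conv_cost;
}

/* Assumption: a chain is converted only when the total gain is positive. */
int is_profitable(struct chain_cost c)
{
    return total_gain(c) > 0;
}
```

With the values from chain #1 (gain 0, cost 24) the total is -24, so the chain is rejected. Every def fed by a vec_select is marked as requiring both modes, which is what drives the register conversion cost here.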
* [Bug target/107548] STV doesn't consider vec_select
From: cvs-commit at gcc dot gnu.org @ 2022-12-23 9:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548
--- Comment #1 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:0b2c1369d035e92847cca81fd9f7b4e9ab9da710
commit r13-4873-g0b2c1369d035e92847cca81fd9f7b4e9ab9da710
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Fri Dec 23 09:56:30 2022 +0000
PR target/107548: Handle vec_select in STV on x86.
This patch enhances x86's STV pass to handle VEC_SELECT during general
scalar chain conversion, performing SImode scalar extraction from V4SI
and DImode scalar extraction from V2DI in vector registers.
The motivating test case from bugzilla is:
    typedef unsigned int v4si __attribute__((vector_size(16)));
    unsigned int f (v4si a, v4si b)
    {
      a[0] += b[0];
      return a[0] + a[1];
    }
currently with -O2 -march=znver2 this generates:
            vpextrd $1, %xmm0, %edx
            vmovd   %xmm0, %eax
            addl    %edx, %eax
            vmovd   %xmm1, %edx
            addl    %edx, %eax
            ret
which performs three transfers from the vector unit to the scalar unit,
and performs the two additions there. With this patch, we now generate:
            vmovdqa %xmm0, %xmm2
            vpshufd $85, %xmm0, %xmm0
            vpaddd  %xmm0, %xmm2, %xmm0
            vpaddd  %xmm1, %xmm0, %xmm0
            vmovd   %xmm0, %eax
            ret
which performs the two additions in the vector unit, and then transfers
the result to the scalar unit. Technically, the (cheap) movdqa would not
be needed with better register allocation (or it could be cleaned up
during peephole2), but even so this transform is still a win.
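
As a rough check that the two sequences compute the same value, the vector-side version can be modelled in C with the same vector extension the test case uses. This is a hand-written sketch, not compiler output: f_scalar, f_vector and pshufd_model are names made up for illustration. The pshufd model follows the instruction's immediate encoding, where destination lane i takes the source lane given by bits 2i+1:2i of the immediate, so 85 (binary 01010101) copies lane 1 into every lane.

```c
typedef unsigned int v4si __attribute__((vector_size(16)));

/* The function from the report, using scalar element accesses. */
unsigned int f_scalar(v4si a, v4si b)
{
    a[0] += b[0];
    return a[0] + a[1];
}

/* Model of pshufd: destination lane i takes the source lane selected
   by the two immediate bits at position 2*i. */
v4si pshufd_model(v4si src, unsigned int imm)
{
    v4si dst;
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm >> (2 * i)) & 3];
    return dst;
}

/* Model of the new sequence: shuffle, two vector adds, one extract. */
unsigned int f_vector(v4si a, v4si b)
{
    v4si lane1 = pshufd_model(a, 85);  /* vpshufd $85: a[1] in all lanes */
    v4si sum = a + lane1;              /* vpaddd: lane 0 is a[0] + a[1]  */
    sum = sum + b;                     /* vpaddd: lane 0 adds b[0]       */
    return sum[0];                     /* vmovd: the single xmm->gpr move */
}
```

Only lane 0 of the result is used, so the values left in lanes 1 to 3 do not matter. Unsigned addition wraps in both versions, so they agree on overflow as well.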
2022-12-23 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/107548
* config/i386/i386-features.cc (scalar_chain::add_insn): The
operands of a VEC_SELECT don't need to be added to the scalar chain.
(general_scalar_chain::compute_convert_gain) <case VEC_SELECT>:
Provide gains for performing STV on a VEC_SELECT.
(general_scalar_chain::convert_insn): Convert VEC_SELECT to pshufd,
psrldq or no-op.
(general_scalar_to_vector_candidate_p): Handle VEC_SELECT of a
single element from a vector register to a scalar register.
gcc/testsuite/ChangeLog
PR target/107548
* gcc.target/i386/pr107548-1.c: New test V4SI case.
* gcc.target/i386/pr107548-2.c: New test V2DI case.
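
For the V2DI case, the same idea can be sketched with 64-bit lanes. The function below is a hypothetical analogue of the motivating example and is not claimed to be the contents of pr107548-2.c. It also assumes that extracting element 1 of a V2DI is the case handled by psrldq (a byte shift right by 8, which moves the upper lane into lane 0 and fills with zeros); the ChangeLog lists pshufd, psrldq and no-op without saying which applies where.

```c
typedef unsigned long long v2di __attribute__((vector_size(16)));

/* Hypothetical V2DI analogue of the motivating function. */
unsigned long long g_scalar(v2di a, v2di b)
{
    a[0] += b[0];
    return a[0] + a[1];
}

/* Model of psrldq $8: shift the whole 128-bit register right by
   8 bytes, so lane 1 lands in lane 0 and zeros are shifted in. */
v2di psrldq8_model(v2di src)
{
    v2di dst = { src[1], 0 };
    return dst;
}

/* Vector-side version: one shift, two 64-bit vector adds, one extract. */
unsigned long long g_vector(v2di a, v2di b)
{
    v2di hi = psrldq8_model(a);  /* a[1] moved down to lane 0     */
    v2di sum = a + hi;           /* paddq: lane 0 is a[0] + a[1]  */
    sum = sum + b;               /* paddq: lane 0 adds b[0]       */
    return sum[0];               /* single transfer to a GPR      */
}
```

As in the V4SI case, only lane 0 of the final sum is meaningful.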
* [Bug target/107548] STV doesn't consider vec_select
From: cvs-commit at gcc dot gnu.org @ 2022-12-24 22:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548
--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:3cf6d0e1830231dd47740e66926499db600b9ae4
commit r13-4886-g3cf6d0e1830231dd47740e66926499db600b9ae4
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Sat Dec 24 22:07:11 2022 +0000
[Committed] Tweak new gcc.target/i386/pr107548-1.c for -march=cascadelake.
My recently added testcases gcc.target/i386/pr107548-[12].c need to be
tweaked slightly for -march=cascadelake. Committed as obvious.
2022-12-24 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
PR target/107548
* gcc.target/i386/pr107548-1.c: Match both vmovd and movd.
* gcc.target/i386/pr107548-2.c: Match both vpaddq and paddq.
* [Bug target/107548] STV doesn't consider vec_select
From: roger at nextmovesoftware dot com @ 2022-12-26 13:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548
Roger Sayle <roger at nextmovesoftware dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
                 CC|                            |roger at nextmovesoftware dot com
   Target Milestone|---                         |13.0
--- Comment #3 from Roger Sayle <roger at nextmovesoftware dot com> ---
This should now be fixed/implemented on mainline.