public inbox for gcc-bugs@sourceware.org
* [Bug target/110788] New: Spilling to mask register for GPR vec_duplicate
From: rguenth at gcc dot gnu.org @ 2023-07-24 8:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788
Bug ID: 110788
Summary: Spilling to mask register for GPR vec_duplicate
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
double a[1024], b[1024];
void foo (int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] * 3.;
}
compiled with -O3 -march=cascadelake --param vect-partial-vector-usage=2
produces the inner loop
.L3:
        vmovapd b(%rax), %ymm0{%k1}
        movl %edx, %ecx
        subl $4, %edx
        kmovw %edx, %k0
        vmulpd %ymm3, %ymm0, %ymm1{%k1}{z}
        vmovapd %ymm1, a(%rax){%k1}
        vpbroadcastmw2d %k0, %xmm1
        addq $32, %rax
        vpcmpud $6, %xmm2, %xmm1, %k1
        cmpw $4, %cx
        ja .L3
where we implement the splat of %edx as
kmovw %edx, %k0
vpbroadcastmw2d %k0, %xmm1
instead of
vpbroadcastw %edx, %xmm1
we expand to
(insn 14 13 15 (set (reg:V4SI 96)
(vec_duplicate:V4SI (reg:SI 93 [ _27 ]))) 8167
{*avx512vl_vec_dup_gprv4si}
(nil))
but at IRA time we instead match that to
(insn 14 13 15 3 (set (reg:V4SI 96)
(vec_duplicate:V4SI (zero_extend:SI (subreg:HI (reg/v:SI 95 [ n ])
0)))) 8247 {avx512cd_maskw_vec_dupv4si}
(expr_list:REG_DEAD (reg/v:SI 95 [ n ])
(nil)))
where combine created this via
Trying 13 -> 14:
13: r93:SI=zero_extend(r95:SI#0)
REG_DEAD r95:SI
14: r96:V4SI=vec_duplicate(r93:SI)
REG_DEAD r93:SI
Successfully matched this instruction:
(set (reg:V4SI 96)
(vec_duplicate:V4SI (zero_extend:SI (subreg:HI (reg/v:SI 95 [ n ]) 0))))
allowing combination of insns 13 and 14
original costs 4 + 4 = 8
replacement cost 4
but it didn't anticipate that reg 95 could be allocated to a GPR? The
vectorizer uses an unsigned short IV for the loop, that's possibly
sub-optimal in this case but important in others.
I suppose it could also be a missed optimization in REE since I think
the HImode regs should already be zero-extended?
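
For readers following the assembly, the masked inner loop in the report can be modelled in scalar C. This is only a sketch under assumptions not stated in the report: that %xmm2 holds the lane indices {0, 1, 2, 3}, and that the value being splat is the number of elements still to process. The name masked_scale is hypothetical, and the real loop computes the mask for the next iteration from a 16-bit IV rather than recomputing it from n.

```c
#include <stdint.h>

enum { LANES = 4 };  /* four doubles per %ymm register */

/* Scalar model of a fully masked 4-lane loop computing dst[i] = src[i] * 3.0.
   The per-lane test mimics vpcmpud with predicate 6 (unsigned "greater than"):
   lane l is active when the broadcast remaining count exceeds l.
   Assumption: the comparison vector holds the lane indices 0..3.  */
void masked_scale(double *dst, const double *src, uint32_t n)
{
    for (uint32_t base = 0; base < n; base += LANES) {
        /* Value that would be splat across the vector register.  */
        uint32_t remaining = n - base;
        for (uint32_t lane = 0; lane < LANES; ++lane) {
            int active = remaining > lane;      /* one bit of the mask %k1 */
            if (active)                         /* masked load, mul, store */
                dst[base + lane] = src[base + lane] * 3.0;
        }
    }
}
```

With n = 6 the second trip has remaining = 2, so only lanes 0 and 1 are active and nothing past element 5 is read or written. The splat of that remaining count is the vec_duplicate discussed in this report.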
* [Bug target/110788] Spilling to mask register for GPR vec_duplicate
From: crazylht at gmail dot com @ 2023-07-27 7:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788
--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
I would prefer to add an UNSPEC to vpbroadcastm; I don't want to mix GPR and
kmask too much for the vec_duplicate:zero_extend pattern.
* [Bug target/110788] Spilling to mask register for GPR vec_duplicate
From: crazylht at gmail dot com @ 2023-07-27 7:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788
--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #1)
> I would prefer to add an UNSPEC to vpbroadcastm; I don't want to mix GPR and
> kmask too much for the vec_duplicate:zero_extend pattern.
With that change, the inner loop becomes:
.L3:
        vmovapd b(%rax), %ymm1{%k1}
        movl %edx, %esi
        subl $4, %edx
        movzwl %dx, %ecx
        vmulpd %ymm3, %ymm1, %ymm0{%k1}{z}
        vmovapd %ymm0, a(%rax){%k1}
        vpbroadcastd %ecx, %xmm0
        addq $32, %rax
        vpcmpud $6, %xmm2, %xmm0, %k1
        cmpw $4, %si
        ja .L3
        vzeroupper
* [Bug target/110788] Spilling to mask register for GPR vec_duplicate
From: ubizjak at gmail dot com @ 2023-07-27 11:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788
--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Richard Biener from comment #0)
> I suppose it could also be a missed optimization in REE since I think
> the HImode regs should already be zero-extended?
No, only SImode moves have implicit zero extensions. Plain HImode and QImode
moves behave as inserts into the lowpart of the wide register.
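
The difference can be modelled in plain C on a 64-bit register value. This is an illustrative sketch only; the helper names write_himode and write_simode are hypothetical and merely mimic the architectural behaviour of 16-bit and 32-bit register writes on x86-64.

```c
#include <stdint.h>

/* Model of a 16-bit (HImode) write into a 64-bit GPR: only the low 16 bits
   change; bits 16..63 keep whatever the register held before.  */
uint64_t write_himode(uint64_t reg, uint16_t value)
{
    return (reg & ~UINT64_C(0xFFFF)) | value;
}

/* Model of a 32-bit (SImode) write into a 64-bit GPR: the upper 32 bits
   are implicitly cleared, so the old contents do not matter.  */
uint64_t write_simode(uint64_t reg, uint32_t value)
{
    (void)reg;                /* previous contents are discarded */
    return (uint64_t)value;
}
```

Because the HImode write leaves stale upper bits behind, an explicit zero_extend (movzwl) is still needed before the value can be used as a full SImode operand.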
* [Bug target/110788] Spilling to mask register for GPR vec_duplicate
From: crazylht at gmail dot com @ 2023-07-27 13:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
> kmovw %edx, %k0
> vpbroadcastmw2d %k0, %xmm1
>
> instead of
>
> vpbroadcastw %edx, %xmm1
>
It's not vpbroadcastw; the equivalent sequence is
movzwl %dx, %ecx
vpbroadcastd %ecx, %xmm0
And the non-kmask version should be better.
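
To see why a word broadcast is not equivalent, here is a small C model of the two splats viewed as four 32-bit lanes. It is a sketch for illustration; the function names are hypothetical and the lane layout assumes the usual little-endian view of an %xmm register.

```c
#include <stdint.h>

/* Model of vec_duplicate:V4SI (zero_extend:SI (HI)), i.e. movzwl followed by
   vpbroadcastd: every dword lane holds the zero-extended low 16 bits.  */
void splat_zext_hi_to_v4si(uint32_t gpr, uint32_t out[4])
{
    uint32_t zext = gpr & 0xFFFFu;           /* movzwl %dx, %ecx */
    for (int i = 0; i < 4; ++i)              /* vpbroadcastd %ecx, %xmm0 */
        out[i] = zext;
}

/* Model of a word broadcast: all eight 16-bit lanes receive the low word,
   so each dword lane carries the value in both of its halves.  */
void splat_word_as_v4si(uint32_t gpr, uint32_t out[4])
{
    uint16_t w = (uint16_t)gpr;
    for (int i = 0; i < 4; ++i)
        out[i] = ((uint32_t)w << 16) | w;
}
```

For a low word of 5 the first form yields 5 in every dword lane, while the word broadcast yields 0x00050005, which would break the following unsigned dword compare.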
* [Bug target/110788] Spilling to mask register for GPR vec_duplicate
From: cvs-commit at gcc dot gnu.org @ 2023-07-28 2:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:54e54f77c1012ab53126314181c51eaee146ad5d
commit r14-2833-g54e54f77c1012ab53126314181c51eaee146ad5d
Author: liuhongt <hongtao.liu@intel.com>
Date: Thu Jul 27 15:14:39 2023 +0800
Add UNSPEC_MASKOP to vpbroadcastm pattern.
Prevent rtl optimization of vec_duplicate + zero_extend to
vpbroadcastm since there could be an extra kmov after RA.
gcc/ChangeLog:
PR target/110788
* config/i386/sse.md (avx512cd_maskb_vec_dup<mode>): Add
UNSPEC_MASKOP.
(avx512cd_maskw_vec_dup<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110788.c: New test.
* [Bug target/110788] Spilling to mask register for GPR vec_duplicate
From: crazylht at gmail dot com @ 2023-07-28 2:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788
--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in trunk.
* [Bug target/110788] Spilling to mask register for GPR vec_duplicate
From: liuhongt at gcc dot gnu.org @ 2023-11-30 10:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788
liuhongt at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
CC| |liuhongt at gcc dot gnu.org
Resolution|--- |FIXED
--- Comment #7 from liuhongt at gcc dot gnu.org ---
.