public inbox for gcc-bugs@sourceware.org
* [Bug target/104124] New: Poor optimization for vector splat DW with small consts
@ 2022-01-19 17:40 munroesj at gcc dot gnu.org
2022-01-19 17:51 ` [Bug target/104124] " munroesj at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: munroesj at gcc dot gnu.org @ 2022-01-19 17:40 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124
Bug ID: 104124
Summary: Poor optimization for vector splat DW with small consts
Product: gcc
Version: 11.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: munroesj at gcc dot gnu.org
Target Milestone: ---
It looks to me like the compiler is seeing register pressure caused by loading
all the vector long long constants I need in my code. This is leaf code of a
size that can run entirely out of volatile registers (no stack frame). But this
puts more pressure on volatile VRs, VSRs, and GPRs, especially GPRs, because it
is loading from .rodata when it could (and should) use a vector immediate.
For example:
vui64_t
__test_splatudi_0_V0 (void)
{
return vec_splats ((unsigned long long) 0);
}
vi64_t
__test_splatudi_1_V0 (void)
{
return vec_splats ((signed long long) -1);
}
These generate:
00000000000001a0 <__test_splatudi_0_V0>:
1a0: 8c 03 40 10 vspltisw v2,0
1a4: 20 00 80 4e blr
00000000000001c0 <__test_splatudi_1_V0>:
1c0: 8c 03 5f 10 vspltisw v2,-1
1c4: 20 00 80 4e blr
...
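A portable sketch of why 0 and -1 are the easy cases: a word splat of those two values already has the same bit pattern as the doubleword splat, while any other small immediate does not. The helper names below are made up for this illustration and only model the bit layout; this is not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the 64-bit pattern seen in one doubleword lane
   after vspltisw has replicated a 5-bit signed immediate into every
   32-bit word of the register.  */
static uint64_t
dw_from_word_splat (int imm)
{
  assert (imm >= -16 && imm <= 15);   /* vspltisw takes a 5-bit SIMM */
  uint32_t word = (uint32_t) (int32_t) imm;
  return ((uint64_t) word << 32) | word;
}

/* True when the word splat alone already equals the doubleword splat.  */
static int
word_splat_is_dw_splat (int imm)
{
  return dw_from_word_splat (imm) == (uint64_t) (int64_t) imm;
}
```

Under this model only 0 and -1 pass the check; a value such as 12 leaves 12 in both halves of each doubleword, so a second step is needed to widen it.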
But other cases that could use immediates like:
vui64_t
__test_splatudi_12_V0 (void)
{
return vec_splats ((unsigned long long) 12);
}
GCC 9/10/11 generates for power8:
0000000000000170 <__test_splatudi_12_V0>:
170: 00 00 4c 3c addis r2,r12,0
170: R_PPC64_REL16_HA .TOC.
174: 00 00 42 38 addi r2,r2,0
174: R_PPC64_REL16_LO .TOC.+0x4
178: 00 00 22 3d addis r9,r2,0
178: R_PPC64_TOC16_HA .rodata.cst16+0x20
17c: 00 00 29 39 addi r9,r9,0
17c: R_PPC64_TOC16_LO .rodata.cst16+0x20
180: ce 48 40 7c lvx v2,0,r9
184: 20 00 80 4e blr
and for Power9:
0000000000000000 <__test_splatisd_12_PWR9>:
0: d1 62 40 f0 xxspltib vs34,12
4: 02 16 58 10 vextsb2d v2,v2
8: 20 00 80 4e blr
So why can't the power8 target generate:
00000000000000f0 <__test_splatudi_12_V1>:
f0: 8c 03 4c 10 vspltisw v2,12
f4: 4e 16 40 10 vupkhsw v2,v2
f8: 20 00 80 4e blr
This is 4 cycles vs 9 (best case, and it is always 9 cycles because GCC does
not exploit immediate fusion).
In fact GCC 8 (AT12) does this.
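To make the two-instruction sequence concrete, here is a portable C model of what vspltisw followed by vupkhsw computes. It is only a sketch: the model_* names are invented, elements are indexed in register (big-endian) order, and nothing here is GCC or intrinsic code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of vspltisw: sign-extend a 5-bit immediate and replicate it
   into all four 32-bit words of a vector register.  */
static void
model_vspltisw (int imm, int32_t words[4])
{
  assert (imm >= -16 && imm <= 15);
  for (int i = 0; i < 4; i++)
    words[i] = imm;
}

/* Model of vupkhsw: sign-extend the two high-order words (elements 0
   and 1 in register order) into two 64-bit doublewords.  */
static void
model_vupkhsw (const int32_t words[4], int64_t dwords[2])
{
  dwords[0] = (int64_t) words[0];
  dwords[1] = (int64_t) words[1];
}

/* Returns 1 if the modelled sequence yields a doubleword splat of IMM.  */
static int
model_splat_dw_ok (int imm)
{
  int32_t words[4];
  int64_t dwords[2];
  model_vspltisw (imm, words);
  model_vupkhsw (words, dwords);
  return dwords[0] == imm && dwords[1] == imm;
}
```

Because every word holds the same value, it does not matter which pair of words the unpack picks, which is why the sequence covers the whole -16 to 15 range.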
So I tried defining my own vec_splatudi:
vi64_t
__test_splatudi_12_V1 (void)
{
vi32_t vwi = vec_splat_s32 (12);
return vec_unpackl (vwi);
}
Which generates the <__test_splatudi_12_V1> sequence above for GCC 8. But for
GCC 9/10/11 it generates:
0000000000000110 <__test_splatudi_12_V1>:
110: 00 00 4c 3c addis r2,r12,0
110: R_PPC64_REL16_HA .TOC.
114: 00 00 42 38 addi r2,r2,0
114: R_PPC64_REL16_LO .TOC.+0x4
118: 00 00 22 3d addis r9,r2,0
118: R_PPC64_TOC16_HA .rodata.cst16+0x20
11c: 00 00 29 39 addi r9,r9,0
11c: R_PPC64_TOC16_LO .rodata.cst16+0x20
120: ce 48 40 7c lvx v2,0,r9
124: 20 00 80 4e blr
Again! GCC has gone out of its way to be this clever! Badly! While it can be
appropriately clever for power9!
I have tried many permutations of this and the only way I have found to prevent
this (GCC 9/10/11) cleverness is to use inline __asm (which has other bad side
effects).
* [Bug target/104124] Poor optimization for vector splat DW with small consts
From: munroesj at gcc dot gnu.org @ 2022-01-19 17:51 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124
Steven Munroe <munroesj at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |munroesj at gcc dot gnu.org
--- Comment #1 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Created attachment 52236
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52236&action=edit
Attempts to load small int consts to vector DW via splat
Multiple attempts to convince GCC to load small integer (-16 to 15) constants via
splat. Current GCC versions (9/10/11) convert vec_splats(<small const>) and
explicit vec_splat_s32/vec_unpackl sequences into loads from .rodata. This
generates more instructions, takes more cycles, and causes register pressure
that results in unnecessary spill/reload and load-hit-store rejects.
* [Bug target/104124] Poor optimization for vector splat DW with small consts
From: munroesj at gcc dot gnu.org @ 2022-01-27 20:11 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124
Steven Munroe <munroesj at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #52236|0 |1
is obsolete| |
--- Comment #2 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Created attachment 52307
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52307&action=edit
Enhanced test case that also shows CSE failure
The original test case, with an added example where CSE should common up a
splat immediate, or even a .rodata load, but fails to do even that.
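For readers without the attachment, the kind of code where one would expect CSE to help looks roughly like the following. This is a made-up illustration, not the attached test case; vui64_t is declared here with GCC's generic vector extension so that the sketch builds on any target, and the function names are hypothetical.

```c
#include <assert.h>

typedef unsigned long long vui64_t __attribute__ ((vector_size (16)));

/* Hypothetical example: the doubleword splat of 12 is needed twice.
   One would hope the compiler materializes it once and reuses it,
   rather than emitting two separate loads or splat sequences.  */
static vui64_t
add_twelve_twice (vui64_t a, vui64_t b)
{
  const vui64_t twelve_a = { 12, 12 };
  const vui64_t twelve_b = { 12, 12 };
  return (a + twelve_a) ^ (b + twelve_b);
}

/* Small functional check of the example itself.  */
static void
add_twelve_twice_check (void)
{
  vui64_t a = { 1, 2 };
  vui64_t b = { 3, 4 };
  vui64_t r = add_twelve_twice (a, b);
  assert (r[0] == (13ULL ^ 15ULL));
  assert (r[1] == (14ULL ^ 16ULL));
}
```

Compiling something like this with -O2 -mcpu=power8 and looking at the disassembly is one way to see whether the constant is materialized once or twice.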
* [Bug target/104124] Poor optimization for vector splat DW with small consts
From: meissner at gcc dot gnu.org @ 2022-01-27 21:16 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124
--- Comment #3 from Michael Meissner <meissner at gcc dot gnu.org> ---
There are two things going on.
1) There is no vspltisd instruction, so we can't generate a single instruction
to load constants other than 0 or -1. Unfortunately, this was not added in
either power9 or power10.
2) On the power9 and power10 we have the xxspltib and vextsb2d instructions, and
we generate those if -mcpu=power9.
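As a rough illustration of point 2, the power9 pair can be modelled in portable C as a byte splat followed by a sign extension of the low-order byte of each doubleword. The model_* names are invented for this sketch, bytes are indexed in register (big-endian) order, and this is not how GCC represents the instructions internally.

```c
#include <assert.h>
#include <stdint.h>

/* Model of xxspltib: replicate an 8-bit immediate into all 16 bytes.
   The immediate is treated as signed here, since vextsb2d will
   sign-extend it.  */
static void
model_xxspltib (int imm, int8_t bytes[16])
{
  assert (imm >= -128 && imm <= 127);
  for (int i = 0; i < 16; i++)
    bytes[i] = (int8_t) imm;
}

/* Model of vextsb2d: sign-extend the low-order byte of each doubleword
   (bytes 7 and 15 in register order) to 64 bits.  */
static void
model_vextsb2d (const int8_t bytes[16], int64_t dwords[2])
{
  dwords[0] = (int64_t) bytes[7];
  dwords[1] = (int64_t) bytes[15];
}

/* Returns 1 if the modelled pair yields a doubleword splat of IMM.  */
static int
model_p9_splat_dw_ok (int imm)
{
  int8_t bytes[16];
  int64_t dwords[2];
  model_xxspltib (imm, bytes);
  model_vextsb2d (bytes, dwords);
  return dwords[0] == imm && dwords[1] == imm;
}
```

With an 8-bit immediate this pair reaches -128 to 127, a wider range than the 5-bit immediate of vspltisw.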
To add support for new types of constants, the procedure is:
1) You need to modify easy_altivec_constant and gen_altivec_constant in
rs6000.c (or rs6000.cc in GCC 12). Then add new predicates in predicates.md for
these new patterns.
2) Look for the predicates "easy_vector_constant_add_self" and so forth in
predicates.md and add a new predicate there.
3) Then in altivec.md, look for the define_splits that use the various
easy_vector_const_* functions and add a new pattern.
* [Bug target/104124] Poor optimization for vector splat DW with small consts
From: cvs-commit at gcc dot gnu.org @ 2023-06-28 8:39 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124
--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by HaoChen Gui <guihaoc@gcc.gnu.org>:
https://gcc.gnu.org/g:f3d87219dd502d5c11608ffb83fbe66c79baf784
commit r14-2153-gf3d87219dd502d5c11608ffb83fbe66c79baf784
Author: Haochen Gui <guihaoc@gcc.gnu.org>
Date: Wed Jun 28 16:30:44 2023 +0800
rs6000: Splat vector small V2DI constants with vspltisw and vupkhsw
This patch adds a new insn for vector splat with small V2DI constants on P8.
If the value of the constant is in the range -16 to 15 but is not 0 or -1, it
can be loaded with vspltisw and vupkhsw on P8.
gcc/
PR target/104124
* config/rs6000/altivec.md (*altivec_vupkhs<VU_char>_direct): Rename
to...
(altivec_vupkhs<VU_char>_direct): ...this.
* config/rs6000/predicates.md (vspltisw_vupkhsw_constant_split): New
predicate to test if a constant can be loaded with vspltisw and
vupkhsw.
(easy_vector_constant): Call vspltisw_vupkhsw_constant_p to check if
a vector constant can be synthesized with a vspltisw and a vupkhsw.
* config/rs6000/rs6000-protos.h (vspltisw_vupkhsw_constant_p): Declare.
* config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p): New
function to return true if OP mode is V2DI and can be synthesized
with vupkhsw and vspltisw.
* config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
constants with vspltisw and vupkhsw.
gcc/testsuite/
PR target/104124
* gcc.target/powerpc/pr104124.c: New.
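The condition stated in the commit message can be modelled outside the compiler as a small predicate. This is a stand-alone sketch of the stated rule only; the function names are hypothetical, and the real vspltisw_vupkhsw_constant_p works on RTL operands and may check more than is shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-alone model of the rule: both doublewords hold
   the same value, it fits the 5-bit signed immediate of vspltisw, and
   it is not 0 or -1 (which a single vspltisw already handles).  */
static int
model_vspltisw_vupkhsw_ok (int64_t dw0, int64_t dw1)
{
  if (dw0 != dw1)
    return 0;
  if (dw0 < -16 || dw0 > 15)
    return 0;
  return dw0 != 0 && dw0 != -1;
}

/* Example use: 12 qualifies, 0 and 100 do not.  */
static void
model_examples (void)
{
  assert (model_vspltisw_vupkhsw_ok (12, 12));
  assert (!model_vspltisw_vupkhsw_ok (0, 0));
  assert (!model_vspltisw_vupkhsw_ok (100, 100));
}
```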
* [Bug target/104124] Poor optimization for vector splat DW with small consts
From: munroesj at gcc dot gnu.org @ 2023-06-28 20:21 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124
--- Comment #5 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Thanks
* [Bug target/104124] Poor optimization for vector splat DW with small consts
From: guihaoc at gcc dot gnu.org @ 2023-07-13 7:22 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124
HaoChen Gui <guihaoc at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|UNCONFIRMED |RESOLVED
--- Comment #6 from HaoChen Gui <guihaoc at gcc dot gnu.org> ---
fixed