[Bug target/104124] New: Poor optimization for vector splat DW with small consts

public inbox for gcc-bugs@sourceware.org

* [Bug target/104124] New: Poor optimization for vector splat DW with small consts
@ 2022-01-19 17:40 munroesj at gcc dot gnu.org
  2022-01-19 17:51 ` [Bug target/104124] " munroesj at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: munroesj at gcc dot gnu.org @ 2022-01-19 17:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

            Bug ID: 104124
           Summary: Poor optimization for vector splat DW with small
                    consts
           Product: gcc
           Version: 11.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: munroesj at gcc dot gnu.org
  Target Milestone: ---

It looks to me like the compiler is seeing register pressure caused by loading
all the vector long long constants I need in my code. This is leaf code of a
size that can run entirely out of volatile registers (no stack frame). But this
puts more pressure on volatile VRs, VSRs, and GPRs, especially GPRs, because it
is loading from .rodata when it could (and should) use a vector immediate.

For example:

vui64_t
__test_splatudi_0_V0 (void)
{
  return vec_splats ((unsigned long long) 0);
}

vi64_t
__test_splatudi_1_V0 (void)
{
  return vec_splats ((signed long long) -1);
}

Generate:
00000000000001a0 <__test_splatudi_0_V0>:
     1a0:       8c 03 40 10     vspltisw v2,0
     1a4:       20 00 80 4e     blr

00000000000001c0 <__test_splatudi_1_V0>:
     1c0:       8c 03 5f 10     vspltisw v2,-1
     1c4:       20 00 80 4e     blr
        ...

But other cases that could use immediates, like:

vui64_t
__test_splatudi_12_V0 (void)
{
  return vec_splats ((unsigned long long) 12);
}

GCC 9/10/11 Generates for power8:

0000000000000170 <__test_splatudi_12_V0>:
     170:       00 00 4c 3c     addis   r2,r12,0
                        170: R_PPC64_REL16_HA   .TOC.
     174:       00 00 42 38     addi    r2,r2,0
                        174: R_PPC64_REL16_LO   .TOC.+0x4
     178:       00 00 22 3d     addis   r9,r2,0
                        178: R_PPC64_TOC16_HA   .rodata.cst16+0x20
     17c:       00 00 29 39     addi    r9,r9,0
                        17c: R_PPC64_TOC16_LO   .rodata.cst16+0x20
     180:       ce 48 40 7c     lvx     v2,0,r9
     184:       20 00 80 4e     blr

and for Power9:
0000000000000000 <__test_splatisd_12_PWR9>:
       0:       d1 62 40 f0     xxspltib vs34,12
       4:       02 16 58 10     vextsb2d v2,v2
       8:       20 00 80 4e     blr

So why can't the power8 target generate:

00000000000000f0 <__test_splatudi_12_V1>:
      f0:       8c 03 4c 10     vspltisw v2,12
      f4:       4e 16 40 10     vupkhsw v2,v2
      f8:       20 00 80 4e     blr

This is 4 cycles vs 9 (best case, and it is always 9 cycles because GCC does
not exploit immediate fusion).
In fact GCC 8 (AT12) does this.
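
To see why the two-instruction sequence produces the wanted doubleword splat,
here is a minimal portable C model of it. This is only a sketch under stated
assumptions: the model_* names and struct types are made up for illustration
and are not Altivec intrinsics. vspltisw is modeled as sign-extending a 5-bit
immediate into all four words, and vupkhsw as sign-extending two of those
words to doublewords.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of a 128-bit VR viewed as 4 words or 2 doublewords. */
typedef struct { int32_t w[4]; } model_v4si;
typedef struct { int64_t d[2]; } model_v2di;

/* Models vspltisw: the 5-bit signed immediate is sign-extended to 32 bits
   and copied into every word element. */
static model_v4si model_vspltisw(int simm5)
{
    assert(simm5 >= -16 && simm5 <= 15);   /* range of a 5-bit signed field */
    model_v4si r;
    for (int i = 0; i < 4; i++)
        r.w[i] = (int32_t)simm5;
    return r;
}

/* Models vupkhsw: two word elements are sign-extended to doublewords.
   After a splat all four words are equal, so it does not matter which
   half of the register is unpacked. */
static model_v2di model_vupkhsw(model_v4si v)
{
    model_v2di r;
    r.d[0] = (int64_t)v.w[0];
    r.d[1] = (int64_t)v.w[1];
    return r;
}

/* The two-instruction sequence: vspltisw followed by vupkhsw. */
static model_v2di model_splat_dw(int simm5)
{
    return model_vupkhsw(model_vspltisw(simm5));
}
```

With this model, model_splat_dw(12) gives both doublewords equal to 12, and
negative immediates such as -5 come out sign-extended to 64 bits.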

So I tried defining my own vec_splatudi:

vi64_t
__test_splatudi_12_V1 (void)
{
  vi32_t vwi = vec_splat_s32 (12);
  return vec_unpackl (vwi);
}

Which generates the <__test_splatudi_12_V1> sequence above for GCC 8. But for
GCC 9/10/11 it generates:

0000000000000110 <__test_splatudi_12_V1>:
     110:       00 00 4c 3c     addis   r2,r12,0
                        110: R_PPC64_REL16_HA   .TOC.
     114:       00 00 42 38     addi    r2,r2,0
                        114: R_PPC64_REL16_LO   .TOC.+0x4
     118:       00 00 22 3d     addis   r9,r2,0
                        118: R_PPC64_TOC16_HA   .rodata.cst16+0x20
     11c:       00 00 29 39     addi    r9,r9,0
                        11c: R_PPC64_TOC16_LO   .rodata.cst16+0x20
     120:       ce 48 40 7c     lvx     v2,0,r9
     124:       20 00 80 4e     blr

Again! GCC has gone out of its way to be this clever! Badly! While it can be
appropriately clever for power9!

I have tried many permutations of this and the only way I have found to prevent
this (GCC 9/10/11) cleverness is to use inline __asm (which has other bad side
effects).


* [Bug target/104124] Poor optimization for vector splat DW with small consts
  2022-01-19 17:40 [Bug target/104124] New: Poor optimization for vector splat DW with small consts munroesj at gcc dot gnu.org
@ 2022-01-19 17:51 ` munroesj at gcc dot gnu.org
  2022-01-27 20:11 ` munroesj at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: munroesj at gcc dot gnu.org @ 2022-01-19 17:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

Steven Munroe <munroesj at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |munroesj at gcc dot gnu.org

--- Comment #1 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Created attachment 52236
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52236&action=edit
Attempts to load small int consts to vector DW via splat

Multiple attempts to convince GCC to load small integer (-16 to 15) constants via
splat. Current GCC versions (9/10/11) convert vec_splats(<small const>) and
explicit vec_splat_s32/vec_unpackl sequences into loads from .rodata. This
generates more instructions, takes more cycles, and causes register pressure
that results in unnecessary spill/reload and load-hit-store rejects.


* [Bug target/104124] Poor optimization for vector splat DW with small consts
  2022-01-19 17:40 [Bug target/104124] New: Poor optimization for vector splat DW with small consts munroesj at gcc dot gnu.org
  2022-01-19 17:51 ` [Bug target/104124] " munroesj at gcc dot gnu.org
@ 2022-01-27 20:11 ` munroesj at gcc dot gnu.org
  2022-01-27 21:16 ` meissner at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: munroesj at gcc dot gnu.org @ 2022-01-27 20:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

Steven Munroe <munroesj at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #52236|0                           |1
        is obsolete|                            |

--- Comment #2 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Created attachment 52307
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52307&action=edit
Enhanced test case that also shows CSE failure

The original test case, extended with an example where CSE should common a splat
immediate or even a .rodata load, but fails to do even that.
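
For readers without the attachment, the kind of code in question might look
like the sketch below. It is not taken from the attachment: the type and
function names are hypothetical, and it uses GCC's generic vector extension
instead of the Altivec intrinsics so it builds on any target. The same
{12, 12} constant is used twice, so one would hope the compiler materializes
it once and reuses the register.

```c
#include <assert.h>

/* Hypothetical stand-in for vi64_t, using GCC's generic vector extension. */
typedef long long model_vi64 __attribute__ ((vector_size (16)));

/* Two uses of the same splat constant. Ideally the constant is built (or
   loaded) once and shared by both additions. */
static model_vi64 model_add_bias_twice(model_vi64 a, model_vi64 b)
{
    const model_vi64 bias_a = { 12, 12 };
    const model_vi64 bias_b = { 12, 12 };   /* same value as bias_a */
    model_vi64 x = a + bias_a;
    model_vi64 y = b + bias_b;
    return x + y;
}

/* Small self-check of the arithmetic only; it says nothing about codegen. */
static void model_check_add_bias_twice(void)
{
    model_vi64 a = { 1, 2 };
    model_vi64 b = { 3, 4 };
    model_vi64 r = model_add_bias_twice(a, b);
    assert(r[0] == 28 && r[1] == 30);
}
```

Whether the constant is actually shared can only be judged from the generated
assembly (for example with -O3 -mcpu=power8 -S), not from running the code.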


* [Bug target/104124] Poor optimization for vector splat DW with small consts
  2022-01-19 17:40 [Bug target/104124] New: Poor optimization for vector splat DW with small consts munroesj at gcc dot gnu.org
  2022-01-19 17:51 ` [Bug target/104124] " munroesj at gcc dot gnu.org
  2022-01-27 20:11 ` munroesj at gcc dot gnu.org
@ 2022-01-27 21:16 ` meissner at gcc dot gnu.org
  2023-06-28  8:39 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: meissner at gcc dot gnu.org @ 2022-01-27 21:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

--- Comment #3 from Michael Meissner <meissner at gcc dot gnu.org> ---
There are two things going on.

1) There is no vspltisd instruction, so we can't generate a single instruction
to load constants other than 0 or -1.  Unfortunately, this was not added in
either power9 or power10.

2) On the power9 and power10 we have the xxspltib and vextsb2d instructions, and
we generate those if -mcpu=power9.
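
The Power9 pair in (2) can be pictured with a small portable C model. This is
a rough sketch, not real vector code: the p9_* and model_* names are invented
for illustration, and the immediate is treated here as a signed byte.
xxspltib is modeled as copying one byte into all 16 byte elements, and
vextsb2d as sign-extending one byte of each doubleword to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of a 128-bit register viewed as bytes or doublewords. */
typedef struct { int8_t b[16]; } p9_v16qi;
typedef struct { int64_t d[2]; } p9_v2di;

/* Models xxspltib: copy an 8-bit immediate into every byte element. */
static p9_v16qi model_xxspltib(int imm8)
{
    assert(imm8 >= -128 && imm8 <= 127);   /* must fit in one signed byte */
    p9_v16qi r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (int8_t)imm8;
    return r;
}

/* Models vextsb2d: one byte of each doubleword is sign-extended to fill
   the whole doubleword. All bytes are equal after the splat, so the byte
   index chosen within each half does not change the result here. */
static p9_v2di model_vextsb2d(p9_v16qi v)
{
    p9_v2di r;
    r.d[0] = (int64_t)v.b[7];
    r.d[1] = (int64_t)v.b[15];
    return r;
}

/* The two-instruction sequence: xxspltib followed by vextsb2d. */
static p9_v2di model_splat_dw_p9(int imm8)
{
    return model_vextsb2d(model_xxspltib(imm8));
}
```

In this model the pair covers -128 to 127, which is wider than the -16 to 15
reachable through a 5-bit splat immediate.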

To add support for new types of constants, the procedure is:

1) You need to modify easy_altivec_constant and gen_altivec_constant in
rs6000.c (or rs6000.cc in GCC 12).  Then add new predicates in predicates.md for
these new patterns.

2) Look for the predicates "easy_vector_constant_add_self" and so forth in
predicates.md and add a new predicate here.

3) Then in altivec.md, look for the define_splits that use the various
easy_vector_const_* functions and add a new pattern.


* [Bug target/104124] Poor optimization for vector splat DW with small consts
  2022-01-19 17:40 [Bug target/104124] New: Poor optimization for vector splat DW with small consts munroesj at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2022-01-27 21:16 ` meissner at gcc dot gnu.org
@ 2023-06-28  8:39 ` cvs-commit at gcc dot gnu.org
  2023-06-28 20:21 ` munroesj at gcc dot gnu.org
  2023-07-13  7:22 ` guihaoc at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-28  8:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by HaoChen Gui <guihaoc@gcc.gnu.org>:

https://gcc.gnu.org/g:f3d87219dd502d5c11608ffb83fbe66c79baf784

commit r14-2153-gf3d87219dd502d5c11608ffb83fbe66c79baf784
Author: Haochen Gui <guihaoc@gcc.gnu.org>
Date:   Wed Jun 28 16:30:44 2023 +0800

    rs6000: Splat vector small V2DI constants with vspltisw and vupkhsw

    This patch adds a new insn for vector splat with small V2DI constants on P8.
    If the value of constant is in RANGE (-16, 15) but not 0 or -1, it can be
    loaded with vspltisw and vupkhsw on P8.

    gcc/
            PR target/104124
            * config/rs6000/altivec.md (*altivec_vupkhs<VU_char>_direct): Rename
            to...
            (altivec_vupkhs<VU_char>_direct): ...this.
            * config/rs6000/predicates.md (vspltisw_vupkhsw_constant_split): New
            predicate to test if a constant can be loaded with vspltisw and
            vupkhsw.
            (easy_vector_constant): Call vspltisw_vupkhsw_constant_p to Check if
            a vector constant can be synthesized with a vspltisw and a vupkhsw.
            * config/rs6000/rs6000-protos.h (vspltisw_vupkhsw_constant_p):
            Declare.
            * config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p): New
            function to return true if OP mode is V2DI and can be synthesized
            with vupkhsw and vspltisw.
            * config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
            constants with vspltisw and vupkhsw.

    gcc/testsuite/
            PR target/104124
            * gcc.target/powerpc/pr104124.c: New.
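
The condition stated in the commit message (a V2DI splat whose value is in
-16..15 but is not 0 or -1) can be written out as a small standalone check.
This is only an illustration in plain C: can_use_vspltisw_vupkhsw is a
hypothetical name, it works on a plain two-element array instead of an RTL
operand, and it leaves out target checks such as whether P8 vector support
is enabled. It is not the code added to rs6000.cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the constant test described in the commit
   message. ELT holds the two doubleword elements of a V2DI constant. */
static bool can_use_vspltisw_vupkhsw(const int64_t elt[2])
{
    assert(elt != NULL);

    /* Both elements must be equal, i.e. the constant is a splat. */
    if (elt[0] != elt[1])
        return false;

    int64_t value = elt[0];

    /* The value must fit the 5-bit signed immediate of vspltisw. */
    if (value < -16 || value > 15)
        return false;

    /* 0 and -1 are excluded; the original report shows both already
       being generated with a single vspltisw. */
    if (value == 0 || value == -1)
        return false;

    return true;
}
```

For example, {12, 12} and {-16, -16} pass this check, while {0, 0}, {16, 16}
and the non-splat {12, 13} do not.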


* [Bug target/104124] Poor optimization for vector splat DW with small consts
  2022-01-19 17:40 [Bug target/104124] New: Poor optimization for vector splat DW with small consts munroesj at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-06-28  8:39 ` cvs-commit at gcc dot gnu.org
@ 2023-06-28 20:21 ` munroesj at gcc dot gnu.org
  2023-07-13  7:22 ` guihaoc at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: munroesj at gcc dot gnu.org @ 2023-06-28 20:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

--- Comment #5 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Thanks


* [Bug target/104124] Poor optimization for vector splat DW with small consts
  2022-01-19 17:40 [Bug target/104124] New: Poor optimization for vector splat DW with small consts munroesj at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-06-28 20:21 ` munroesj at gcc dot gnu.org
@ 2023-07-13  7:22 ` guihaoc at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: guihaoc at gcc dot gnu.org @ 2023-07-13  7:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

HaoChen Gui <guihaoc at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #6 from HaoChen Gui <guihaoc at gcc dot gnu.org> ---
fixed

