[Bug target/104124] New: Poor optimization for vector splat DW with small consts

From: "munroesj at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/104124] New: Poor optimization for vector splat DW with small consts
Date: Wed, 19 Jan 2022 17:40:36 +0000
Message-ID: <bug-104124-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

            Bug ID: 104124
           Summary: Poor optimization for vector splat DW with small
                    consts
           Product: gcc
           Version: 11.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: munroesj at gcc dot gnu.org
  Target Milestone: ---

It looks to me like the compiler is seeing register pressure caused by loading
all the vector long long constants I need in my code. This is leaf code of a
size that can run entirely out of volatile registers (no stack frame). But
loading the constants puts more pressure on volatile VRs, VSRs, and GPRs,
especially GPRs, because the compiler is loading from .rodata when it could
(and should) use a vector immediate.

For example:

vui64_t
__test_splatudi_0_V0 (void)
{
  return vec_splats ((unsigned long long) 0);
}

vi64_t
__test_splatudi_1_V0 (void)
{
  return vec_splats ((signed long long) -1);
}

generate:
00000000000001a0 <__test_splatudi_0_V0>:
     1a0:       8c 03 40 10     vspltisw v2,0
     1a4:       20 00 80 4e     blr

00000000000001c0 <__test_splatudi_1_V0>:
     1c0:       8c 03 5f 10     vspltisw v2,-1
     1c4:       20 00 80 4e     blr
        ...

But other cases that could use immediates, like:

vui64_t
__test_splatudi_12_V0 (void)
{
  return vec_splats ((unsigned long long) 12);
}

GCC 9/10/11 generates for power8:

0000000000000170 <__test_splatudi_12_V0>:
     170:       00 00 4c 3c     addis   r2,r12,0
                        170: R_PPC64_REL16_HA   .TOC.
     174:       00 00 42 38     addi    r2,r2,0
                        174: R_PPC64_REL16_LO   .TOC.+0x4
     178:       00 00 22 3d     addis   r9,r2,0
                        178: R_PPC64_TOC16_HA   .rodata.cst16+0x20
     17c:       00 00 29 39     addi    r9,r9,0
                        17c: R_PPC64_TOC16_LO   .rodata.cst16+0x20
     180:       ce 48 40 7c     lvx     v2,0,r9
     184:       20 00 80 4e     blr

and for Power9:
0000000000000000 <__test_splatisd_12_PWR9>:
       0:       d1 62 40 f0     xxspltib vs34,12
       4:       02 16 58 10     vextsb2d v2,v2
       8:       20 00 80 4e     blr
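
To make the Power9 sequence above concrete, here is a minimal scalar sketch in
portable C of what xxspltib followed by vextsb2d computes. The model_* types and
functions are hypothetical names invented for this illustration; they are not
GCC built-ins or part of any library. The sketch assumes the constant fits in a
signed byte, which is the range this two-instruction sequence can cover.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar models of a 128-bit vector register (not GCC types). */
typedef struct { int8_t  b[16]; } model_v16qi;
typedef struct { int64_t d[2];  } model_v2di;

/* Models xxspltib: replicate one 8-bit immediate into all 16 bytes.
   Assumption: the caller passes the value as a signed byte (-128..127). */
static model_v16qi model_xxspltib(int imm)
{
    model_v16qi r;
    assert(imm >= -128 && imm <= 127);
    for (int i = 0; i < 16; i++)
        r.b[i] = (int8_t)imm;
    return r;
}

/* Models vextsb2d: sign-extend the low-order byte of each doubleword.
   After a splat every byte is equal, so the byte index chosen here
   (big-endian numbering, bytes 7 and 15) does not affect the result. */
static model_v2di model_vextsb2d(model_v16qi a)
{
    model_v2di r = { { (int64_t)a.b[7], (int64_t)a.b[15] } };
    return r;
}

/* The two-instruction Power9 splat of a small doubleword constant. */
static model_v2di model_splat_dw_pwr9(int imm)
{
    return model_vextsb2d(model_xxspltib(imm));
}
```

For example, model_splat_dw_pwr9(12) yields two doublewords both equal to 12,
matching the constant requested in __test_splatudi_12_V0.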

So why can't the power8 target generate:

00000000000000f0 <__test_splatudi_12_V1>:
      f0:       8c 03 4c 10     vspltisw v2,12
      f4:       4e 16 40 10     vupkhsw v2,v2
      f8:       20 00 80 4e     blr

This is 4 cycles vs. 9 (best case; and it is always 9 cycles because GCC does
not exploit immediate fusion).
In fact GCC 8 (AT12) does this.
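
As an illustration of why the vspltisw/vupkhsw pair is enough for small
constants, here is a minimal scalar sketch in portable C of what the two
instructions compute. The model_* names are hypothetical and exist only for
this example; they are not GCC or pveclib identifiers. The sketch assumes the
constant fits the 5-bit signed immediate of vspltisw (-16..15).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar models of a 128-bit vector register (not GCC types). */
typedef struct { int32_t w[4]; } model_v4si;
typedef struct { int64_t d[2]; } model_v2di;

/* Models vspltisw: sign-extend a 5-bit immediate and replicate it
   into all four words. */
static model_v4si model_vspltisw(int simm)
{
    assert(simm >= -16 && simm <= 15);  /* 5-bit signed immediate field */
    model_v4si r = { { simm, simm, simm, simm } };
    return r;
}

/* Models vupkhsw: sign-extend two of the four words to doublewords.
   Which pair the hardware picks depends on element order, but after
   a splat all words are equal, so the choice does not matter here. */
static model_v2di model_vupkhsw(model_v4si a)
{
    model_v2di r = { { (int64_t)a.w[0], (int64_t)a.w[1] } };
    return r;
}

/* The two-instruction power8 splat of a small doubleword constant. */
static model_v2di model_splat_dw_pwr8(int simm)
{
    return model_vupkhsw(model_vspltisw(simm));
}
```

With this model, model_splat_dw_pwr8(12) gives two doublewords equal to 12, and
negative values such as -16 survive because the unpack step sign-extends.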

So I tried defining my own vec_splatudi:

vi64_t
__test_splatudi_12_V1 (void)
{
  vi32_t vwi = vec_splat_s32 (12);
  return vec_unpackl (vwi);
}

Which generates the <__test_splatudi_12_V1> sequence above for GCC 8. But for
GCC 9/10/11 it generates:

0000000000000110 <__test_splatudi_12_V1>:
     110:       00 00 4c 3c     addis   r2,r12,0
                        110: R_PPC64_REL16_HA   .TOC.
     114:       00 00 42 38     addi    r2,r2,0
                        114: R_PPC64_REL16_LO   .TOC.+0x4
     118:       00 00 22 3d     addis   r9,r2,0
                        118: R_PPC64_TOC16_HA   .rodata.cst16+0x20
     11c:       00 00 29 39     addi    r9,r9,0
                        11c: R_PPC64_TOC16_LO   .rodata.cst16+0x20
     120:       ce 48 40 7c     lvx     v2,0,r9
     124:       20 00 80 4e     blr

Again! GCC has gone out of its way to be this clever! Badly! While it can be
appropriately clever for power9!

I have tried many permutations of this and the only way I have found to prevent
this (GCC 9/10/11) cleverness is to use inline __asm (which has other bad side
effects).
