public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/99293] New: Built-in vec_splat generates sub-optimal code for -mcpu=power10
@ 2021-02-26 17:02 munroesj at gcc dot gnu.org
  2021-02-26 17:07 ` [Bug middle-end/99293] " munroesj at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: munroesj at gcc dot gnu.org @ 2021-02-26 17:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99293

            Bug ID: 99293
           Summary: Built-in vec_splat generates sub-optimal code for
                    -mcpu=power10
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: munroesj at gcc dot gnu.org
  Target Milestone: ---

Created attachment 50263
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50263&action=edit
Simplified test case

While adding code for the POWER10 target to the Power Vector Library (PVECLIB),
I see strange code generation for the Altivec built-in vec_splat on the vector
long long type. I would expect an xxpermdi (xxspltd), based on the "Power Vector
Intrinsic Programming Reference".

But I see the following generated:

0000000000000300 <test_vec_rlq_PWR10>:
     300:   67 02 69 7c     mfvsrld r9,vs35
     304:   67 4b 09 7c     mtvsrdd vs32,r9,r9
     308:   05 00 42 10     vrlq    v2,v2,v0
     30c:   20 00 80 4e     blr

While this seems to be functionally correct, the trip through the GPR seems
unnecessary. It requires two serially dependent instructions where a single
xxspltd would do. I expected:

0000000000000300 <test_vec_rlq_PWR10>:
 300:   57 1b 63 f0     xxspltd vs35,vs35,1
 304:   05 18 42 10     vrlq    v2,v2,v3
 308:   20 00 80 4e     blr
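
To make the comparison concrete, here is a minimal portable C model of the
instructions involved. It is only a sketch: the type vsr_model and the
model_* helpers are hypothetical names invented for this illustration, the
doublewords are indexed in ISA numbering, and the RA=0 special case of
mtvsrdd is ignored. It shows that the GPR round trip and the single xxspltd
leave the same value in the register.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register as two doublewords.
   dw[0] is doubleword 0 in ISA numbering, dw[1] is doubleword 1. */
typedef struct { uint64_t dw[2]; } vsr_model;

/* mfvsrld RA,XS: copy doubleword 1 of XS to a GPR. */
static uint64_t model_mfvsrld(vsr_model xs)
{
    return xs.dw[1];
}

/* mtvsrdd XT,RA,RB: doubleword 0 <- RA, doubleword 1 <- RB.
   (The RA=0 special case of the real instruction is not modeled.) */
static vsr_model model_mtvsrdd(uint64_t ra, uint64_t rb)
{
    vsr_model xt = { { ra, rb } };
    return xt;
}

/* xxpermdi XT,XA,XB,DM: the high bit of DM picks the doubleword of XA
   for result doubleword 0; the low bit picks the doubleword of XB for
   result doubleword 1. */
static vsr_model model_xxpermdi(vsr_model xa, vsr_model xb, unsigned dm)
{
    vsr_model xt = { { xa.dw[(dm >> 1) & 1u], xb.dw[dm & 1u] } };
    return xt;
}

/* xxspltd XT,XA,UIM is the extended mnemonic for xxpermdi XT,XA,XA,UIM*3. */
static vsr_model model_xxspltd(vsr_model xa, unsigned uim)
{
    return model_xxpermdi(xa, xa, (uim & 1u) * 3u);
}

/* Returns 1 when the generated two-instruction sequence and the expected
   single xxspltd produce the same register contents. */
static int splat_paths_agree(vsr_model v)
{
    uint64_t gpr = model_mfvsrld(v);              /* mfvsrld r9,vs35     */
    vsr_model via_gpr = model_mtvsrdd(gpr, gpr);  /* mtvsrdd vs32,r9,r9  */
    vsr_model direct = model_xxspltd(v, 1);       /* xxspltd vs35,vs35,1 */
    return via_gpr.dw[0] == direct.dw[0] && via_gpr.dw[1] == direct.dw[1];
}
```

Calling splat_paths_agree() on any input returns 1 in this model, so the
difference between the two sequences is purely instruction count and the
dependency through the GPR, not the result.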


The compiler was:

Compiler: gcc version 10.2.1 20210104 (Advance-Toolchain 14.0-2) [2093e873bb6c]
(GCC)


* [Bug middle-end/99293] Built-in vec_splat generates sub-optimal code for -mcpu=power10
  2021-02-26 17:02 [Bug middle-end/99293] New: Built-in vec_splat generates sub-optimal code for -mcpu=power10 munroesj at gcc dot gnu.org
@ 2021-02-26 17:07 ` munroesj at gcc dot gnu.org
  2021-02-27  0:52 ` segher at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: munroesj at gcc dot gnu.org @ 2021-02-26 17:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99293

--- Comment #1 from Steven Munroe <munroesj at gcc dot gnu.org> ---
Created attachment 50264
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50264&action=edit
Compile test for simplified test case

Download vec_dummy.c and vec_int128_ppc.h into a local directory and compile:

gcc -O3 -mcpu=power10 -m64 -c vec_dummy.c


* [Bug middle-end/99293] Built-in vec_splat generates sub-optimal code for -mcpu=power10
  2021-02-26 17:02 [Bug middle-end/99293] New: Built-in vec_splat generates sub-optimal code for -mcpu=power10 munroesj at gcc dot gnu.org
  2021-02-26 17:07 ` [Bug middle-end/99293] " munroesj at gcc dot gnu.org
@ 2021-02-27  0:52 ` segher at gcc dot gnu.org
  2021-06-05  5:09 ` [Bug target/99293] " meissner at gcc dot gnu.org
  2021-06-05  5:09 ` meissner at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: segher at gcc dot gnu.org @ 2021-02-27  0:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99293

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-02-27
     Ever confirmed|0                           |1

--- Comment #2 from Segher Boessenkool <segher at gcc dot gnu.org> ---
It generates non-optimal code for older CPUs as well (it does two splats
instead of one):

        xxpermdi 0,35,35,3       # 7    [c=4 l=4]  vsx_extract_v2di/1
        xxpermdi 35,0,0,0        # 9    [c=4 l=4]  vsx_splat_v2di_reg/0
        vrlq 2,2,3

This is because we get things like

Trying 7 -> 9:
    7: r117:DI=vec_select(r127:V1TI#0,parallel)
      REG_DEAD r127:V1TI
    9: r124:V2DI=vec_duplicate(r117:DI)
      REG_DEAD r117:DI
Failed to match this instruction:
(set (reg:V2DI 124)
    (vec_duplicate:V2DI (vec_select:DI (subreg:V2DI (reg:V1TI 127) 0)
            (parallel [
                    (const_int 0 [0])
                ]))))

(the patterns we do have use vec_concat instead).

Confirmed.
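
As an illustration of why the two-permute sequence is redundant, the
following portable C sketch models xxpermdi and composes the two
instructions quoted above. The names v2di_model, model_xxpermdi,
two_step_splat and one_step_splat are hypothetical and exist only for this
example; doublewords are indexed in ISA numbering. This is not the proposed
patch, only a check that a single xxpermdi with DM=3 would give the same
result as the extract followed by the splat.

```c
#include <stdint.h>

/* Hypothetical model of a V2DI value held in a VSX register.
   dw[0] is doubleword 0 in ISA numbering, dw[1] is doubleword 1. */
typedef struct { uint64_t dw[2]; } v2di_model;

/* xxpermdi XT,XA,XB,DM: the high bit of DM picks the doubleword of XA
   for result doubleword 0; the low bit picks the doubleword of XB for
   result doubleword 1. */
static v2di_model model_xxpermdi(v2di_model xa, v2di_model xb, unsigned dm)
{
    v2di_model xt = { { xa.dw[(dm >> 1) & 1u], xb.dw[dm & 1u] } };
    return xt;
}

/* The sequence currently generated for older CPUs:
   an extract (vsx_extract_v2di) followed by a splat (vsx_splat_v2di_reg). */
static v2di_model two_step_splat(v2di_model v)
{
    v2di_model tmp = model_xxpermdi(v, v, 3);   /* xxpermdi 0,35,35,3 */
    return model_xxpermdi(tmp, tmp, 0);         /* xxpermdi 35,0,0,0  */
}

/* What a combined vec_duplicate-of-vec_select pattern could emit instead:
   one permute that copies doubleword 1 into both halves. */
static v2di_model one_step_splat(v2di_model v)
{
    return model_xxpermdi(v, v, 3);
}
```

In this model both functions return the input's doubleword 1 in both halves,
which is the equivalence a new pattern would rely on.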


* [Bug target/99293] Built-in vec_splat generates sub-optimal code for -mcpu=power10
  2021-02-26 17:02 [Bug middle-end/99293] New: Built-in vec_splat generates sub-optimal code for -mcpu=power10 munroesj at gcc dot gnu.org
  2021-02-26 17:07 ` [Bug middle-end/99293] " munroesj at gcc dot gnu.org
  2021-02-27  0:52 ` segher at gcc dot gnu.org
@ 2021-06-05  5:09 ` meissner at gcc dot gnu.org
  2021-06-05  5:09 ` meissner at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: meissner at gcc dot gnu.org @ 2021-06-05  5:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99293

--- Comment #3 from Michael Meissner <meissner at gcc dot gnu.org> ---
Created attachment 50947
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50947&action=edit
Proposed patch


* [Bug target/99293] Built-in vec_splat generates sub-optimal code for -mcpu=power10
  2021-02-26 17:02 [Bug middle-end/99293] New: Built-in vec_splat generates sub-optimal code for -mcpu=power10 munroesj at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2021-06-05  5:09 ` [Bug target/99293] " meissner at gcc dot gnu.org
@ 2021-06-05  5:09 ` meissner at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: meissner at gcc dot gnu.org @ 2021-06-05  5:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99293

Michael Meissner <meissner at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |meissner at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

