[Bug target/100866] New: PPC: Inefficient code for vec

public inbox for gcc-bugs@sourceware.org

* [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9
@ 2021-06-02  7:14 jens.seifert at de dot ibm.com
  2021-06-02 15:03 ` [Bug target/100866] " segher at gcc dot gnu.org
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: jens.seifert at de dot ibm.com @ 2021-06-02  7:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

            Bug ID: 100866
           Summary: PPC: Inefficient code for vec_revb(vector unsigned
                    short) < P9
           Product: gcc
           Version: 8.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jens.seifert at de dot ibm.com
  Target Milestone: ---

Input:

vector unsigned short revb(vector unsigned short a)
{
   return vec_revb(a);
}

creates:

_Z4revbDv8_t:
.LFB1:
        .cfi_startproc
.LCF1:
0:      addis 2,12,.TOC.-.LCF1@ha
        addi 2,2,.TOC.-.LCF1@l
        .localentry     _Z4revbDv8_t,.-_Z4revbDv8_t
        addis 9,2,.LC1@toc@ha
        addi 9,9,.LC1@toc@l
        lvx 0,0,9
        xxlnor 32,32,32
        vperm 2,2,2,0
        blr


Optimal code sequence:
vector unsigned short revb_pwr7(vector unsigned short a)
{
   return vec_rl(a, vec_splats((unsigned short)8));
}

_Z9revb_pwr7Dv8_t:
.LFB2:
        .cfi_startproc
        .localentry     _Z9revb_pwr7Dv8_t,1
        vspltish 0,8
        vrlh 2,2,0
        blr
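
The rotate works because each 16-bit element holds exactly two bytes, so rotating it by 8 bits exchanges them. A minimal scalar sketch in portable C, modelling a single halfword lane of vrlh (the helper names `rotl16` and `revb16_by_rotate` are illustrative assumptions, not GCC or AltiVec names):

```c
#include <stdint.h>

/* Rotate a 16-bit value left by n bits; only the low 4 bits of n are used,
   which models one halfword lane of a vector rotate. */
static uint16_t rotl16(uint16_t x, unsigned n)
{
    n &= 15u;
    return (uint16_t)(((uint32_t)x << n) | ((uint32_t)x >> ((16u - n) & 15u)));
}

/* Byte-reverse one halfword by rotating it by 8 bits. */
static uint16_t revb16_by_rotate(uint16_t x)
{
    return rotl16(x, 8);
}
```

Applied to all eight lanes at once, this is what the vspltish 8 + vrlh pair computes.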


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
@ 2021-06-02 15:03 ` segher at gcc dot gnu.org
  2021-06-15  9:22 ` luoxhu at gcc dot gnu.org
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: segher at gcc dot gnu.org @ 2021-06-02 15:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

Segher Boessenkool <segher at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|powerpc-*-*-*               |powerpc*
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2021-06-02
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Confirmed.  There are other ways to write this in two insns, but this
is probably the cheapest and simplest, and the immediate can potentially
be reused.


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
  2021-06-02 15:03 ` [Bug target/100866] " segher at gcc dot gnu.org
@ 2021-06-15  9:22 ` luoxhu at gcc dot gnu.org
  2021-06-15  9:56 ` luoxhu at gcc dot gnu.org
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-06-15  9:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

luoxhu at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luoxhu at gcc dot gnu.org

--- Comment #2 from luoxhu at gcc dot gnu.org ---
But this only works for V8HImode. Is there no better code generation for other
modes such as V4SI/V2DI/V1TI, where the byte swap cannot be done with only the
two instructions vspltish+vrlh?

  unsigned int swap1[16] = {15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0};
  unsigned int swap2[16] = {7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8};
  unsigned int swap4[16] = {3,2,1,0,7,6,5,4,11,10,9,8,15,14,13,12};
  unsigned int swap8[16] = {1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14};

For V4SI, for example, we need to swap the bytes within each halfword first and
then swap the halfwords within each word, which seems less straightforward than
vperm?
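
The four tables follow one rule: within each element, byte k maps to byte (size - 1 - k). A small sketch in portable C that generates the selector for a given element width (the function name `make_revb_selector` is an assumption made for illustration, not a GCC function):

```c
#include <stdint.h>

/* Fill sel[0..15] with the byte indices that reverse the bytes inside each
   elem_bytes-wide element of a 16-byte vector.
   elem_bytes is assumed to be 2, 4, 8 or 16. */
static void make_revb_selector(unsigned elem_bytes, uint8_t sel[16])
{
    for (unsigned i = 0; i < 16; i++) {
        unsigned base = (i / elem_bytes) * elem_bytes; /* first byte of this element */
        unsigned off  = i % elem_bytes;                /* position inside the element */
        sel[i] = (uint8_t)(base + (elem_bytes - 1 - off));
    }
}
```

With elem_bytes = 16, 8, 4 and 2 this reproduces swap1, swap2, swap4 and swap8 respectively.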


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
  2021-06-02 15:03 ` [Bug target/100866] " segher at gcc dot gnu.org
  2021-06-15  9:22 ` luoxhu at gcc dot gnu.org
@ 2021-06-15  9:56 ` luoxhu at gcc dot gnu.org
  2021-06-15 13:50 ` segher at gcc dot gnu.org
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-06-15  9:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #3 from luoxhu at gcc dot gnu.org ---

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 097a127be07..35b3f1a0e1a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1932,7 +1932,7 @@ (define_insn "altivec_vpku<VI_char>um_direct"
 }
   [(set_attr "type" "vecperm")])

-(define_insn "*altivec_vrl<VI_char>"
+(define_insn "altivec_vrl<VI_char>"
   [(set (match_operand:VI2 0 "register_operand" "=v")
         (rotate:VI2 (match_operand:VI2 1 "register_operand" "v")
                    (match_operand:VI2 2 "register_operand" "v")))]
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 8c5865b8c34..88b34a2285a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5849,9 +5849,18 @@ (define_expand "revb_<mode>"
       /* Want to have the elements in reverse order relative
         to the endian mode in use, i.e. in LE mode, put elements
         in BE order.  */
-      rtx sel = swap_endian_selector_for_mode(<MODE>mode);
-      emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
-                                          operands[1], sel));
+      if (<MODE>mode == V8HImode)
+       {
+         rtx splt = gen_reg_rtx (V8HImode);
+         emit_insn (gen_altivec_vspltish (splt, GEN_INT (8)));
+         emit_insn (gen_altivec_vrlh (operands[0], operands[1], splt));
+       }
+      else
+       {
+         rtx sel = swap_endian_selector_for_mode (<MODE>mode);
+         emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
+                                              operands[1], sel));
+       }
     }


With the above change, GCC generates the expected code:

revb:
.LFB0:
        .cfi_startproc
        vspltish 0,8
        vrlh 2,2,0
        blr


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (2 preceding siblings ...)
  2021-06-15  9:56 ` luoxhu at gcc dot gnu.org
@ 2021-06-15 13:50 ` segher at gcc dot gnu.org
  2021-06-16  5:53 ` luoxhu at gcc dot gnu.org
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: segher at gcc dot gnu.org @ 2021-06-15 13:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #4 from Segher Boessenkool <segher at gcc dot gnu.org> ---
This PR is specifically about the vec_revb builtin.  But yes, we should
look at what is generated for all other code (having only the builtin
generate good code is suboptimal for a generic thing like this), and for
other sizes as well.


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (3 preceding siblings ...)
  2021-06-15 13:50 ` segher at gcc dot gnu.org
@ 2021-06-16  5:53 ` luoxhu at gcc dot gnu.org
  2021-06-18  1:37 ` luoxhu at gcc dot gnu.org
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-06-16  5:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #5 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #4)
> This PR is specifically about the vec_revb builtin.  But yes, we should
> look at what is generated for all other code (having only the builtin
> generate good code is suboptimal for a generic thing like this), and for
> other sizes as well.

Sorry, I don't quite understand what you mean. As far as I can tell, vec_revb is
expanded by CODE_FOR_revb_v8hi through the revb_<mode> pattern, so that pattern
is where we should make the change to get better code generation.
For V8HI, it is natural to use vspltish 8 + vrlh to turn
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15} into
{1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14}.

But for V4SI, we need to use vspltish+vrlh to turn it into
{1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14} first, and then a "vrlw 16" to turn
that into {3,2,1,0,7,6,5,4,11,10,9,8,15,14,13,12}. I am not sure whether this
is better than lvx+xxlnor+vperm, especially for V2DI and V1TI, which need an
additional "vrld 32" or "vrld 32"+"vrlq 64". (Those are all register-only
operations, with no load from memory like lvx.)
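
The two-step V4SI sequence can be checked on a single 32-bit lane. A minimal sketch in portable C (the helper names are assumptions made for illustration; the first step models vrlh by 8 on both halfwords of the lane, the second models vrlw by 16):

```c
#include <stdint.h>

/* Swap the two bytes inside each 16-bit half of the word,
   i.e. rotate both halfwords by 8 bits. */
static uint32_t rot_halfwords_by_8(uint32_t x)
{
    return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

/* Rotate a 32-bit word left by n bits; only the low 5 bits of n are used. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Byte-reverse one word: halfword rotate by 8, then word rotate by 16. */
static uint32_t revb32_two_step(uint32_t x)
{
    return rotl32(rot_halfwords_by_8(x), 16);
}
```

The same idea extends to 64-bit lanes with one more rotate by 32, at the cost of another dependent instruction.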


bt 5
#0  gen_revb_v8hi (operand0=0x7ffff4d4ce40, operand1=0x7ffff4d4cf60) at
../../gcc/gcc/config/rs6000/vsx.md:5858
#1  0x0000000010b05360 in insn_gen_fn::operator()<rtx_def*, rtx_def*>
(this=0x130ab188 <insn_data+163016>) at../../gcc/gcc/recog.h:407
#2  0x0000000011aa1e30 in rs6000_expand_unop_builtin (icode=CODE_FOR_revb_v8hi,
exp=<call_expr 0x7ffff4f509a0>
, target=0x7ffff4d4ce40) at ../../gcc/gcc/config/rs6000/rs6000-call.c:9451
#3  0x0000000011ab27a4 in rs6000_expand_builtin (exp=<call_expr
0x7ffff4f509a0>, target=0x7ffff4d4ce40, subtarget=0x0, mode=E_V8HImode,
ignore=0) at ../../gcc/gcc/config/rs6000/rs6000-call.c:13157
#4  0x0000000010815268 in expand_builtin (exp=<call_expr 0x7ffff4f509a0>,
target=0x7ffff4d4ce40, subtarget=0x0, mode=E_V8HImode, ignore=0) at
../../gcc/gcc/builtins.c:9559


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (4 preceding siblings ...)
  2021-06-16  5:53 ` luoxhu at gcc dot gnu.org
@ 2021-06-18  1:37 ` luoxhu at gcc dot gnu.org
  2021-06-18  8:32 ` jens.seifert at de dot ibm.com
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-06-18  1:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #6 from luoxhu at gcc dot gnu.org ---
For V4SI, it is also better to use vector splat and vector rotate operations.

revb:
.LFB0:
        .cfi_startproc
        vspltish %v1,8
        vspltisw %v0,-16
        vrlh %v2,%v2,%v1
        vrlw %v2,%v2,%v0
        blr
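
Note the splat of -16 rather than 16: the vspltisw immediate is a 5-bit signed value (-16 to 15), so 16 itself cannot be encoded, while vrlw only uses the low 5 bits of each element as the rotate count. A tiny sketch in portable C of that masking (the function name `effective_word_rotate` is an illustrative assumption):

```c
#include <stdint.h>

/* Rotate count a 32-bit lane rotate actually applies:
   the low 5 bits of the count element. */
static unsigned effective_word_rotate(int32_t splat_imm)
{
    return (unsigned)((uint32_t)splat_imm & 31u);
}
```

Because -16 and 16 share the same low 5 bits, splatting -16 yields a rotate by 16.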


On a small benchmark, run time improved from 7.322s to 2.445s because the load
instruction is replaced.

But for V2DI, we don't have "vspltisd" to splat {32,32} to vector register
before Power9, so lvx is still required?

vector unsigned long long revb_pwr7_l(vector unsigned long long a)
{
     return vec_rl(a, vec_splats((unsigned long long)32));
} 

generates:

revb_pwr7_l:
.LFB1:
        .cfi_startproc
.LCF1:
0:      addis 2,12,.TOC.-.LCF1@ha
        addi 2,2,.TOC.-.LCF1@l
        .localentry     revb_pwr7_l,.-revb_pwr7_l
        addis %r9,%r2,.LC0@toc@ha
        addi %r9,%r9,.LC0@toc@l
        lvx %v0,0,%r9
        vrld %v2,%v2,%v0
        blr
.LC0:
        .quad   32
        .quad   32
        .align 4


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (5 preceding siblings ...)
  2021-06-18  1:37 ` luoxhu at gcc dot gnu.org
@ 2021-06-18  8:32 ` jens.seifert at de dot ibm.com
  2021-06-21  2:29 ` luoxhu at gcc dot gnu.org
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jens.seifert at de dot ibm.com @ 2021-06-18  8:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #7 from Jens Seifert <jens.seifert at de dot ibm.com> ---
Regarding vec_revb for vector unsigned int. I agree that
revb:
.LFB0:
        .cfi_startproc
        vspltish %v1,8
        vspltisw %v0,-16
        vrlh %v2,%v2,%v1
        vrlw %v2,%v2,%v0
        blr

works. But in this case, I would prefer the vperm approach assuming that the
loaded constant for the permute vector can be re-used multiple times.
But please get rid of the xxlnor 32,32,32. That does not make sense after
loading a constant. Change the constant that needs to be loaded.


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (6 preceding siblings ...)
  2021-06-18  8:32 ` jens.seifert at de dot ibm.com
@ 2021-06-21  2:29 ` luoxhu at gcc dot gnu.org
  2021-06-21  4:20 ` jens.seifert at de dot ibm.com
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-06-21  2:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #8 from luoxhu at gcc dot gnu.org ---
(In reply to Jens Seifert from comment #7)
> Regarding vec_revb for vector unsigned int. I agree that
> revb:
> .LFB0:
>         .cfi_startproc
>         vspltish %v1,8
>         vspltisw %v0,-16
>         vrlh %v2,%v2,%v1
>         vrlw %v2,%v2,%v0
>         blr
> 
> works. But in this case, I would prefer the vperm approach assuming that the
> loaded constant for the permute vector can be re-used multiple times.
> But please get rid of the xxlnor 32,32,32. That does not make sense after
> loading a constant. Change the constant that needs to be loaded.

xxlnor is an LE-specific requirement (it is not emitted when building with
-mbig). We need to turn the indices {0,1,2,3} into {31,30,29,28} for vperm;
without it the result is incorrect:

 6|    0x0000000010000630 <+16>:    lvx     v0,0,r9
 7+>   0x0000000010000634 <+20>:    xxlnor  vs32,vs32,vs32
 8|    0x0000000010000638 <+24>:    vperm   v2,v2,v2,v0
 9|    0x000000001000063c <+28>:    blr

(gdb)
0x0000000010000634 in revb ()
2: /x $vs34.uint128 = 0x42345678323456782234567812345678
5: /x $vs32.uint128 = 0xc0d0e0f08090a0b0405060700010203
(gdb) si
0x0000000010000638 in revb ()
2: /x $vs34.uint128 = 0x42345678323456782234567812345678
5: /x $vs32.uint128 = 0xf3f2f1f0f7f6f5f4fbfaf9f8fffefdfc
(gdb) si
0x000000001000063c in revb ()
2: /x $vs34.uint128 = 0x78563442785634327856342278563412
5: /x $vs32.uint128 = 0xf3f2f1f0f7f6f5f4fbfaf9f8fffefdfc



Quoted from the ISA:

vperm VRT,VRA,VRB,VRC

vsrc.qword[0] ← VSR[VRA+32]
vsrc.qword[1] ← VSR[VRB+32]
do i = 0 to 15
   index ← VSR[VRC+32].byte[i].bit[3:7]
   VSR[VRT+32].byte[i] ← src.byte[index]
end

Let the source vector be the concatenation of the
contents of VSR[VRA+32] followed by the contents of
VSR[VRB+32].
For each integer value i from 0 to 15, do the following.
Let index be the value specified by bits 3:7 of byte
element i of VSR[VRC+32].
The contents of byte element index of src are
placed into byte element i of VSR[VRT+32].
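
The quoted pseudo-code can be modelled in a few lines of portable C and replayed against the register values from the gdb session above. This is only a sketch: `vperm_model` and the array names are assumptions made for illustration, and bytes are listed most significant first, matching the ISA's byte numbering.

```c
#include <stdint.h>
#include <string.h>

/* Model of the vperm pseudo-code: src is VRA followed by VRB, and each
   selector byte picks one of the 32 source bytes via its low five bits
   (bits 3:7 in the ISA's bit numbering). */
static void vperm_model(const uint8_t vra[16], const uint8_t vrb[16],
                        const uint8_t vrc[16], uint8_t vrt[16])
{
    uint8_t src[32];
    memcpy(src, vra, 16);
    memcpy(src + 16, vrb, 16);
    for (int i = 0; i < 16; i++)
        vrt[i] = src[vrc[i] & 0x1f];
}

/* $vs34 before the vperm. */
static const uint8_t input_vs34[16] = {
    0x42, 0x34, 0x56, 0x78, 0x32, 0x34, 0x56, 0x78,
    0x22, 0x34, 0x56, 0x78, 0x12, 0x34, 0x56, 0x78 };
/* $vs32 right after the lvx. */
static const uint8_t sel_loaded[16] = {
    0x0c, 0x0d, 0x0e, 0x0f, 0x08, 0x09, 0x0a, 0x0b,
    0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03 };
/* $vs32 after the xxlnor. */
static const uint8_t sel_after_xxlnor[16] = {
    0xf3, 0xf2, 0xf1, 0xf0, 0xf7, 0xf6, 0xf5, 0xf4,
    0xfb, 0xfa, 0xf9, 0xf8, 0xff, 0xfe, 0xfd, 0xfc };
/* $vs34 after the vperm. */
static const uint8_t result_vs34[16] = {
    0x78, 0x56, 0x34, 0x42, 0x78, 0x56, 0x34, 0x32,
    0x78, 0x56, 0x34, 0x22, 0x78, 0x56, 0x34, 0x12 };
```

Feeding `sel_after_xxlnor` to the model reproduces the byte-reversed words, while feeding `sel_loaded` only reorders whole words, which is the incorrect result mentioned above.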


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (7 preceding siblings ...)
  2021-06-21  2:29 ` luoxhu at gcc dot gnu.org
@ 2021-06-21  4:20 ` jens.seifert at de dot ibm.com
  2021-06-21 12:42 ` wschmidt at gcc dot gnu.org
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: jens.seifert at de dot ibm.com @ 2021-06-21  4:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #9 from Jens Seifert <jens.seifert at de dot ibm.com> ---
I know that if I used the vec_perm builtin as an end user, you would need to
conform to the LE specification, but you can always optimize the code as you
like as long as it still produces correct results.

load constant
xxlnor constant

can always be transformed to 

load inverse constant.
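
As a sketch of that folding in portable C: complementing a selector byte maps index i to 31 - i in the five bits vperm looks at, so the complemented table can be computed once, ahead of time, and stored in place of the original constant. The names `invert_selector` and `vperm_index` are assumptions made for illustration.

```c
#include <stdint.h>

/* Complement every selector byte up front; the result is what a
   "load inverse constant" would read from the constant pool. */
static void invert_selector(const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)~in[i];
}

/* Index a vperm selector byte actually addresses (its low five bits). */
static unsigned vperm_index(uint8_t sel_byte)
{
    return sel_byte & 0x1fu;
}
```

For a selector starting {0,1,2,3}, the inverted constant addresses {31,30,29,28}, with no xxlnor needed at run time.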


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (8 preceding siblings ...)
  2021-06-21  4:20 ` jens.seifert at de dot ibm.com
@ 2021-06-21 12:42 ` wschmidt at gcc dot gnu.org
  2021-06-21 12:46 ` wschmidt at gcc dot gnu.org
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2021-06-21 12:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #10 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
Right, it would be a good optimization.  We've stopped focusing much on P8
optimization work at this point simply because of lack of resources.

The needed transform is to recognize load-xxlnor-vperm as a group and combine
into invload-vperm.  But this requires the loaded constant not be used
elsewhere (unlikely, but possible), or if it is, that all such uses are also
xxlnor-vperm, so dataflow analysis for reached uses is required.  Not
completely trivial.
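
The legality condition can be pictured with a toy model in portable C. This is purely hypothetical: `use_kind` and `can_fold_complement_into_load` are invented here to illustrate the rule and do not correspond to any GCC data structure or pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of one use of the loaded constant. */
enum use_kind {
    USE_NOT_THEN_VPERM, /* complemented, and the complement only feeds a vperm selector */
    USE_OTHER           /* any other use of the loaded value */
};

/* The load may be replaced by a load of the inverted constant only if
   every use reached by it is of the complement-then-vperm form. */
static bool can_fold_complement_into_load(const enum use_kind *uses, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (uses[i] != USE_NOT_THEN_VPERM)
            return false;
    return n > 0;
}
```

Collecting the set of uses is the part that needs the reached-uses dataflow analysis; the check itself is simple once that set is known.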

Because it's a P8-only optimization, it's a bit lower on the priority list.


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (9 preceding siblings ...)
  2021-06-21 12:42 ` wschmidt at gcc dot gnu.org
@ 2021-06-21 12:46 ` wschmidt at gcc dot gnu.org
  2021-06-21 19:27 ` segher at gcc dot gnu.org
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2021-06-21 12:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #11 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
Segher, does this fit naturally in combine?


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (10 preceding siblings ...)
  2021-06-21 12:46 ` wschmidt at gcc dot gnu.org
@ 2021-06-21 19:27 ` segher at gcc dot gnu.org
  2021-06-22  2:05 ` luoxhu at gcc dot gnu.org
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: segher at gcc dot gnu.org @ 2021-06-21 19:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #12 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Bill Schmidt from comment #11)
> Segher, does this fit naturally in combine?

This is just constant folding, combine won't have much to do with it.

It is always better (namely, lower latency) to use one vector permute
than to have multiple dependent permutation-class instructions.  Combine
will automatically pick this up when it gets the chance.  Does it here
though, or are there still some unspecs here that make all this non-clear?


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (11 preceding siblings ...)
  2021-06-21 19:27 ` segher at gcc dot gnu.org
@ 2021-06-22  2:05 ` luoxhu at gcc dot gnu.org
  2021-06-23 20:06 ` segher at gcc dot gnu.org
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2021-06-22  2:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #13 from luoxhu at gcc dot gnu.org ---
It is not visible in combine because the constant data is in *.LC0 and the
permute is an UNSPEC_VPERM. I will shelve this and switch to other
high-priority issues.

pr100866.c.277r.combine:

(note 4 0 20 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 20 4 2 2 (set (reg:V8HI 126)
        (reg:V8HI 66 %v2 [ a ])) "pr100866.c":18:1 1132 {vsx_movv8hi_64bit}
     (expr_list:REG_DEAD (reg:V8HI 66 %v2 [ a ])
        (nil)))
(note 2 20 3 2 NOTE_INSN_DELETED)
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 18 2 (set (reg/f:DI 122)
        (unspec:DI [
                (symbol_ref/u:DI ("*.LC0") [flags 0x82])
                (reg:DI 2 %r2)
            ] UNSPEC_TOCREL)) "pr100866.c":19:13 719 {*tocrefdi}
     (expr_list:REG_EQUAL (symbol_ref/u:DI ("*.LC0") [flags 0x82])
        (nil)))
(insn 18 6 9 2 (set (reg:V16QI 123)
        (mem/u/c:V16QI (and:DI (reg/f:DI 122)
                (const_int -16 [0xfffffffffffffff0])) [0  S16 A128])) "pr100866.c":19:13 1131 {vsx_movv16qi_64bit}
     (expr_list:REG_DEAD (reg/f:DI 122)
        (nil)))
(insn 9 18 10 2 (set (reg:V16QI 124)
        (not:V16QI (reg:V16QI 123))) "pr100866.c":19:13 508 {one_cmplv16qi2}
     (expr_list:REG_DEAD (reg:V16QI 123)
        (nil)))
(note 10 9 15 2 NOTE_INSN_DELETED)
(insn 15 10 16 2 (set (reg/i:V8HI 66 %v2)
        (unspec:V8HI [
                (reg:V8HI 126) repeated x2
                (reg:V16QI 124)
            ] UNSPEC_VPERM)) "pr100866.c":20:1 1830 {altivec_vperm_v8hi_direct}
     (expr_list:REG_DEAD (reg:V16QI 124)
        (expr_list:REG_DEAD (reg:V8HI 126)
            (nil))))
(insn 16 15 0 2 (use (reg/i:V8HI 66 %v2)) "pr100866.c":20:1 -1
     (nil))

;; Combiner totals: 12 attempts, 12 substitutions (2 requiring new space),


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (12 preceding siblings ...)
  2021-06-22  2:05 ` luoxhu at gcc dot gnu.org
@ 2021-06-23 20:06 ` segher at gcc dot gnu.org
  2022-11-02  8:42 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: segher at gcc dot gnu.org @ 2021-06-23 20:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #14 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to luoxhu from comment #13)
> It is not visible in combine due to the constant data is in *.LC0 and

combine can see things in the constant pool in various ways though (just
like many other parts of the compiler).  But yeah, unspecs are a big
hurdle to optimisation always.  If we would express this as some "real"
RTL we would need a few variants: one that takes only one register as
data input and another that takes two; one that has all permutation
indices in range and another that masks them; and maybe a few more.


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (13 preceding siblings ...)
  2021-06-23 20:06 ` segher at gcc dot gnu.org
@ 2022-11-02  8:42 ` cvs-commit at gcc dot gnu.org
  2022-12-01  2:07 ` cvs-commit at gcc dot gnu.org
  2022-12-14  5:52 ` guihaoc at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-02  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #15 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by HaoChen Gui <guihaoc@gcc.gnu.org>:

https://gcc.gnu.org/g:eaba55ffef961c28f6a15d845a4d6b77b8a8bab1

commit r13-3603-geaba55ffef961c28f6a15d845a4d6b77b8a8bab1
Author: Xionghu Luo <xionghuluo@tencent.com>
Date:   Wed Oct 12 10:43:38 2022 +0800

    rs6000: Byte reverse V8HI on Power8 by vector rotation.

    gcc/
            PR target/100866
            * config/rs6000/altivec.md: (*altivec_vrl<VI_char>): Named to...
            (altivec_vrl<VI_char>): ...this.
            * config/rs6000/vsx.md (revb_<mode>): Call vspltish and vrlh when
            target is Power8 and mode is V8HI.

    gcc/testsuite/
            PR target/100866
            * gcc.target/powerpc/pr100866-2.c: New.


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (14 preceding siblings ...)
  2022-11-02  8:42 ` cvs-commit at gcc dot gnu.org
@ 2022-12-01  2:07 ` cvs-commit at gcc dot gnu.org
  2022-12-14  5:52 ` guihaoc at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-12-01  2:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

--- Comment #16 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by HaoChen Gui <guihaoc@gcc.gnu.org>:

https://gcc.gnu.org/g:9d68cba5eb20442f8075b8f92d1b20a00022852f

commit r13-4423-g9d68cba5eb20442f8075b8f92d1b20a00022852f
Author: Haochen Gui <guihaoc@gcc.gnu.org>
Date:   Wed Nov 30 15:05:59 2022 +0800

    rs6000: Generates permute index directly for little endian targets (PR100866)

    2022-10-11  Haochen Gui <guihaoc@linux.ibm.com>

    gcc/
            PR target/100866
            * config/rs6000/rs6000-call.cc (swap_endian_selector_for_mode):
            Generate permute index directly for little endian targets.
            * config/rs6000/vsx.md (revb_<mode>): Call vperm directly with
            corresponding permute indexes.

    gcc/testsuite/
            PR target/100866
            * gcc.target/powerpc/pr100866-1.c: New.


* [Bug target/100866] PPC: Inefficient code for vec_revb(vector unsigned short) < P9
  2021-06-02  7:14 [Bug target/100866] New: PPC: Inefficient code for vec_revb(vector unsigned short) < P9 jens.seifert at de dot ibm.com
                   ` (15 preceding siblings ...)
  2022-12-01  2:07 ` cvs-commit at gcc dot gnu.org
@ 2022-12-14  5:52 ` guihaoc at gcc dot gnu.org
  16 siblings, 0 replies; 18+ messages in thread
From: guihaoc at gcc dot gnu.org @ 2022-12-14  5:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100866

HaoChen Gui <guihaoc at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
           Assignee|unassigned at gcc dot gnu.org      |guihaoc at gcc dot gnu.org
                 CC|                            |guihaoc at gcc dot gnu.org

--- Comment #17 from HaoChen Gui <guihaoc at gcc dot gnu.org> ---
Both issues are fixed.
