public inbox for gcc-patches@gcc.gnu.org
* [PATCH] rs6000: Add vec_unpacku_{hi,lo}_v4si
@ 2021-08-05  2:06 Kewen.Lin
  2021-08-06 13:10 ` Bill Schmidt
  0 siblings, 1 reply; 6+ messages in thread
From: Kewen.Lin @ 2021-08-05  2:06 UTC
  To: GCC Patches; +Cc: Segher Boessenkool, Bill Schmidt, David Edelsohn

[-- Attachment #1: Type: text/plain, Size: 1714 bytes --]

Hi,

The existing vec_unpacku_{hi,lo} expanders support emulated
unsigned unpacking for short and char but lack support for int.
This patch adds vec_unpacku_{hi,lo}_v4si.

Meanwhile, the current implementation uses a vector permutation,
which requires one extra customized constant vector as the
permutation control vector (pcv).  It's better to use vector
merge high/low with a zero constant vector, which saves space
in the constant area as well as the cost of initializing the
pcv in the prologue.  This patch switches to vector merging and
simplifies the expanders with iterators.
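
As a rough illustration of the merge-with-zero idea described above, here
is a scalar C model for the byte-to-halfword case.  It is only a sketch:
it assumes big-endian element order, and the names merge_high_bytes and
unpacku_hi_bytes are hypothetical helpers for this example, not GCC or
AltiVec functions.

```c
#include <stdint.h>

/* Scalar model of a byte merge-high in big-endian element order:
   interleave the first 8 bytes of A and B as a0 b0 a1 b1 ...
   (hypothetical helper, for illustration only).  */
static void
merge_high_bytes (const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
  for (int i = 0; i < 8; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* Hypothetical helper: unsigned "unpack high" of 16 bytes into 8
   halfwords, done by merging a zero vector with the source.  */
static void
unpacku_hi_bytes (const uint8_t src[16], uint16_t dst[8])
{
  const uint8_t zero[16] = { 0 };
  uint8_t merged[16];

  /* Zero goes first so that it lands in the high byte of each
     big-endian halfword.  */
  merge_high_bytes (zero, src, merged);

  /* Read each byte pair back as a big-endian halfword.  */
  for (int i = 0; i < 8; i++)
    dst[i] = (uint16_t) ((merged[2 * i] << 8) | merged[2 * i + 1]);
}
```

Each output halfword equals the corresponding source byte zero-extended,
and no permutation control vector is involved, only an all-zero vector.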

Bootstrapped & regtested on powerpc64le-linux-gnu P9 and
powerpc64-linux-gnu P8.

btw, the loop in unpack-vectorize-2.c doesn't get vectorized
without this patch; unpack-vectorize-[13]* verify that the
vector merging and simplification work as expected.
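
For reference, the loop in question is roughly what TEST1 (ui, ull) from
unpack-vectorize.h expands to.  The sketch below hand-expands that macro
and, as a simplification, leaves out the alignment and noipa attributes
used in the real header.

```c
typedef unsigned int ui;
typedef unsigned long long ull;

#define N 128

ui ui_a[N], ui_b[N];
ull ull_c[N];

/* Hand expansion of TEST1 (ui, ull), attributes omitted.  The sum is
   computed in unsigned int and then widened to unsigned long long,
   which is the unsigned int unpack the vectorizer needs.  */
void
test1_ui_ull (void)
{
  for (int i = 0; i < N; i++)
    ull_c[i] = ui_a[i] + ui_b[i];
}
```

Vectorizing this loop needs an unsigned V4SI to V2DI unpack, which is
what vec_unpacku_{hi,lo}_v4si provides.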

Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

	* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
	(vec_unpacku_hi_v8hi): Likewise.
	(vec_unpacku_lo_v16qi): Likewise.
	(vec_unpacku_lo_v8hi): Likewise.
	(vec_unpacku_hi_<VP_small_lc>): New define_expand.
	(vec_unpacku_lo_<VP_small_lc>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/unpack-vectorize-1.c: New test.
	* gcc.target/powerpc/unpack-vectorize-1.h: New test.
	* gcc.target/powerpc/unpack-vectorize-2.c: New test.
	* gcc.target/powerpc/unpack-vectorize-2.h: New test.
	* gcc.target/powerpc/unpack-vectorize-3.c: New test.
	* gcc.target/powerpc/unpack-vectorize-3.h: New test.
	* gcc.target/powerpc/unpack-vectorize-run-1.c: New test.
	* gcc.target/powerpc/unpack-vectorize-run-2.c: New test.
	* gcc.target/powerpc/unpack-vectorize-run-3.c: New test.
	* gcc.target/powerpc/unpack-vectorize.h: New test.

[-- Attachment #2: 0002-rs6000-Add-vec_unpacku_-hi-lo-_v4si.patch --]
[-- Type: text/plain, Size: 17806 bytes --]

---
 gcc/config/rs6000/altivec.md                  | 158 ++++--------------
 .../gcc.target/powerpc/unpack-vectorize-1.c   |  18 ++
 .../gcc.target/powerpc/unpack-vectorize-1.h   |  14 ++
 .../gcc.target/powerpc/unpack-vectorize-2.c   |  12 ++
 .../gcc.target/powerpc/unpack-vectorize-2.h   |   7 +
 .../gcc.target/powerpc/unpack-vectorize-3.c   |  11 ++
 .../gcc.target/powerpc/unpack-vectorize-3.h   |   7 +
 .../powerpc/unpack-vectorize-run-1.c          |  24 +++
 .../powerpc/unpack-vectorize-run-2.c          |  16 ++
 .../powerpc/unpack-vectorize-run-3.c          |  16 ++
 .../gcc.target/powerpc/unpack-vectorize.h     |  42 +++++
 11 files changed, 196 insertions(+), 129 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index d70c17e6bc2..0e8b66cd6a5 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -134,10 +134,8 @@ (define_c_enum "unspec"
    UNSPEC_VMULWLUH
    UNSPEC_VMULWHSH
    UNSPEC_VMULWLSH
-   UNSPEC_VUPKHUB
-   UNSPEC_VUPKHUH
-   UNSPEC_VUPKLUB
-   UNSPEC_VUPKLUH
+   UNSPEC_VUPKHUBHW
+   UNSPEC_VUPKLUBHW
    UNSPEC_VPERMSI
    UNSPEC_VPERMHI
    UNSPEC_INTERHI
@@ -3885,143 +3883,45 @@ (define_insn "xxeval"
    [(set_attr "type" "vecsimple")
     (set_attr "prefixed" "yes")])
 
-(define_expand "vec_unpacku_hi_v16qi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
-                     UNSPEC_VUPKHUB))]
-  "TARGET_ALTIVEC"      
-{  
-  rtx vzero = gen_reg_rtx (V8HImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
-   
-  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
-   
-  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
-  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
-  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
-  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
-  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
-  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
-  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
-  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
-  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
-  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
-
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
-  DONE;
-})
-
-(define_expand "vec_unpacku_hi_v8hi"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
-                     UNSPEC_VUPKHUH))]
+(define_expand "vec_unpacku_hi_<VP_small_lc>"
+  [(set (match_operand:VP 0 "register_operand" "=v")
+        (unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
+         UNSPEC_VUPKHUBHW))]
   "TARGET_ALTIVEC"
 {
-  rtx vzero = gen_reg_rtx (V4SImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
+  rtx vzero = gen_reg_rtx (<VP_small>mode);
+  emit_insn (gen_altivec_vspltis<VU_char> (vzero, const0_rtx));
 
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
- 
-  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
-  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 :  6);
-  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  0 : 17);
-  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
-  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
-  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 :  4);
-  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ?  2 : 17);
-  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
-  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
-  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 :  2);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ?  4 : 17);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  0);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
-
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
-  DONE;
-})
+  rtx res = gen_reg_rtx (<VP_small>mode);
+  rtx op1 = operands[1];
 
-(define_expand "vec_unpacku_lo_v16qi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
-                     UNSPEC_VUPKLUB))]
-  "TARGET_ALTIVEC"
-{
-  rtx vzero = gen_reg_rtx (V8HImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
-
-  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
-
-  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
-  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  8 : 16);
-  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14);
-  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
-  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
-  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16);
-  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12);
-  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
-  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
-  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrgh<VU_char> (res, vzero, op1));
+  else
+    emit_insn (gen_altivec_vmrgl<VU_char> (res, op1, vzero));
 
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
+  emit_insn (gen_move_insn (operands[0], gen_lowpart (<MODE>mode, res)));
   DONE;
 })
 
-(define_expand "vec_unpacku_lo_v8hi"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
-                     UNSPEC_VUPKLUH))]
+(define_expand "vec_unpacku_lo_<VP_small_lc>"
+  [(set (match_operand:VP 0 "register_operand" "=v")
+        (unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
+         UNSPEC_VUPKLUBHW))]
   "TARGET_ALTIVEC"
 {
-  rtx vzero = gen_reg_rtx (V4SImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
+  rtx vzero = gen_reg_rtx (<VP_small>mode);
+  emit_insn (gen_altivec_vspltis<VU_char> (vzero, const0_rtx));
 
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
- 
-  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
-  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14);
-  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  8 : 17);
-  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
-  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
-  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12);
-  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17);
-  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
-  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
-  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  8);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
+  rtx res = gen_reg_rtx (<VP_small>mode);
+  rtx op1 = operands[1];
 
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrgl<VU_char> (res, vzero, op1));
+  else
+    emit_insn (gen_altivec_vmrgh<VU_char> (res, op1, vzero));
+
+  emit_insn (gen_move_insn (operands[0], gen_lowpart (<MODE>mode, res)));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
new file mode 100644
index 00000000000..2621d753baa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+/* Test if unpack vectorization succeeds for type signed/unsigned
+   short and char.  */
+
+#include "unpack-vectorize-1.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
+/* { dg-final { scan-assembler {\mvupkhsb\M} } } */
+/* { dg-final { scan-assembler {\mvupklsb\M} } } */
+/* { dg-final { scan-assembler {\mvupkhsh\M} } } */
+/* { dg-final { scan-assembler {\mvupklsh\M} } } */
+/* { dg-final { scan-assembler {\mvmrghb\M} } } */
+/* { dg-final { scan-assembler {\mvmrglb\M} } } */
+/* { dg-final { scan-assembler {\mvmrghh\M} } } */
+/* { dg-final { scan-assembler {\mvmrglh\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
new file mode 100644
index 00000000000..1cb89aba392
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
@@ -0,0 +1,14 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (si)
+DEF_ARR (ui)
+DEF_ARR (sh)
+DEF_ARR (uh)
+DEF_ARR (sc)
+DEF_ARR (uc)
+
+TEST1 (sh, si)
+TEST1 (uh, ui)
+TEST1 (sc, sh)
+TEST1 (uc, uh)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
new file mode 100644
index 00000000000..3e7e97da43c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+/* Test if unsigned int unpack vectorization succeeds.  V2DImode is
+   supported since Power7 so guard it under Power7 and up.  */
+
+#include "unpack-vectorize-2.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-assembler {\mxxmrghw\M} } } */
+/* { dg-final { scan-assembler {\mxxmrglw\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
new file mode 100644
index 00000000000..e199229e6f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
@@ -0,0 +1,7 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (ui)
+DEF_ARR (ull)
+
+TEST1 (ui, ull)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
new file mode 100644
index 00000000000..a246e7e26b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+/* Test if signed int unpack vectorization succeeds.  */
+
+#include "unpack-vectorize-3.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-assembler {\mvupkhsw\M} } } */
+/* { dg-final { scan-assembler {\mvupklsw\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
new file mode 100644
index 00000000000..6a5191d28a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
@@ -0,0 +1,7 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (si)
+DEF_ARR (sll)
+
+TEST1 (si, sll)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
new file mode 100644
index 00000000000..51f0e67524f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-1.h"
+
+/* Test if unpack vectorization cases on signed/unsigned short and char
+   run successfully.  */
+
+CHECK1 (sh, si)
+CHECK1 (uh, ui)
+CHECK1 (sc, sh)
+CHECK1 (uc, uh)
+
+int
+main ()
+{
+  check1_sh_si ();
+  check1_uh_ui ();
+  check1_sc_sh ();
+  check1_uc_uh ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
new file mode 100644
index 00000000000..6d243602bbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-2.h"
+
+/* Test if unpack vectorization cases on unsigned int run successfully.  */
+
+CHECK1 (ui, ull)
+
+int
+main ()
+{
+  check1_ui_ull ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
new file mode 100644
index 00000000000..fec33c46abc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-3.h"
+
+/* Test if unpack vectorization cases on signed int run successfully.  */
+
+CHECK1 (si, sll)
+
+int
+main ()
+{
+  check1_si_sll ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
new file mode 100644
index 00000000000..11fa7d4aa6f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
@@ -0,0 +1,42 @@
+typedef signed long long sll;
+typedef unsigned long long ull;
+typedef signed int si;
+typedef unsigned int ui;
+typedef signed short sh;
+typedef unsigned short uh;
+typedef signed char sc;
+typedef unsigned char uc;
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#define ALIGN_ATTR __attribute__((__aligned__(ALIGN)))
+
+#define N 128
+
+#define DEF_ARR(TYPE)                                                         \
+  TYPE TYPE##_a[N] ALIGN_ATTR;                                                \
+  TYPE TYPE##_b[N] ALIGN_ATTR;                                                \
+  TYPE TYPE##_c[N] ALIGN_ATTR;
+
+#define TEST1(NTYPE, WTYPE)                                                    \
+  __attribute__((noipa)) void test1_##NTYPE##_##WTYPE() {                      \
+    for (int i = 0; i < N; i++)                                                \
+      WTYPE##_c[i] = NTYPE##_a[i] + NTYPE##_b[i];                              \
+  }
+
+#define CHECK1(NTYPE, WTYPE)                                                   \
+  __attribute__((noipa, optimize(0))) void check1_##NTYPE##_##WTYPE() {        \
+    for (int i = 0; i < N; i++) {                                              \
+      NTYPE##_a[i] = 2 * i * sizeof(NTYPE) + 10;                               \
+      NTYPE##_b[i] = 7 * i * sizeof(NTYPE) / 5 - 10;                           \
+    }                                                                          \
+    test1_##NTYPE##_##WTYPE();                                                 \
+    for (int i = 0; i < N; i++) {                                              \
+      WTYPE exp = NTYPE##_a[i] + NTYPE##_b[i];                                 \
+      if (WTYPE##_c[i] != exp)                                                 \
+        __builtin_abort();                                                     \
+    }                                                                          \
+  }
+
-- 
2.17.1



* Re: [PATCH] rs6000: Add vec_unpacku_{hi,lo}_v4si
  2021-08-05  2:06 [PATCH] rs6000: Add vec_unpacku_{hi,lo}_v4si Kewen.Lin
@ 2021-08-06 13:10 ` Bill Schmidt
  2021-08-06 17:37   ` Segher Boessenkool
  2021-08-09  2:53   ` [PATCH v2] " Kewen.Lin
  0 siblings, 2 replies; 6+ messages in thread
From: Bill Schmidt @ 2021-08-06 13:10 UTC
  To: Kewen.Lin, GCC Patches; +Cc: Segher Boessenkool, David Edelsohn

Hi Kewen,

On 8/4/21 9:06 PM, Kewen.Lin wrote:
> Hi,
>
> The existing vec_unpacku_{hi,lo} expanders support emulated
> unsigned unpacking for short and char but lack support for int.
> This patch adds vec_unpacku_{hi,lo}_v4si.
>
> Meanwhile, the current implementation uses a vector permutation,
> which requires one extra customized constant vector as the
> permutation control vector (pcv).  It's better to use vector
> merge high/low with a zero constant vector, which saves space
> in the constant area as well as the cost of initializing the
> pcv in the prologue.  This patch switches to vector merging and
> simplifies the expanders with iterators.
>
> Bootstrapped & regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> btw, the loop in unpack-vectorize-2.c doesn't get vectorized
> without this patch; unpack-vectorize-[13]* verify that the
> vector merging and simplification work as expected.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
> 	* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
> 	(vec_unpacku_hi_v8hi): Likewise.
> 	(vec_unpacku_lo_v16qi): Likewise.
> 	(vec_unpacku_lo_v8hi): Likewise.
> 	(vec_unpacku_hi_<VP_small_lc>): New define_expand.
> 	(vec_unpacku_lo_<VP_small_lc>): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/unpack-vectorize-1.c: New test.
> 	* gcc.target/powerpc/unpack-vectorize-1.h: New test.
> 	* gcc.target/powerpc/unpack-vectorize-2.c: New test.
> 	* gcc.target/powerpc/unpack-vectorize-2.h: New test.
> 	* gcc.target/powerpc/unpack-vectorize-3.c: New test.
> 	* gcc.target/powerpc/unpack-vectorize-3.h: New test.
> 	* gcc.target/powerpc/unpack-vectorize-run-1.c: New test.
> 	* gcc.target/powerpc/unpack-vectorize-run-2.c: New test.
> 	* gcc.target/powerpc/unpack-vectorize-run-3.c: New test.
> 	* gcc.target/powerpc/unpack-vectorize.h: New test.
> ---
>  gcc/config/rs6000/altivec.md                  | 158 ++++--------------
>  .../gcc.target/powerpc/unpack-vectorize-1.c   |  18 ++
>  .../gcc.target/powerpc/unpack-vectorize-1.h   |  14 ++
>  .../gcc.target/powerpc/unpack-vectorize-2.c   |  12 ++
>  .../gcc.target/powerpc/unpack-vectorize-2.h   |   7 +
>  .../gcc.target/powerpc/unpack-vectorize-3.c   |  11 ++
>  .../gcc.target/powerpc/unpack-vectorize-3.h   |   7 +
>  .../powerpc/unpack-vectorize-run-1.c          |  24 +++
>  .../powerpc/unpack-vectorize-run-2.c          |  16 ++
>  .../powerpc/unpack-vectorize-run-3.c          |  16 ++
>  .../gcc.target/powerpc/unpack-vectorize.h     |  42 +++++
>  11 files changed, 196 insertions(+), 129 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index d70c17e6bc2..0e8b66cd6a5 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -134,10 +134,8 @@ (define_c_enum "unspec"
>     UNSPEC_VMULWLUH
>     UNSPEC_VMULWHSH
>     UNSPEC_VMULWLSH
> -   UNSPEC_VUPKHUB
> -   UNSPEC_VUPKHUH
> -   UNSPEC_VUPKLUB
> -   UNSPEC_VUPKLUH
> +   UNSPEC_VUPKHUBHW
> +   UNSPEC_VUPKLUBHW


Up to you, but... maybe just UNSPEC_VUPKHU and UNSPEC_VUPKLU, in case we 
extend this later to other types.  Fine either way.

>     UNSPEC_VPERMSI
>     UNSPEC_VPERMHI
>     UNSPEC_INTERHI
> @@ -3885,143 +3883,45 @@ (define_insn "xxeval"
>     [(set_attr "type" "vecsimple")
>      (set_attr "prefixed" "yes")])
>
> -(define_expand "vec_unpacku_hi_v16qi"
> -  [(set (match_operand:V8HI 0 "register_operand" "=v")
> -        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
> -                     UNSPEC_VUPKHUB))]
> -  "TARGET_ALTIVEC"
> -{
> -  rtx vzero = gen_reg_rtx (V8HImode);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -  rtvec v = rtvec_alloc (16);
> -  bool be = BYTES_BIG_ENDIAN;
> -
> -  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
> -
> -  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
> -  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
> -  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
> -  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
> -  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
> -  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
> -  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
> -  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
> -  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
> -  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> -  DONE;
> -})
> -
> -(define_expand "vec_unpacku_hi_v8hi"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v")
> -        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
> -                     UNSPEC_VUPKHUH))]
> +(define_expand "vec_unpacku_hi_<VP_small_lc>"
> +  [(set (match_operand:VP 0 "register_operand" "=v")
> +        (unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
> +         UNSPEC_VUPKHUBHW))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx vzero = gen_reg_rtx (V4SImode);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -  rtvec v = rtvec_alloc (16);
> -  bool be = BYTES_BIG_ENDIAN;
> +  rtx vzero = gen_reg_rtx (<VP_small>mode);
> +  emit_insn (gen_altivec_vspltis<VU_char> (vzero, const0_rtx));
>
> -  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
> -
> -  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
> -  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 :  6);
> -  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  0 : 17);
> -  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
> -  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
> -  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 :  4);
> -  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ?  2 : 17);
> -  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
> -  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
> -  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 :  2);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ?  4 : 17);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  0);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
> -  DONE;
> -})
> +  rtx res = gen_reg_rtx (<VP_small>mode);
> +  rtx op1 = operands[1];
>
> -(define_expand "vec_unpacku_lo_v16qi"
> -  [(set (match_operand:V8HI 0 "register_operand" "=v")
> -        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
> -                     UNSPEC_VUPKLUB))]
> -  "TARGET_ALTIVEC"
> -{
> -  rtx vzero = gen_reg_rtx (V8HImode);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -  rtvec v = rtvec_alloc (16);
> -  bool be = BYTES_BIG_ENDIAN;
> -
> -  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
> -
> -  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
> -  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  8 : 16);
> -  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14);
> -  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
> -  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
> -  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16);
> -  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12);
> -  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
> -  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
> -  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrgh<VU_char> (res, vzero, op1));
> +  else
> +    emit_insn (gen_altivec_vmrgl<VU_char> (res, op1, vzero));
>
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> +  emit_insn (gen_move_insn (operands[0], gen_lowpart (<MODE>mode, res)));
>    DONE;
>  })
>
> -(define_expand "vec_unpacku_lo_v8hi"
> -  [(set (match_operand:V4SI 0 "register_operand" "=v")
> -        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
> -                     UNSPEC_VUPKLUH))]
> +(define_expand "vec_unpacku_lo_<VP_small_lc>"
> +  [(set (match_operand:VP 0 "register_operand" "=v")
> +        (unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
> +         UNSPEC_VUPKLUBHW))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx vzero = gen_reg_rtx (V4SImode);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -  rtvec v = rtvec_alloc (16);
> -  bool be = BYTES_BIG_ENDIAN;
> +  rtx vzero = gen_reg_rtx (<VP_small>mode);
> +  emit_insn (gen_altivec_vspltis<VU_char> (vzero, const0_rtx));
>
> -  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
> -
> -  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
> -  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14);
> -  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  8 : 17);
> -  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
> -  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
> -  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12);
> -  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17);
> -  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
> -  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
> -  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  8);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
> +  rtx res = gen_reg_rtx (<VP_small>mode);
> +  rtx op1 = operands[1];
>
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrgl<VU_char> (res, vzero, op1));
> +  else
> +    emit_insn (gen_altivec_vmrgh<VU_char> (res, op1, vzero));
> +
> +  emit_insn (gen_move_insn (operands[0], gen_lowpart (<MODE>mode, res)));
>    DONE;
>  })
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
> new file mode 100644
> index 00000000000..2621d753baa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */


I guess powerpc_altivec_ok is fine.  I was initially concerned since 
unpack-vectorize.h mentions vector long long, but the types aren't 
actually used here.  OK.

> +/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
> +
> +/* Test if unpack vectorization succeeds for type signed/unsigned
> +   short and char.  */
> +
> +#include "unpack-vectorize-1.h"
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
> +/* { dg-final { scan-assembler {\mvupkhsb\M} } } */
> +/* { dg-final { scan-assembler {\mvupklsb\M} } } */
> +/* { dg-final { scan-assembler {\mvupkhsh\M} } } */
> +/* { dg-final { scan-assembler {\mvupklsh\M} } } */
> +/* { dg-final { scan-assembler {\mvmrghb\M} } } */
> +/* { dg-final { scan-assembler {\mvmrglb\M} } } */
> +/* { dg-final { scan-assembler {\mvmrghh\M} } } */
> +/* { dg-final { scan-assembler {\mvmrglh\M} } } */


Suggest that you consider scan-assembler-times 1 to make the tests more 
robust, here and for other tests.

Otherwise the patch looks fine to me.  Recommend maintainers approve 
with or without changes.

Thanks for the improvements!
Bill

> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
> new file mode 100644
> index 00000000000..1cb89aba392
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
> @@ -0,0 +1,14 @@
> +#include "unpack-vectorize.h"
> +
> +DEF_ARR (si)
> +DEF_ARR (ui)
> +DEF_ARR (sh)
> +DEF_ARR (uh)
> +DEF_ARR (sc)
> +DEF_ARR (uc)
> +
> +TEST1 (sh, si)
> +TEST1 (uh, ui)
> +TEST1 (sc, sh)
> +TEST1 (uc, uh)
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
> new file mode 100644
> index 00000000000..3e7e97da43c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
> +
> +/* Test if unsigned int unpack vectorization succeeds.  V2DImode is
> +   supported since Power7 so guard it under Power7 and up.  */
> +
> +#include "unpack-vectorize-2.h"
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +/* { dg-final { scan-assembler {\mxxmrghw\M} } } */
> +/* { dg-final { scan-assembler {\mxxmrglw\M} } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
> new file mode 100644
> index 00000000000..e199229e6f7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
> @@ -0,0 +1,7 @@
> +#include "unpack-vectorize.h"
> +
> +DEF_ARR (ui)
> +DEF_ARR (ull)
> +
> +TEST1 (ui, ull)
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
> new file mode 100644
> index 00000000000..a246e7e26b6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
> +
> +/* Test if signed int unpack vectorization succeeds.  */
> +
> +#include "unpack-vectorize-3.h"
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> +/* { dg-final { scan-assembler {\mvupkhsw\M} } } */
> +/* { dg-final { scan-assembler {\mvupklsw\M} } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
> new file mode 100644
> index 00000000000..6a5191d28a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
> @@ -0,0 +1,7 @@
> +#include "unpack-vectorize.h"
> +
> +DEF_ARR (si)
> +DEF_ARR (sll)
> +
> +TEST1 (si, sll)
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
> new file mode 100644
> index 00000000000..51f0e67524f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vmx_hw } */
> +/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model" } */
> +
> +#include "unpack-vectorize-1.h"
> +
> +/* Test if unpack vectorization cases on signed/unsigned short and char
> +   run successfully.  */
> +
> +CHECK1 (sh, si)
> +CHECK1 (uh, ui)
> +CHECK1 (sc, sh)
> +CHECK1 (uc, uh)
> +
> +int
> +main ()
> +{
> +  check1_sh_si ();
> +  check1_uh_ui ();
> +  check1_sc_sh ();
> +  check1_uc_uh ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
> new file mode 100644
> index 00000000000..6d243602bbf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vsx_hw } */
> +/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model" } */
> +
> +#include "unpack-vectorize-2.h"
> +
> +/* Test if unpack vectorization cases on unsigned int run successfully.  */
> +
> +CHECK1 (ui, ull)
> +
> +int
> +main ()
> +{
> +  check1_ui_ull ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
> new file mode 100644
> index 00000000000..fec33c46abc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
> @@ -0,0 +1,16 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model" } */
> +
> +#include "unpack-vectorize-3.h"
> +
> +/* Test if unpack vectorization cases on signed int run successfully.  */
> +
> +CHECK1 (si, sll)
> +
> +int
> +main ()
> +{
> +  check1_si_sll ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
> new file mode 100644
> index 00000000000..11fa7d4aa6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
> @@ -0,0 +1,42 @@
> +typedef signed long long sll;
> +typedef unsigned long long ull;
> +typedef signed int si;
> +typedef unsigned int ui;
> +typedef signed short sh;
> +typedef unsigned short uh;
> +typedef signed char sc;
> +typedef unsigned char uc;
> +
> +#ifndef ALIGN
> +#define ALIGN 32
> +#endif
> +
> +#define ALIGN_ATTR __attribute__((__aligned__(ALIGN)))
> +
> +#define N 128
> +
> +#define DEF_ARR(TYPE)                                                         \
> +  TYPE TYPE##_a[N] ALIGN_ATTR;                                                \
> +  TYPE TYPE##_b[N] ALIGN_ATTR;                                                \
> +  TYPE TYPE##_c[N] ALIGN_ATTR;
> +
> +#define TEST1(NTYPE, WTYPE)                                                    \
> +  __attribute__((noipa)) void test1_##NTYPE##_##WTYPE() {                      \
> +    for (int i = 0; i < N; i++)                                                \
> +      WTYPE##_c[i] = NTYPE##_a[i] + NTYPE##_b[i];                              \
> +  }
> +
> +#define CHECK1(NTYPE, WTYPE)                                                   \
> +  __attribute__((noipa, optimize(0))) void check1_##NTYPE##_##WTYPE() {        \
> +    for (int i = 0; i < N; i++) {                                              \
> +      NTYPE##_a[i] = 2 * i * sizeof(NTYPE) + 10;                               \
> +      NTYPE##_b[i] = 7 * i * sizeof(NTYPE) / 5 - 10;                           \
> +    }                                                                          \
> +    test1_##NTYPE##_##WTYPE();                                                 \
> +    for (int i = 0; i < N; i++) {                                              \
> +      WTYPE exp = NTYPE##_a[i] + NTYPE##_b[i];                                 \
> +      if (WTYPE##_c[i] != exp)                                                 \
> +        __builtin_abort();                                                     \
> +    }                                                                          \
> +  }
> +
> -- 
> 2.17.1
>



* Re: [PATCH] rs6000: Add vec_unpacku_{hi,lo}_v4si
  2021-08-06 13:10 ` Bill Schmidt
@ 2021-08-06 17:37   ` Segher Boessenkool
  2021-08-09  2:53   ` [PATCH v2] " Kewen.Lin
  1 sibling, 0 replies; 6+ messages in thread
From: Segher Boessenkool @ 2021-08-06 17:37 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: Kewen.Lin, GCC Patches, David Edelsohn

On Fri, Aug 06, 2021 at 08:10:05AM -0500, Bill Schmidt wrote:
> On 8/4/21 9:06 PM, Kewen.Lin wrote:
> >-   UNSPEC_VUPKHUB
> >-   UNSPEC_VUPKHUH
> >-   UNSPEC_VUPKLUB
> >-   UNSPEC_VUPKLUH
> >+   UNSPEC_VUPKHUBHW
> >+   UNSPEC_VUPKLUBHW
> 
> Up to you, but... maybe just UNSPEC_VUPKHU and UNSPEC_VUPKLU, in case we 
> extend this later to other types.

Yes please.  The "BHW" isn't useful.

> Otherwise the patch looks fine to me.  Recommend maintainers approve 
> with or without changes.

With.  I'll reply to Ke Wen's mail separately, your reply is whitespace
damaged (format=flawed it looks like).


Segher


* [PATCH v2] rs6000: Add vec_unpacku_{hi,lo}_v4si
  2021-08-06 13:10 ` Bill Schmidt
  2021-08-06 17:37   ` Segher Boessenkool
@ 2021-08-09  2:53   ` Kewen.Lin
  2021-08-24 13:02     ` Segher Boessenkool
  1 sibling, 1 reply; 6+ messages in thread
From: Kewen.Lin @ 2021-08-09  2:53 UTC (permalink / raw)
  To: wschmidt; +Cc: Segher Boessenkool, David Edelsohn, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 4639 bytes --]

Hi Bill,

Thanks for the comments!

on 2021/8/6 9:10 PM, Bill Schmidt wrote:
> Hi Kewen,
> 
> On 8/4/21 9:06 PM, Kewen.Lin wrote:
>> Hi,
>>
>> The existing vec_unpacku_{hi,lo} supports emulated unsigned
>> unpacking for short and char but misses the support for int.
>> This patch adds the support for vec_unpacku_{hi,lo}_v4si.
>>
>> Meanwhile, the current implementation uses vector permutation
>> way, which requires one extra customized constant vector as
>> the permutation control vector.  It's better to use vector
>> merge high/low with zero constant vector, to save the space
>> in constant area as well as the cost to initialize pcv in
>> prologue.  This patch updates it with vector merging and
>> simplify it with iterators.
>>
>> Bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>> powerpc64-linux-gnu P8.
>>
>> btw, the loop in unpack-vectorize-2.c doesn't get vectorized
>> without this patch, unpack-vectorize-[13]* is to verify
>> the vector merging and simplification works expectedly.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -----
...
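
For readers following along, the merge-with-zero idea from the
description above can be modelled in scalar C.  This is only a sketch
under my own assumptions: the helper names model_merge_high_be and
model_unpacku_hi_be are made up for illustration and are not GCC
functions, and the model covers just the big-endian halfword case.

```c
#include <stdint.h>

/* Illustrative model (not GCC code) of a halfword "merge high" in
   big-endian element order: interleave the first four elements of A
   and B as A0 B0 A1 B1 A2 B2 A3 B3.  */
static void
model_merge_high_be (const uint16_t a[8], const uint16_t b[8],
		     uint16_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* Zero-extend the first four halfwords of SRC to words by merging a
   zero vector with SRC.  Each (0, SRC[i]) pair is then read as one
   big-endian 32-bit word, i.e. the zero halfword is the high part.  */
static void
model_unpacku_hi_be (const uint16_t src[8], uint32_t out[4])
{
  const uint16_t zero[8] = { 0 };
  uint16_t merged[8];

  model_merge_high_be (zero, src, merged);
  for (int i = 0; i < 4; i++)
    out[i] = ((uint32_t) merged[2 * i] << 16) | merged[2 * i + 1];
}
```

No permutation control vector is needed in this form, only a zero
vector.  On little-endian the hi expander in the patch uses the
merge-low form with the operands swapped instead, as its
BYTES_BIG_ENDIAN check shows.
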
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index d70c17e6bc2..0e8b66cd6a5 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -134,10 +134,8 @@ (define_c_enum "unspec"
>>     UNSPEC_VMULWLUH
>>     UNSPEC_VMULWHSH
>>     UNSPEC_VMULWLSH
>> -   UNSPEC_VUPKHUB
>> -   UNSPEC_VUPKHUH
>> -   UNSPEC_VUPKLUB
>> -   UNSPEC_VUPKLUH
>> +   UNSPEC_VUPKHUBHW
>> +   UNSPEC_VUPKLUBHW
> 
> 
> Up to you, but... maybe just UNSPEC_VUPKHU and UNSPEC_VUPKLU, in case we extend this later to other types.  Fine either way.
> 

Good point!  Fixed.

>>     UNSPEC_VPERMSI
>>     UNSPEC_VPERMHI
>>     UNSPEC_INTERHI
>> @@ -3885,143 +3883,45 @@ (define_insn "xxeval"
>>     [(set_attr "type" "vecsimple")
>>      (set_attr "prefixed" "yes")])
>>
...
>> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
>> new file mode 100644
>> index 00000000000..2621d753baa
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
> 
> 
> I guess powerpc_altivec_ok is fine.  I was initially concerned since unpack-vectorize.h mentions vector long long, but the types aren't actually used here.  OK.
> 

Yeah, I think it's fine, since unpack-vectorize.h only typedefs long long and
doesn't even use the type vector long long.

>> +/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
>> +
>> +/* Test if unpack vectorization succeeds for type signed/unsigned
>> +   short and char.  */
>> +
>> +#include "unpack-vectorize-1.h"
>> +
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
>> +/* { dg-final { scan-assembler {\mvupkhsb\M} } } */
>> +/* { dg-final { scan-assembler {\mvupklsb\M} } } */
>> +/* { dg-final { scan-assembler {\mvupkhsh\M} } } */
>> +/* { dg-final { scan-assembler {\mvupklsh\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrghb\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrglb\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrghh\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrglh\M} } } */
> 
> 
> Suggest that you consider scan-assembler-times 1 to make the tests more robust, here and for other tests.
> 

Updated, thanks!  I was worried that future unrolling tweaks could make the
hardcoded counts fragile, and thought that checking the counts might not matter
much.  "-fno-unroll-loops" has been added to disable unrolling explicitly as well.

Re-tested on BE and LE; the test results look fine.
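
On why the expected counts are 2 in unpack-vectorize-1.c but 1 in the
other two tests: my reading, which I have not checked against the
vectorizer dumps here, is that it follows from the usual C integer
promotions.  The sketch below assumes 16-bit short and 32-bit int, and
the function names add_uh_to_ui and add_ui_to_ull are made up for
illustration.

```c
typedef unsigned short uh;
typedef unsigned int ui;
typedef unsigned long long ull;

/* uh + uh: both operands are promoted to int before the addition, so
   each input is widened separately and the sum cannot wrap at 16
   bits.  */
static ui
add_uh_to_ui (uh a, uh b)
{
  return a + b;
}

/* ui + ui: the addition is done in unsigned int and may wrap; only
   the already-computed sum is widened to unsigned long long.  */
static ull
add_ui_to_ull (ui a, ui b)
{
  return a + b;
}
```

So the short/char loops would widen two input vectors each (two hi and
two lo unpacks), while the int loops would widen only the sum (one hi
and one lo unpack), which appears consistent with the counts in the
updated tests.
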

BR,
Kewen
-----
gcc/ChangeLog:

	* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
	(vec_unpacku_hi_v8hi): Likewise.
	(vec_unpacku_lo_v16qi): Likewise.
	(vec_unpacku_lo_v8hi): Likewise.
	(vec_unpacku_hi_<VP_small_lc>): New define_expand.
	(vec_unpacku_lo_<VP_small_lc>): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/unpack-vectorize-1.c: New test.
	* gcc.target/powerpc/unpack-vectorize-1.h: New test.
	* gcc.target/powerpc/unpack-vectorize-2.c: New test.
	* gcc.target/powerpc/unpack-vectorize-2.h: New test.
	* gcc.target/powerpc/unpack-vectorize-3.c: New test.
	* gcc.target/powerpc/unpack-vectorize-3.h: New test.
	* gcc.target/powerpc/unpack-vectorize-run-1.c: New test.
	* gcc.target/powerpc/unpack-vectorize-run-2.c: New test.
	* gcc.target/powerpc/unpack-vectorize-run-3.c: New test.
	* gcc.target/powerpc/unpack-vectorize.h: New test.


[-- Attachment #2: unpack-v2.diff --]
[-- Type: text/plain, Size: 17944 bytes --]

---
 gcc/config/rs6000/altivec.md                  | 158 ++++--------------
 .../gcc.target/powerpc/unpack-vectorize-1.c   |  18 ++
 .../gcc.target/powerpc/unpack-vectorize-1.h   |  14 ++
 .../gcc.target/powerpc/unpack-vectorize-2.c   |  12 ++
 .../gcc.target/powerpc/unpack-vectorize-2.h   |   7 +
 .../gcc.target/powerpc/unpack-vectorize-3.c   |  11 ++
 .../gcc.target/powerpc/unpack-vectorize-3.h   |   7 +
 .../powerpc/unpack-vectorize-run-1.c          |  24 +++
 .../powerpc/unpack-vectorize-run-2.c          |  16 ++
 .../powerpc/unpack-vectorize-run-3.c          |  16 ++
 .../gcc.target/powerpc/unpack-vectorize.h     |  42 +++++
 11 files changed, 196 insertions(+), 129 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index d70c17e6bc2..5a4a824804b 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -134,10 +134,8 @@ (define_c_enum "unspec"
    UNSPEC_VMULWLUH
    UNSPEC_VMULWHSH
    UNSPEC_VMULWLSH
-   UNSPEC_VUPKHUB
-   UNSPEC_VUPKHUH
-   UNSPEC_VUPKLUB
-   UNSPEC_VUPKLUH
+   UNSPEC_VUPKHU
+   UNSPEC_VUPKLU
    UNSPEC_VPERMSI
    UNSPEC_VPERMHI
    UNSPEC_INTERHI
@@ -3885,143 +3883,45 @@ (define_insn "xxeval"
    [(set_attr "type" "vecsimple")
     (set_attr "prefixed" "yes")])
 
-(define_expand "vec_unpacku_hi_v16qi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
-                     UNSPEC_VUPKHUB))]
-  "TARGET_ALTIVEC"      
-{  
-  rtx vzero = gen_reg_rtx (V8HImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
-   
-  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
-   
-  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
-  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
-  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
-  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
-  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
-  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
-  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
-  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
-  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
-  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
-
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
-  DONE;
-})
-
-(define_expand "vec_unpacku_hi_v8hi"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
-                     UNSPEC_VUPKHUH))]
+(define_expand "vec_unpacku_hi_<VP_small_lc>"
+  [(set (match_operand:VP 0 "register_operand" "=v")
+        (unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
+         UNSPEC_VUPKHU))]
   "TARGET_ALTIVEC"
 {
-  rtx vzero = gen_reg_rtx (V4SImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
+  rtx vzero = gen_reg_rtx (<VP_small>mode);
+  emit_insn (gen_altivec_vspltis<VU_char> (vzero, const0_rtx));
 
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
- 
-  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
-  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 :  6);
-  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  0 : 17);
-  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
-  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
-  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 :  4);
-  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ?  2 : 17);
-  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
-  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
-  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 :  2);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ?  4 : 17);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  0);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ?  6 : 17);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
-
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
-  DONE;
-})
+  rtx res = gen_reg_rtx (<VP_small>mode);
+  rtx op1 = operands[1];
 
-(define_expand "vec_unpacku_lo_v16qi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
-                     UNSPEC_VUPKLUB))]
-  "TARGET_ALTIVEC"
-{
-  rtx vzero = gen_reg_rtx (V8HImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
-
-  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
-
-  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
-  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  8 : 16);
-  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14);
-  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
-  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
-  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16);
-  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12);
-  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
-  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
-  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  8);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrgh<VU_char> (res, vzero, op1));
+  else
+    emit_insn (gen_altivec_vmrgl<VU_char> (res, op1, vzero));
 
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
+  emit_insn (gen_move_insn (operands[0], gen_lowpart (<MODE>mode, res)));
   DONE;
 })
 
-(define_expand "vec_unpacku_lo_v8hi"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
-                     UNSPEC_VUPKLUH))]
+(define_expand "vec_unpacku_lo_<VP_small_lc>"
+  [(set (match_operand:VP 0 "register_operand" "=v")
+        (unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
+         UNSPEC_VUPKLU))]
   "TARGET_ALTIVEC"
 {
-  rtx vzero = gen_reg_rtx (V4SImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
+  rtx vzero = gen_reg_rtx (<VP_small>mode);
+  emit_insn (gen_altivec_vspltis<VU_char> (vzero, const0_rtx));
 
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
- 
-  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
-  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14);
-  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ?  8 : 17);
-  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  9 : 16);
-  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
-  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12);
-  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17);
-  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
-  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
-  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  9);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 :  8);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
+  rtx res = gen_reg_rtx (<VP_small>mode);
+  rtx op1 = operands[1];
 
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrgl<VU_char> (res, vzero, op1));
+  else
+    emit_insn (gen_altivec_vmrgh<VU_char> (res, op1, vzero));
+
+  emit_insn (gen_move_insn (operands[0], gen_lowpart (<MODE>mode, res)));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
new file mode 100644
index 00000000000..dceb5b89bd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if unpack vectorization succeeds for type signed/unsigned
+   short and char.  */
+
+#include "unpack-vectorize-1.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
+/* { dg-final { scan-assembler-times {\mvupkhsb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvupklsb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvupkhsh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvupklsh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrghb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrglb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrghh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrglh\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
new file mode 100644
index 00000000000..1cb89aba392
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
@@ -0,0 +1,14 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (si)
+DEF_ARR (ui)
+DEF_ARR (sh)
+DEF_ARR (uh)
+DEF_ARR (sc)
+DEF_ARR (uc)
+
+TEST1 (sh, si)
+TEST1 (uh, ui)
+TEST1 (sc, sh)
+TEST1 (uc, uh)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
new file mode 100644
index 00000000000..4f2e6ebb07b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if unsigned int unpack vectorization succeeds.  V2DImode is
+   supported since Power7 so guard it under Power7 and up.  */
+
+#include "unpack-vectorize-2.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-assembler-times {\mxxmrghw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxmrglw\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
new file mode 100644
index 00000000000..e199229e6f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
@@ -0,0 +1,7 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (ui)
+DEF_ARR (ull)
+
+TEST1 (ui, ull)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
new file mode 100644
index 00000000000..520a279ac1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if signed int unpack vectorization succeeds.  */
+
+#include "unpack-vectorize-3.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-assembler-times {\mvupkhsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvupklsw\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
new file mode 100644
index 00000000000..6a5191d28a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
@@ -0,0 +1,7 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (si)
+DEF_ARR (sll)
+
+TEST1 (si, sll)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
new file mode 100644
index 00000000000..51f0e67524f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-1.h"
+
+/* Test if unpack vectorization cases on signed/unsigned short and char
+   run successfully.  */
+
+CHECK1 (sh, si)
+CHECK1 (uh, ui)
+CHECK1 (sc, sh)
+CHECK1 (uc, uh)
+
+int
+main ()
+{
+  check1_sh_si ();
+  check1_uh_ui ();
+  check1_sc_sh ();
+  check1_uc_uh ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
new file mode 100644
index 00000000000..6d243602bbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-2.h"
+
+/* Test if unpack vectorization cases on unsigned int run successfully.  */
+
+CHECK1 (ui, ull)
+
+int
+main ()
+{
+  check1_ui_ull ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
new file mode 100644
index 00000000000..fec33c46abc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-3.h"
+
+/* Test if unpack vectorization cases on signed int run successfully.  */
+
+CHECK1 (si, sll)
+
+int
+main ()
+{
+  check1_si_sll ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
new file mode 100644
index 00000000000..11fa7d4aa6f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
@@ -0,0 +1,42 @@
+typedef signed long long sll;
+typedef unsigned long long ull;
+typedef signed int si;
+typedef unsigned int ui;
+typedef signed short sh;
+typedef unsigned short uh;
+typedef signed char sc;
+typedef unsigned char uc;
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#define ALIGN_ATTR __attribute__((__aligned__(ALIGN)))
+
+#define N 128
+
+#define DEF_ARR(TYPE)                                                         \
+  TYPE TYPE##_a[N] ALIGN_ATTR;                                                \
+  TYPE TYPE##_b[N] ALIGN_ATTR;                                                \
+  TYPE TYPE##_c[N] ALIGN_ATTR;
+
+#define TEST1(NTYPE, WTYPE)                                                    \
+  __attribute__((noipa)) void test1_##NTYPE##_##WTYPE() {                      \
+    for (int i = 0; i < N; i++)                                                \
+      WTYPE##_c[i] = NTYPE##_a[i] + NTYPE##_b[i];                              \
+  }
+
+#define CHECK1(NTYPE, WTYPE)                                                   \
+  __attribute__((noipa, optimize(0))) void check1_##NTYPE##_##WTYPE() {        \
+    for (int i = 0; i < N; i++) {                                              \
+      NTYPE##_a[i] = 2 * i * sizeof(NTYPE) + 10;                               \
+      NTYPE##_b[i] = 7 * i * sizeof(NTYPE) / 5 - 10;                           \
+    }                                                                          \
+    test1_##NTYPE##_##WTYPE();                                                 \
+    for (int i = 0; i < N; i++) {                                              \
+      WTYPE exp = NTYPE##_a[i] + NTYPE##_b[i];                                 \
+      if (WTYPE##_c[i] != exp)                                                 \
+        __builtin_abort();                                                     \
+    }                                                                          \
+  }
+
-- 
2.17.1
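
For readers following the test header above, here is a rough sketch of what TEST1 (ui, ull) and CHECK1 (ui, ull) expand to. It assumes the arrays come from DEF_ARR (ui) and DEF_ARR (ull) in one of the per-test headers, which are not shown here. The GCC-specific attributes (noipa, optimize(0), alignment) are dropped and __builtin_abort is swapped for abort so that the sketch is portable. Note that the addition is carried out in the narrow type and only the result is widened.

```c
#include <stdlib.h>

typedef unsigned long long ull;
typedef unsigned int ui;

#define N 128

/* Assumed result of DEF_ARR (ui) and DEF_ARR (ull), limited to the
   arrays used below; the alignment attribute is omitted.  */
ui ui_a[N], ui_b[N];
ull ull_c[N];

/* Approximate expansion of TEST1 (ui, ull): the sum is computed in
   unsigned int and then zero-extended to unsigned long long.  */
void
test1_ui_ull (void)
{
  for (int i = 0; i < N; i++)
    ull_c[i] = ui_a[i] + ui_b[i];
}

/* Approximate expansion of CHECK1 (ui, ull): fill the inputs, run the
   loop under test, then recompute each element and compare.  */
void
check1_ui_ull (void)
{
  for (int i = 0; i < N; i++)
    {
      ui_a[i] = 2 * i * sizeof (ui) + 10;
      ui_b[i] = 7 * i * sizeof (ui) / 5 - 10;
    }
  test1_ui_ull ();
  for (int i = 0; i < N; i++)
    {
      ull exp = ui_a[i] + ui_b[i];
      if (ull_c[i] != exp)
        abort ();
    }
}
```

A run test then only needs to call check1_ui_ull () from main, as unpack-vectorize-run-2.c does.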



* Re: [PATCH v2] rs6000: Add vec_unpacku_{hi,lo}_v4si
  2021-08-09  2:53   ` [PATCH v2] " Kewen.Lin
@ 2021-08-24 13:02     ` Segher Boessenkool
  2021-08-25  4:48       ` Kewen.Lin
  0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2021-08-24 13:02 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: wschmidt, David Edelsohn, GCC Patches

Hi Ke Wen,

On Mon, Aug 09, 2021 at 10:53:00AM +0800, Kewen.Lin wrote:
> on 2021/8/6 9:10 PM, Bill Schmidt wrote:
> > On 8/4/21 9:06 PM, Kewen.Lin wrote:
> >> The existing vec_unpacku_{hi,lo} supports emulated unsigned
> >> unpacking for short and char but misses the support for int.
> >> This patch adds the support for vec_unpacku_{hi,lo}_v4si.

> 	* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
> 	(vec_unpacku_hi_v8hi): Likewise.
> 	(vec_unpacku_lo_v16qi): Likewise.
> 	(vec_unpacku_lo_v8hi): Likewise.
> 	(vec_unpacku_hi_<VP_small_lc>): New define_expand.
> 	(vec_unpacku_lo_<VP_small_lc>): Likewise.

> -(define_expand "vec_unpacku_hi_v16qi"
> -  [(set (match_operand:V8HI 0 "register_operand" "=v")
> -        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
> -                     UNSPEC_VUPKHUB))]
> -  "TARGET_ALTIVEC"      
> -{  
> -  rtx vzero = gen_reg_rtx (V8HImode);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -  rtvec v = rtvec_alloc (16);
> -  bool be = BYTES_BIG_ENDIAN;
> -   
> -  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
> -   
> -  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
> -  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
> -  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
> -  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
> -  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
> -  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
> -  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
> -  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
> -  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
> -  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
> -  DONE;
> -})

So I wonder if all this still generates good code.  The unspecs cannot
be optimised properly, the RTL can (in principle, anyway: it is possible
it makes more opportunities to use unpack etc. insns invisible than that
it helps over unspec).  This needs to be tested, and the usual idioms
need testcases, is that what you add here?  (/me reads on...)

> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_altivec_vmrgh<VU_char> (res, vzero, op1));
> +  else
> +    emit_insn (gen_altivec_vmrgl<VU_char> (res, op1, vzero));

Ah, so it is *not* using unspecs?  Excellent.

Okay for trunk.  Thank you!


Segher


* Re: [PATCH v2] rs6000: Add vec_unpacku_{hi,lo}_v4si
  2021-08-24 13:02     ` Segher Boessenkool
@ 2021-08-25  4:48       ` Kewen.Lin
  0 siblings, 0 replies; 6+ messages in thread
From: Kewen.Lin @ 2021-08-25  4:48 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: wschmidt, David Edelsohn, GCC Patches

on 2021/8/24 9:02 PM, Segher Boessenkool wrote:
> Hi Ke Wen,
> 
> On Mon, Aug 09, 2021 at 10:53:00AM +0800, Kewen.Lin wrote:
>> on 2021/8/6 9:10 PM, Bill Schmidt wrote:
>>> On 8/4/21 9:06 PM, Kewen.Lin wrote:
>>>> The existing vec_unpacku_{hi,lo} supports emulated unsigned
>>>> unpacking for short and char but misses the support for int.
>>>> This patch adds the support for vec_unpacku_{hi,lo}_v4si.
> 
>> 	* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
>> 	(vec_unpacku_hi_v8hi): Likewise.
>> 	(vec_unpacku_lo_v16qi): Likewise.
>> 	(vec_unpacku_lo_v8hi): Likewise.
>> 	(vec_unpacku_hi_<VP_small_lc>): New define_expand.
>> 	(vec_unpacku_lo_<VP_small_lc>): Likewise.
> 
>> -(define_expand "vec_unpacku_hi_v16qi"
>> -  [(set (match_operand:V8HI 0 "register_operand" "=v")
>> -        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
>> -                     UNSPEC_VUPKHUB))]
>> -  "TARGET_ALTIVEC"      
>> -{  
>> -  rtx vzero = gen_reg_rtx (V8HImode);
>> -  rtx mask = gen_reg_rtx (V16QImode);
>> -  rtvec v = rtvec_alloc (16);
>> -  bool be = BYTES_BIG_ENDIAN;
>> -   
>> -  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
>> -   
>> -  RTVEC_ELT (v,  0) = gen_rtx_CONST_INT (QImode, be ? 16 :  7);
>> -  RTVEC_ELT (v,  1) = gen_rtx_CONST_INT (QImode, be ?  0 : 16);
>> -  RTVEC_ELT (v,  2) = gen_rtx_CONST_INT (QImode, be ? 16 :  6);
>> -  RTVEC_ELT (v,  3) = gen_rtx_CONST_INT (QImode, be ?  1 : 16);
>> -  RTVEC_ELT (v,  4) = gen_rtx_CONST_INT (QImode, be ? 16 :  5);
>> -  RTVEC_ELT (v,  5) = gen_rtx_CONST_INT (QImode, be ?  2 : 16);
>> -  RTVEC_ELT (v,  6) = gen_rtx_CONST_INT (QImode, be ? 16 :  4);
>> -  RTVEC_ELT (v,  7) = gen_rtx_CONST_INT (QImode, be ?  3 : 16);
>> -  RTVEC_ELT (v,  8) = gen_rtx_CONST_INT (QImode, be ? 16 :  3);
>> -  RTVEC_ELT (v,  9) = gen_rtx_CONST_INT (QImode, be ?  4 : 16);
>> -  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 :  2);
>> -  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ?  5 : 16);
>> -  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 :  1);
>> -  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ?  6 : 16);
>> -  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 :  0);
>> -  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ?  7 : 16);
>> -
>> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>> -  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
>> -  DONE;
>> -})
> 
> So I wonder if all this still generates good code.  The unspecs cannot
> be optimised properly, the RTL can (in principle, anyway: it is possible
> it makes more opportunities to use unpack etc. insns invisible than that
> it helps over unspec).  This needs to be tested, and the usual idioms
> need testcases, is that what you add here?  (/me reads on...)
> 

Yeah, for the existing char/short cases, it generates better code with
vector merge high/low instead of permutation, since it saves the cost of
the permutation control vector (space in the constant area as well as
the cost of initializing it in the prologue).  Writing it with iterators
makes it concise and also adds the missing "int" support.  The associated
test cases verify the newly generated assembly and the runtime results.

>> +  if (BYTES_BIG_ENDIAN)
>> +    emit_insn (gen_altivec_vmrgh<VU_char> (res, vzero, op1));
>> +  else
>> +    emit_insn (gen_altivec_vmrgl<VU_char> (res, op1, vzero));
> 
> Ah, so it is *not* using unspecs?  Excellent.
> 
> Okay for trunk.  Thank you!
> 

Thanks for the review!  Committed in r12-3134.


BR,
Kewen


Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
2021-08-05  2:06 [PATCH] rs6000: Add vec_unpacku_{hi,lo}_v4si Kewen.Lin
2021-08-06 13:10 ` Bill Schmidt
2021-08-06 17:37   ` Segher Boessenkool
2021-08-09  2:53   ` [PATCH v2] " Kewen.Lin
2021-08-24 13:02     ` Segher Boessenkool
2021-08-25  4:48       ` Kewen.Lin
