From: "Kewen.Lin" <linkw@linux.ibm.com>
To: wschmidt@linux.ibm.com
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
David Edelsohn <dje.gcc@gmail.com>,
GCC Patches <gcc-patches@gcc.gnu.org>
Subject: [PATCH v2] rs6000: Add vec_unpacku_{hi,lo}_v4si
Date: Mon, 9 Aug 2021 10:53:00 +0800 [thread overview]
Message-ID: <b674cc8d-e14e-5d01-4207-abd7f65df49d@linux.ibm.com> (raw)
In-Reply-To: <48675437-01a4-af04-3cb0-8f68c675b9eb@linux.ibm.com>
[-- Attachment #1: Type: text/plain, Size: 4639 bytes --]
Hi Bill,
Thanks for the comments!
on 2021/8/6 下午9:10, Bill Schmidt wrote:
> Hi Kewen,
>
> On 8/4/21 9:06 PM, Kewen.Lin wrote:
>> Hi,
>>
>> The existing vec_unpacku_{hi,lo} supports emulated unsigned
>> unpacking for short and char but misses the support for int.
>> This patch adds the support for vec_unpacku_{hi,lo}_v4si.
>>
>> Meanwhile, the current implementation uses vector permutation
>> way, which requires one extra customized constant vector as
>> the permutation control vector. It's better to use vector
>> merge high/low with zero constant vector, to save the space
>> in constant area as well as the cost to initialize pcv in
>> prologue. This patch updates it with vector merging and
>> simplify it with iterators.
>>
>> Bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>> powerpc64-linux-gnu P8.
>>
>> btw, the loop in unpack-vectorize-2.c doesn't get vectorized
>> without this patch, unpack-vectorize-[13]* is to verify
>> the vector merging and simplification works expectedly.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -----
...
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index d70c17e6bc2..0e8b66cd6a5 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -134,10 +134,8 @@ (define_c_enum "unspec"
>> UNSPEC_VMULWLUH
>> UNSPEC_VMULWHSH
>> UNSPEC_VMULWLSH
>> - UNSPEC_VUPKHUB
>> - UNSPEC_VUPKHUH
>> - UNSPEC_VUPKLUB
>> - UNSPEC_VUPKLUH
>> + UNSPEC_VUPKHUBHW
>> + UNSPEC_VUPKLUBHW
>
>
> Up to you, but... maybe just UNSPEC_VUPKHU and UNSPEC_VUPKLU, in case we extend this later to other types. Fine either way.
>
Good point! Fixed.
>> UNSPEC_VPERMSI
>> UNSPEC_VPERMHI
>> UNSPEC_INTERHI
>> @@ -3885,143 +3883,45 @@ (define_insn "xxeval"
>> [(set_attr "type" "vecsimple")
>> (set_attr "prefixed" "yes")])
>>
...
>> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
>> new file mode 100644
>> index 00000000000..2621d753baa
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
>
>
> I guess powerpc_altivec_ok is fine. I was initially concerned since unpack-vectorize.h mentions vector long long, but the types aren't actually used here. OK.
>
Yeah, I think it's fine since unpack-vectorize.h only typedef long long and it doesn't
even have type vector long long.
>> +/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
>> +
>> +/* Test if unpack vectorization succeeds for type signed/unsigned
>> + short and char. */
>> +
>> +#include "unpack-vectorize-1.h"
>> +
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
>> +/* { dg-final { scan-assembler {\mvupkhsb\M} } } */
>> +/* { dg-final { scan-assembler {\mvupklsb\M} } } */
>> +/* { dg-final { scan-assembler {\mvupkhsh\M} } } */
>> +/* { dg-final { scan-assembler {\mvupklsh\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrghb\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrglb\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrghh\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrglh\M} } } */
>
>
> Suggest that you consider scan-assembler-times 1 to make the tests more robust, here and for other tests.
>
Updated, thanks! I was worried that possible future unrolling tweaking can make
the hardcoded times fragile and thought it might be trivial to check the times.
"-fno-unroll-loops" has been added to disable unrolling explicitly as well.
Re-tested on BE and LE, the test results looks fine.
BR,
Kewen
-----
gcc/ChangeLog:
* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
(vec_unpacku_hi_v8hi): Likewise.
(vec_unpacku_lo_v16qi): Likewise.
(vec_unpacku_lo_v8hi): Likewise.
(vec_unpacku_hi_<VP_small_lc>): New define_expand.
(vec_unpacku_lo_<VP_small_lc>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/unpack-vectorize-1.c: New test.
* gcc.target/powerpc/unpack-vectorize-1.h: New test.
* gcc.target/powerpc/unpack-vectorize-2.c: New test.
* gcc.target/powerpc/unpack-vectorize-2.h: New test.
* gcc.target/powerpc/unpack-vectorize-3.c: New test.
* gcc.target/powerpc/unpack-vectorize-3.h: New test.
* gcc.target/powerpc/unpack-vectorize-run-1.c: New test.
* gcc.target/powerpc/unpack-vectorize-run-2.c: New test.
* gcc.target/powerpc/unpack-vectorize-run-3.c: New test.
* gcc.target/powerpc/unpack-vectorize.h: New test.
[-- Attachment #2: unpack-v2.diff --]
[-- Type: text/plain, Size: 17944 bytes --]
---
gcc/config/rs6000/altivec.md | 158 ++++--------------
.../gcc.target/powerpc/unpack-vectorize-1.c | 18 ++
.../gcc.target/powerpc/unpack-vectorize-1.h | 14 ++
.../gcc.target/powerpc/unpack-vectorize-2.c | 12 ++
.../gcc.target/powerpc/unpack-vectorize-2.h | 7 +
.../gcc.target/powerpc/unpack-vectorize-3.c | 11 ++
.../gcc.target/powerpc/unpack-vectorize-3.h | 7 +
.../powerpc/unpack-vectorize-run-1.c | 24 +++
.../powerpc/unpack-vectorize-run-2.c | 16 ++
.../powerpc/unpack-vectorize-run-3.c | 16 ++
.../gcc.target/powerpc/unpack-vectorize.h | 42 +++++
11 files changed, 196 insertions(+), 129 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index d70c17e6bc2..5a4a824804b 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -134,10 +134,8 @@ (define_c_enum "unspec"
UNSPEC_VMULWLUH
UNSPEC_VMULWHSH
UNSPEC_VMULWLSH
- UNSPEC_VUPKHUB
- UNSPEC_VUPKHUH
- UNSPEC_VUPKLUB
- UNSPEC_VUPKLUH
+ UNSPEC_VUPKHU
+ UNSPEC_VUPKLU
UNSPEC_VPERMSI
UNSPEC_VPERMHI
UNSPEC_INTERHI
@@ -3885,143 +3883,45 @@ (define_insn "xxeval"
[(set_attr "type" "vecsimple")
(set_attr "prefixed" "yes")])
-(define_expand "vec_unpacku_hi_v16qi"
- [(set (match_operand:V8HI 0 "register_operand" "=v")
- (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
- UNSPEC_VUPKHUB))]
- "TARGET_ALTIVEC"
-{
- rtx vzero = gen_reg_rtx (V8HImode);
- rtx mask = gen_reg_rtx (V16QImode);
- rtvec v = rtvec_alloc (16);
- bool be = BYTES_BIG_ENDIAN;
-
- emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
-
- RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7);
- RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 0 : 16);
- RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 6);
- RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16);
- RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5);
- RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 2 : 16);
- RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 4);
- RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16);
- RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3);
- RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 4 : 16);
- RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 2);
- RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16);
- RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1);
- RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 6 : 16);
- RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 0);
- RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16);
-
- emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
- emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
- DONE;
-})
-
-(define_expand "vec_unpacku_hi_v8hi"
- [(set (match_operand:V4SI 0 "register_operand" "=v")
- (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
- UNSPEC_VUPKHUH))]
+(define_expand "vec_unpacku_hi_<VP_small_lc>"
+ [(set (match_operand:VP 0 "register_operand" "=v")
+ (unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
+ UNSPEC_VUPKHU))]
"TARGET_ALTIVEC"
{
- rtx vzero = gen_reg_rtx (V4SImode);
- rtx mask = gen_reg_rtx (V16QImode);
- rtvec v = rtvec_alloc (16);
- bool be = BYTES_BIG_ENDIAN;
+ rtx vzero = gen_reg_rtx (<VP_small>mode);
+ emit_insn (gen_altivec_vspltis<VU_char> (vzero, const0_rtx));
- emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
-
- RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7);
- RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 6);
- RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 0 : 17);
- RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16);
- RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5);
- RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 4);
- RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 2 : 17);
- RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16);
- RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3);
- RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 2);
- RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 4 : 17);
- RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16);
- RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1);
- RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 0);
- RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 6 : 17);
- RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16);
-
- emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
- emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
- DONE;
-})
+ rtx res = gen_reg_rtx (<VP_small>mode);
+ rtx op1 = operands[1];
-(define_expand "vec_unpacku_lo_v16qi"
- [(set (match_operand:V8HI 0 "register_operand" "=v")
- (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
- UNSPEC_VUPKLUB))]
- "TARGET_ALTIVEC"
-{
- rtx vzero = gen_reg_rtx (V8HImode);
- rtx mask = gen_reg_rtx (V16QImode);
- rtvec v = rtvec_alloc (16);
- bool be = BYTES_BIG_ENDIAN;
-
- emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
-
- RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
- RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 8 : 16);
- RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14);
- RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16);
- RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
- RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16);
- RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12);
- RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
- RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
- RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16);
- RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10);
- RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
- RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9);
- RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16);
- RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 8);
- RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrgh<VU_char> (res, vzero, op1));
+ else
+ emit_insn (gen_altivec_vmrgl<VU_char> (res, op1, vzero));
- emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
- emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
+ emit_insn (gen_move_insn (operands[0], gen_lowpart (<MODE>mode, res)));
DONE;
})
-(define_expand "vec_unpacku_lo_v8hi"
- [(set (match_operand:V4SI 0 "register_operand" "=v")
- (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
- UNSPEC_VUPKLUH))]
+(define_expand "vec_unpacku_lo_<VP_small_lc>"
+ [(set (match_operand:VP 0 "register_operand" "=v")
+ (unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
+ UNSPEC_VUPKLU))]
"TARGET_ALTIVEC"
{
- rtx vzero = gen_reg_rtx (V4SImode);
- rtx mask = gen_reg_rtx (V16QImode);
- rtvec v = rtvec_alloc (16);
- bool be = BYTES_BIG_ENDIAN;
+ rtx vzero = gen_reg_rtx (<VP_small>mode);
+ emit_insn (gen_altivec_vspltis<VU_char> (vzero, const0_rtx));
- emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
-
- RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
- RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14);
- RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 8 : 17);
- RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16);
- RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
- RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12);
- RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17);
- RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
- RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
- RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10);
- RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17);
- RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
- RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9);
- RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 8);
- RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
- RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
+ rtx res = gen_reg_rtx (<VP_small>mode);
+ rtx op1 = operands[1];
- emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
- emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrgl<VU_char> (res, vzero, op1));
+ else
+ emit_insn (gen_altivec_vmrgh<VU_char> (res, op1, vzero));
+
+ emit_insn (gen_move_insn (operands[0], gen_lowpart (<MODE>mode, res)));
DONE;
})
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
new file mode 100644
index 00000000000..dceb5b89bd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if unpack vectorization succeeds for type signed/unsigned
+ short and char. */
+
+#include "unpack-vectorize-1.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
+/* { dg-final { scan-assembler-times {\mvupkhsb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvupklsb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvupkhsh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvupklsh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrghb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrglb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrghh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrglh\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
new file mode 100644
index 00000000000..1cb89aba392
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
@@ -0,0 +1,14 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (si)
+DEF_ARR (ui)
+DEF_ARR (sh)
+DEF_ARR (uh)
+DEF_ARR (sc)
+DEF_ARR (uc)
+
+TEST1 (sh, si)
+TEST1 (uh, ui)
+TEST1 (sc, sh)
+TEST1 (uc, uh)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
new file mode 100644
index 00000000000..4f2e6ebb07b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if unsigned int unpack vectorization succeeds. V2DImode is
+ supported since Power7 so guard it under Power7 and up. */
+
+#include "unpack-vectorize-2.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-assembler-times {\mxxmrghw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxmrglw\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
new file mode 100644
index 00000000000..e199229e6f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
@@ -0,0 +1,7 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (ui)
+DEF_ARR (ull)
+
+TEST1 (ui, ull)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
new file mode 100644
index 00000000000..520a279ac1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if signed int unpack vectorization succeeds. */
+
+#include "unpack-vectorize-3.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-assembler-times {\mvupkhsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvupklsw\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
new file mode 100644
index 00000000000..6a5191d28a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
@@ -0,0 +1,7 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (si)
+DEF_ARR (sll)
+
+TEST1 (si, sll)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
new file mode 100644
index 00000000000..51f0e67524f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-1.h"
+
+/* Test if unpack vectorization cases on signed/unsigned short and char
+ run successfully. */
+
+CHECK1 (sh, si)
+CHECK1 (uh, ui)
+CHECK1 (sc, sh)
+CHECK1 (uc, uh)
+
+int
+main ()
+{
+ check1_sh_si ();
+ check1_uh_ui ();
+ check1_sc_sh ();
+ check1_uc_uh ();
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
new file mode 100644
index 00000000000..6d243602bbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-2.h"
+
+/* Test if unpack vectorization cases on unsigned int run successfully. */
+
+CHECK1 (ui, ull)
+
+int
+main ()
+{
+ check1_ui_ull ();
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
new file mode 100644
index 00000000000..fec33c46abc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-3.h"
+
+/* Test if unpack vectorization cases on signed int run successfully. */
+
+CHECK1 (si, sll)
+
+int
+main ()
+{
+ check1_si_sll ();
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
new file mode 100644
index 00000000000..11fa7d4aa6f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
@@ -0,0 +1,42 @@
+typedef signed long long sll;
+typedef unsigned long long ull;
+typedef signed int si;
+typedef unsigned int ui;
+typedef signed short sh;
+typedef unsigned short uh;
+typedef signed char sc;
+typedef unsigned char uc;
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#define ALIGN_ATTR __attribute__((__aligned__(ALIGN)))
+
+#define N 128
+
+#define DEF_ARR(TYPE) \
+ TYPE TYPE##_a[N] ALIGN_ATTR; \
+ TYPE TYPE##_b[N] ALIGN_ATTR; \
+ TYPE TYPE##_c[N] ALIGN_ATTR;
+
+#define TEST1(NTYPE, WTYPE) \
+ __attribute__((noipa)) void test1_##NTYPE##_##WTYPE() { \
+ for (int i = 0; i < N; i++) \
+ WTYPE##_c[i] = NTYPE##_a[i] + NTYPE##_b[i]; \
+ }
+
+#define CHECK1(NTYPE, WTYPE) \
+ __attribute__((noipa, optimize(0))) void check1_##NTYPE##_##WTYPE() { \
+ for (int i = 0; i < N; i++) { \
+ NTYPE##_a[i] = 2 * i * sizeof(NTYPE) + 10; \
+ NTYPE##_b[i] = 7 * i * sizeof(NTYPE) / 5 - 10; \
+ } \
+ test1_##NTYPE##_##WTYPE(); \
+ for (int i = 0; i < N; i++) { \
+ WTYPE exp = NTYPE##_a[i] + NTYPE##_b[i]; \
+ if (WTYPE##_c[i] != exp) \
+ __builtin_abort(); \
+ } \
+ }
+
--
2.17.1
next prev parent reply other threads:[~2021-08-09 2:53 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-05 2:06 [PATCH] " Kewen.Lin
2021-08-06 13:10 ` Bill Schmidt
2021-08-06 17:37 ` Segher Boessenkool
2021-08-09 2:53 ` Kewen.Lin [this message]
2021-08-24 13:02 ` [PATCH v2] " Segher Boessenkool
2021-08-25 4:48 ` Kewen.Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b674cc8d-e14e-5d01-4207-abd7f65df49d@linux.ibm.com \
--to=linkw@linux.ibm.com \
--cc=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=segher@kernel.crashing.org \
--cc=wschmidt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).