* [Patch AArch64] Implement Vector Permute Support
@ 2012-12-04 10:31 James Greenhalgh
2012-12-04 10:36 ` [Patch AArch64] Add zip{1, 2}, uzp{1, 2}, trn{1, 2} support for vector permute James Greenhalgh
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: James Greenhalgh @ 2012-12-04 10:31 UTC (permalink / raw)
To: gcc-patches; +Cc: marcus.shawcroft
[-- Attachment #1: Type: text/plain, Size: 1980 bytes --]
Hi,
This patch adds support for vector-shuffle-style operations by
implementing the TARGET_VECTORIZE_VEC_PERM_CONST_OK hook and
the vec_perm and vec_perm_const standard patterns.
In this patch we add the framework and support for the
generic tbl instruction. This can be used to handle any
vector permute operation, but we can do a better job for
some special cases. The second patch of this series does
that better job for the ZIP, UZP and TRN instructions.
Is this OK to commit?
Thanks,
James Greenhalgh
---
gcc/
2012-12-04 James Greenhalgh <james.greenhalgh@arm.com>
* config/aarch64/aarch64-protos.h
(aarch64_split_combinev16qi): New.
(aarch64_expand_vec_perm): Likewise.
(aarch64_expand_vec_perm_const): Likewise.
* config/aarch64/aarch64-simd.md (vec_perm_const<mode>): New.
(vec_perm<mode>): Likewise.
(aarch64_tbl1<mode>): Likewise.
(aarch64_tbl2v16qi): Likewise.
(aarch64_combinev16qi): New.
* config/aarch64/aarch64.c
(aarch64_vectorize_vec_perm_const_ok): New.
(aarch64_split_combinev16qi): Likewise.
(MAX_VECT_LEN): Define.
(expand_vec_perm_d): New.
(aarch64_expand_vec_perm_1): Likewise.
(aarch64_expand_vec_perm): Likewise.
(aarch64_evpc_tbl): Likewise.
(aarch64_expand_vec_perm_const_1): Likewise.
(aarch64_expand_vec_perm_const): Likewise.
(aarch64_vectorize_vec_perm_const_ok): Likewise.
(TARGET_VECTORIZE_VEC_PERM_CONST_OK): Likewise.
* config/aarch64/iterators.md
(unspec): Add UNSPEC_TBL, UNSPEC_CONCAT.
(V_cmp_result): Add mapping for V2DF.
gcc/testsuite/
2012-12-04 James Greenhalgh <james.greenhalgh@arm.com>
* lib/target-supports.exp
(check_effective_target_vect_perm): Allow aarch64*-*-*.
(check_effective_target_vect_perm_byte): Likewise.
(check_effective_target_vect_perm_short): Likewise.
(check_effective_target_vect_char_mult): Likewise.
(check_effective_target_vect_extract_even_odd): Likewise.
(check_effective_target_vect_interleave): Likewise.
[-- Attachment #2: 0001-Patch-AArch64-Implement-Vector-Permute-Support.patch --]
[-- Type: text/x-patch; name=0001-Patch-AArch64-Implement-Vector-Permute-Support.patch, Size: 16004 bytes --]
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ab84257..7b72ead 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -236,4 +236,9 @@ rtx aarch64_expand_builtin (tree exp,
int ignore ATTRIBUTE_UNUSED);
tree aarch64_builtin_decl (unsigned, bool ATTRIBUTE_UNUSED);
+extern void aarch64_split_combinev16qi (rtx operands[3]);
+extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
+extern bool
+aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
+
#endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b3d01c1..2b0c8d6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3298,6 +3298,74 @@
;; Permuted-store expanders for neon intrinsics.
+;; Permute instructions
+
+;; vec_perm support
+
+(define_expand "vec_perm_const<mode>"
+ [(match_operand:VALL 0 "register_operand")
+ (match_operand:VALL 1 "register_operand")
+ (match_operand:VALL 2 "register_operand")
+ (match_operand:<V_cmp_result> 3)]
+ "TARGET_SIMD"
+{
+ if (aarch64_expand_vec_perm_const (operands[0], operands[1],
+ operands[2], operands[3]))
+ DONE;
+ else
+ FAIL;
+})
+
+(define_expand "vec_perm<mode>"
+ [(match_operand:VB 0 "register_operand")
+ (match_operand:VB 1 "register_operand")
+ (match_operand:VB 2 "register_operand")
+ (match_operand:VB 3 "register_operand")]
+ "TARGET_SIMD"
+{
+ aarch64_expand_vec_perm (operands[0], operands[1],
+ operands[2], operands[3]);
+ DONE;
+})
+
+(define_insn "aarch64_tbl1<mode>"
+ [(set (match_operand:VB 0 "register_operand" "=w")
+ (unspec:VB [(match_operand:V16QI 1 "register_operand" "w")
+ (match_operand:VB 2 "register_operand" "w")]
+ UNSPEC_TBL))]
+ "TARGET_SIMD"
+ "tbl\\t%0.<Vtype>, {%1.16b}, %2.<Vtype>"
+ [(set_attr "simd_type" "simd_tbl")
+ (set_attr "simd_mode" "<MODE>")]
+)
+
+;; Two source registers.
+
+(define_insn "aarch64_tbl2v16qi"
+ [(set (match_operand:V16QI 0 "register_operand" "=w")
+ (unspec:V16QI [(match_operand:OI 1 "register_operand" "w")
+ (match_operand:V16QI 2 "register_operand" "w")]
+ UNSPEC_TBL))]
+ "TARGET_SIMD"
+ "tbl\\t%0.16b, {%S1.16b - %T1.16b}, %2.16b"
+ [(set_attr "simd_type" "simd_tbl")
+ (set_attr "simd_mode" "V16QI")]
+)
+
+(define_insn_and_split "aarch64_combinev16qi"
+ [(set (match_operand:OI 0 "register_operand" "=w")
+ (unspec:OI [(match_operand:V16QI 1 "register_operand" "w")
+ (match_operand:V16QI 2 "register_operand" "w")]
+ UNSPEC_CONCAT))]
+ "TARGET_SIMD"
+ "#"
+ "&& reload_completed"
+ [(const_int 0)]
+{
+ aarch64_split_combinev16qi (operands);
+ DONE;
+})
+
(define_insn "aarch64_st2<mode>_dreg"
[(set (match_operand:TI 0 "aarch64_simd_struct_operand" "=Utv")
(unspec:TI [(match_operand:OI 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f262ef9..cebc8cb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -110,6 +110,9 @@ static unsigned bit_count (unsigned HOST_WIDE_INT);
static bool aarch64_const_vec_all_same_int_p (rtx,
HOST_WIDE_INT, HOST_WIDE_INT);
+static bool aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
+ const unsigned char *sel);
+
/* The processor for which instructions should be scheduled. */
enum aarch64_processor aarch64_tune = generic;
@@ -6678,6 +6681,292 @@ aarch64_c_mode_for_suffix (char suffix)
return VOIDmode;
}
+/* Split operands into moves from op[1] + op[2] into op[0]. */
+
+void
+aarch64_split_combinev16qi (rtx operands[3])
+{
+ unsigned int dest = REGNO (operands[0]);
+ unsigned int src1 = REGNO (operands[1]);
+ unsigned int src2 = REGNO (operands[2]);
+ enum machine_mode halfmode = GET_MODE (operands[1]);
+ unsigned int halfregs = HARD_REGNO_NREGS (src1, halfmode);
+ rtx destlo, desthi;
+
+ gcc_assert (halfmode == V16QImode);
+
+ if (src1 == dest && src2 == dest + halfregs)
+ {
+ /* No-op move. Can't split to nothing; emit something. */
+ emit_note (NOTE_INSN_DELETED);
+ return;
+ }
+
+ /* Preserve register attributes for variable tracking. */
+ destlo = gen_rtx_REG_offset (operands[0], halfmode, dest, 0);
+ desthi = gen_rtx_REG_offset (operands[0], halfmode, dest + halfregs,
+ GET_MODE_SIZE (halfmode));
+
+ /* Special case of reversed high/low parts. */
+ if (reg_overlap_mentioned_p (operands[2], destlo)
+ && reg_overlap_mentioned_p (operands[1], desthi))
+ {
+ emit_insn (gen_xorv16qi3 (operands[1], operands[1], operands[2]));
+ emit_insn (gen_xorv16qi3 (operands[2], operands[1], operands[2]));
+ emit_insn (gen_xorv16qi3 (operands[1], operands[1], operands[2]));
+ }
+ else if (!reg_overlap_mentioned_p (operands[2], destlo))
+ {
+ /* Try to avoid unnecessary moves if part of the result
+ is in the right place already. */
+ if (src1 != dest)
+ emit_move_insn (destlo, operands[1]);
+ if (src2 != dest + halfregs)
+ emit_move_insn (desthi, operands[2]);
+ }
+ else
+ {
+ if (src2 != dest + halfregs)
+ emit_move_insn (desthi, operands[2]);
+ if (src1 != dest)
+ emit_move_insn (destlo, operands[1]);
+ }
+}
+
+/* vec_perm support. */
+
+#define MAX_VECT_LEN 16
+
+struct expand_vec_perm_d
+{
+ rtx target, op0, op1;
+ unsigned char perm[MAX_VECT_LEN];
+ enum machine_mode vmode;
+ unsigned char nelt;
+ bool one_vector_p;
+ bool testing_p;
+};
+
+/* Generate a variable permutation. */
+
+static void
+aarch64_expand_vec_perm_1 (rtx target, rtx op0, rtx op1, rtx sel)
+{
+ enum machine_mode vmode = GET_MODE (target);
+ bool one_vector_p = rtx_equal_p (op0, op1);
+
+ gcc_checking_assert (vmode == V8QImode || vmode == V16QImode);
+ gcc_checking_assert (GET_MODE (op0) == vmode);
+ gcc_checking_assert (GET_MODE (op1) == vmode);
+ gcc_checking_assert (GET_MODE (sel) == vmode);
+ gcc_checking_assert (TARGET_SIMD);
+
+ if (one_vector_p)
+ {
+ if (vmode == V8QImode)
+ {
+ /* Expand the argument to a V16QI mode by duplicating it. */
+ rtx pair = gen_reg_rtx (V16QImode);
+ emit_insn (gen_aarch64_combinev8qi (pair, op0, op0));
+ emit_insn (gen_aarch64_tbl1v8qi (target, pair, sel));
+ }
+ else
+ {
+ emit_insn (gen_aarch64_tbl1v16qi (target, op0, sel));
+ }
+ }
+ else
+ {
+ rtx pair;
+
+ if (vmode == V8QImode)
+ {
+ pair = gen_reg_rtx (V16QImode);
+ emit_insn (gen_aarch64_combinev8qi (pair, op0, op1));
+ emit_insn (gen_aarch64_tbl1v8qi (target, pair, sel));
+ }
+ else
+ {
+ pair = gen_reg_rtx (OImode);
+ emit_insn (gen_aarch64_combinev16qi (pair, op0, op1));
+ emit_insn (gen_aarch64_tbl2v16qi (target, pair, sel));
+ }
+ }
+}
+
+void
+aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
+{
+ enum machine_mode vmode = GET_MODE (target);
+ unsigned int i, nelt = GET_MODE_NUNITS (vmode);
+ bool one_vector_p = rtx_equal_p (op0, op1);
+ rtx rmask[MAX_VECT_LEN], mask;
+
+ gcc_checking_assert (!BYTES_BIG_ENDIAN);
+
+ /* The TBL instruction does not use a modulo index, so we must take care
+ of that ourselves. */
+ mask = GEN_INT (one_vector_p ? nelt - 1 : 2 * nelt - 1);
+ for (i = 0; i < nelt; ++i)
+ rmask[i] = mask;
+ mask = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rmask));
+ sel = expand_simple_binop (vmode, AND, sel, mask, NULL, 0, OPTAB_LIB_WIDEN);
+
+ aarch64_expand_vec_perm_1 (target, op0, op1, sel);
+}
+
+static bool
+aarch64_evpc_tbl (struct expand_vec_perm_d *d)
+{
+ rtx rperm[MAX_VECT_LEN], sel;
+ enum machine_mode vmode = d->vmode;
+ unsigned int i, nelt = d->nelt;
+
+ /* TODO: ARM's TBL indexing is little-endian. In order to handle GCC's
+ numbering of elements for big-endian, we must reverse the order. */
+ if (BYTES_BIG_ENDIAN)
+ return false;
+
+ if (d->testing_p)
+ return true;
+
+ /* Generic code will try constant permutation twice. Once with the
+ original mode and again with the elements lowered to QImode.
+ So wait and don't do the selector expansion ourselves. */
+ if (vmode != V8QImode && vmode != V16QImode)
+ return false;
+
+ for (i = 0; i < nelt; ++i)
+ rperm[i] = GEN_INT (d->perm[i]);
+ sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
+ sel = force_reg (vmode, sel);
+
+ aarch64_expand_vec_perm_1 (d->target, d->op0, d->op1, sel);
+ return true;
+}
+
+static bool
+aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
+{
+ /* The pattern matching functions above are written to look for a small
+ number to begin the sequence (0, 1, N/2). If we begin with an index
+ from the second operand, we can swap the operands. */
+ if (d->perm[0] >= d->nelt)
+ {
+ unsigned i, nelt = d->nelt;
+ rtx x;
+
+ for (i = 0; i < nelt; ++i)
+ d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1);
+
+ x = d->op0;
+ d->op0 = d->op1;
+ d->op1 = x;
+ }
+
+ if (TARGET_SIMD)
+ return aarch64_evpc_tbl (d);
+ return false;
+}
+
+/* Expand a vec_perm_const pattern. */
+
+bool
+aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel)
+{
+ struct expand_vec_perm_d d;
+ int i, nelt, which;
+
+ d.target = target;
+ d.op0 = op0;
+ d.op1 = op1;
+
+ d.vmode = GET_MODE (target);
+ gcc_assert (VECTOR_MODE_P (d.vmode));
+ d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
+ d.testing_p = false;
+
+ for (i = which = 0; i < nelt; ++i)
+ {
+ rtx e = XVECEXP (sel, 0, i);
+ int ei = INTVAL (e) & (2 * nelt - 1);
+ which |= (ei < nelt ? 1 : 2);
+ d.perm[i] = ei;
+ }
+
+ switch (which)
+ {
+ default:
+ gcc_unreachable ();
+
+ case 3:
+ d.one_vector_p = false;
+ if (!rtx_equal_p (op0, op1))
+ break;
+
+ /* The elements of PERM do not suggest that only the first operand
+ is used, but both operands are identical. Allow easier matching
+ of the permutation by folding the permutation into the single
+ input vector. */
+ /* Fall Through. */
+ case 2:
+ for (i = 0; i < nelt; ++i)
+ d.perm[i] &= nelt - 1;
+ d.op0 = op1;
+ d.one_vector_p = true;
+ break;
+
+ case 1:
+ d.op1 = op0;
+ d.one_vector_p = true;
+ break;
+ }
+
+ return aarch64_expand_vec_perm_const_1 (&d);
+}
+
+static bool
+aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
+ const unsigned char *sel)
+{
+ struct expand_vec_perm_d d;
+ unsigned int i, nelt, which;
+ bool ret;
+
+ d.vmode = vmode;
+ d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
+ d.testing_p = true;
+ memcpy (d.perm, sel, nelt);
+
+ /* Calculate whether all elements are in one vector. */
+ for (i = which = 0; i < nelt; ++i)
+ {
+ unsigned char e = d.perm[i];
+ gcc_assert (e < 2 * nelt);
+ which |= (e < nelt ? 1 : 2);
+ }
+
+ /* If all elements are from the second vector, reindex as if from the
+ first vector. */
+ if (which == 2)
+ for (i = 0; i < nelt; ++i)
+ d.perm[i] -= nelt;
+
+ /* Check whether the mask can be applied to a single vector. */
+ d.one_vector_p = (which != 3);
+
+ d.target = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 1);
+ d.op1 = d.op0 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 2);
+ if (!d.one_vector_p)
+ d.op1 = gen_raw_REG (d.vmode, LAST_VIRTUAL_REGISTER + 3);
+
+ start_sequence ();
+ ret = aarch64_expand_vec_perm_const_1 (&d);
+ end_sequence ();
+
+ return ret;
+}
+
#undef TARGET_ADDRESS_COST
#define TARGET_ADDRESS_COST aarch64_address_cost
@@ -6864,6 +7153,12 @@ aarch64_c_mode_for_suffix (char suffix)
#undef TARGET_MAX_ANCHOR_OFFSET
#define TARGET_MAX_ANCHOR_OFFSET 4095
+/* vec_perm support. */
+
+#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK
+#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \
+ aarch64_vectorize_vec_perm_const_ok
+
struct gcc_target targetm = TARGET_INITIALIZER;
#include "gt-aarch64.h"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 7a1cdc8..9ea5e0c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -228,6 +228,8 @@
UNSPEC_FMAX ; Used in aarch64-simd.md.
UNSPEC_FMIN ; Used in aarch64-simd.md.
UNSPEC_BSL ; Used in aarch64-simd.md.
+ UNSPEC_TBL ; Used in vector permute patterns.
+ UNSPEC_CONCAT ; Used in vector permute patterns.
])
;; -------------------------------------------------------------------
@@ -415,8 +417,9 @@
(define_mode_attr V_cmp_result [(V8QI "V8QI") (V16QI "V16QI")
(V4HI "V4HI") (V8HI "V8HI")
(V2SI "V2SI") (V4SI "V4SI")
+ (DI "DI") (V2DI "V2DI")
(V2SF "V2SI") (V4SF "V4SI")
- (DI "DI") (V2DI "V2DI")])
+ (V2DF "V2DI")])
;; Vm for lane instructions is restricted to FP_LO_REGS.
(define_mode_attr vwx [(V4HI "x") (V8HI "x") (HI "x")
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5935346..bce98d0 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3014,6 +3014,7 @@ proc check_effective_target_vect_perm { } {
} else {
set et_vect_perm_saved 0
if { [is-effective-target arm_neon_ok]
+ || [istarget aarch64*-*-*]
|| [istarget powerpc*-*-*]
|| [istarget spu-*-*]
|| [istarget i?86-*-*]
@@ -3040,6 +3041,7 @@ proc check_effective_target_vect_perm_byte { } {
} else {
set et_vect_perm_byte_saved 0
if { [is-effective-target arm_neon_ok]
+ || [istarget aarch64*-*-*]
|| [istarget powerpc*-*-*]
|| [istarget spu-*-*] } {
set et_vect_perm_byte_saved 1
@@ -3062,6 +3064,7 @@ proc check_effective_target_vect_perm_short { } {
} else {
set et_vect_perm_short_saved 0
if { [is-effective-target arm_neon_ok]
+ || [istarget aarch64*-*-*]
|| [istarget powerpc*-*-*]
|| [istarget spu-*-*] } {
set et_vect_perm_short_saved 1
@@ -3697,7 +3700,8 @@ proc check_effective_target_vect_char_mult { } {
verbose "check_effective_target_vect_char_mult: using cached result" 2
} else {
set et_vect_char_mult_saved 0
- if { [istarget ia64-*-*]
+ if { [istarget aarch64*-*-*]
+ || [istarget ia64-*-*]
|| [istarget i?86-*-*]
|| [istarget x86_64-*-*]
|| [check_effective_target_arm32] } {
@@ -3768,8 +3772,9 @@ proc check_effective_target_vect_extract_even_odd { } {
verbose "check_effective_target_vect_extract_even_odd: using cached result" 2
} else {
set et_vect_extract_even_odd_saved 0
- if { [istarget powerpc*-*-*]
- || [is-effective-target arm_neon_ok]
+ if { [istarget aarch64*-*-*]
+ || [istarget powerpc*-*-*]
+ || [is-effective-target arm_neon_ok]
|| [istarget i?86-*-*]
|| [istarget x86_64-*-*]
|| [istarget ia64-*-*]
@@ -3793,8 +3798,9 @@ proc check_effective_target_vect_interleave { } {
verbose "check_effective_target_vect_interleave: using cached result" 2
} else {
set et_vect_interleave_saved 0
- if { [istarget powerpc*-*-*]
- || [is-effective-target arm_neon_ok]
+ if { [istarget aarch64*-*-*]
+ || [istarget powerpc*-*-*]
+ || [is-effective-target arm_neon_ok]
|| [istarget i?86-*-*]
|| [istarget x86_64-*-*]
|| [istarget ia64-*-*]
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Patch AArch64] Add zip{1, 2}, uzp{1, 2}, trn{1, 2} support for vector permute.
2012-12-04 10:31 [Patch AArch64] Implement Vector Permute Support James Greenhalgh
@ 2012-12-04 10:36 ` James Greenhalgh
2012-12-04 22:45 ` Marcus Shawcroft
2012-12-04 22:44 ` [Patch AArch64] Implement Vector Permute Support Marcus Shawcroft
2014-01-07 23:10 ` Andrew Pinski
2 siblings, 1 reply; 16+ messages in thread
From: James Greenhalgh @ 2012-12-04 10:36 UTC (permalink / raw)
To: gcc-patches; +Cc: marcus.shawcroft
[-- Attachment #1: Type: text/plain, Size: 967 bytes --]
Hi,
This patch improves our code generation for some cases of
constant vector permutation. In particular, we are able to
generate better code for patterns which match the output
of the zip, uzp and trn instructions.
This patch adds support for these cases.
This patch has been tested with no regressions on
aarch64-none-elf.
OK to commit?
Thanks,
James Greenhalgh
---
gcc/
2012-12-04 James Greenhalgh <james.greenhalgh@arm.com>
* config/aarch64/aarch64-simd-builtins.def: Add new builtins.
* config/aarch64/aarch64-simd.md (simd_type): Add uzp.
(aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>): New.
* config/aarch64/aarch64.c (aarch64_evpc_trn): New.
(aarch64_evpc_uzp): Likewise.
(aarch64_evpc_zip): Likewise.
(aarch64_expand_vec_perm_const_1): Check for trn, zip, uzp patterns.
* config/aarch64/iterators.md (unspec): Add necessary unspecs.
(PERMUTE): New.
(perm_insn): Likewise.
(perm_hilo): Likewise.
[-- Attachment #2: 0001-Patch-AArch64-Add-zip-1-2-uzp-1-2-trn-1-2-support-fo.patch --]
[-- Type: text/x-patch; name=0001-Patch-AArch64-Add-zip-1-2-uzp-1-2-trn-1-2-support-fo.patch, Size: 11386 bytes --]
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 2e3c4e1..8730c56 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -206,3 +206,12 @@
BUILTIN_VDQ_BHSI (BINOP, smin)
BUILTIN_VDQ_BHSI (BINOP, umax)
BUILTIN_VDQ_BHSI (BINOP, umin)
+
+ /* Implemented by
+ aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>. */
+ BUILTIN_VALL (BINOP, zip1)
+ BUILTIN_VALL (BINOP, zip2)
+ BUILTIN_VALL (BINOP, uzp1)
+ BUILTIN_VALL (BINOP, uzp2)
+ BUILTIN_VALL (BINOP, trn1)
+ BUILTIN_VALL (BINOP, trn2)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2b0c8d6..df88ef4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -128,7 +128,8 @@
; simd_store4s store single structure from one lane for four registers (ST4 [index]).
; simd_tbl table lookup.
; simd_trn transpose.
-; simd_zip zip/unzip.
+; simd_uzp unzip.
+; simd_zip zip.
(define_attr "simd_type"
"simd_abd,\
@@ -230,6 +231,7 @@
simd_store4s,\
simd_tbl,\
simd_trn,\
+ simd_uzp,\
simd_zip,\
none"
(const_string "none"))
@@ -3366,6 +3368,17 @@
DONE;
})
+(define_insn "aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>"
+ [(set (match_operand:VALL 0 "register_operand" "=w")
+ (unspec:VALL [(match_operand:VALL 1 "register_operand" "w")
+ (match_operand:VALL 2 "register_operand" "w")]
+ PERMUTE))]
+ "TARGET_SIMD"
+ "<PERMUTE:perm_insn><PERMUTE:perm_hilo>\\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+ [(set_attr "simd_type" "simd_<PERMUTE:perm_insn>")
+ (set_attr "simd_mode" "<MODE>")]
+)
+
(define_insn "aarch64_st2<mode>_dreg"
[(set (match_operand:TI 0 "aarch64_simd_struct_operand" "=Utv")
(unspec:TI [(match_operand:OI 1 "register_operand" "w")
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cebc8cb..0eac0b7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6815,6 +6815,261 @@ aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
aarch64_expand_vec_perm_1 (target, op0, op1, sel);
}
+/* Recognize patterns suitable for the TRN instructions. */
+static bool
+aarch64_evpc_trn (struct expand_vec_perm_d *d)
+{
+ unsigned int i, odd, mask, nelt = d->nelt;
+ rtx out, in0, in1, x;
+ rtx (*gen) (rtx, rtx, rtx);
+ enum machine_mode vmode = d->vmode;
+
+ if (GET_MODE_UNIT_SIZE (vmode) > 8)
+ return false;
+
+ /* Note that these are little-endian tests.
+ We correct for big-endian later. */
+ if (d->perm[0] == 0)
+ odd = 0;
+ else if (d->perm[0] == 1)
+ odd = 1;
+ else
+ return false;
+ mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
+
+ for (i = 0; i < nelt; i += 2)
+ {
+ if (d->perm[i] != i + odd)
+ return false;
+ if (d->perm[i + 1] != ((i + nelt + odd) & mask))
+ return false;
+ }
+
+ /* Success! */
+ if (d->testing_p)
+ return true;
+
+ in0 = d->op0;
+ in1 = d->op1;
+ if (BYTES_BIG_ENDIAN)
+ {
+ x = in0, in0 = in1, in1 = x;
+ odd = !odd;
+ }
+ out = d->target;
+
+ if (odd)
+ {
+ switch (vmode)
+ {
+ case V16QImode: gen = gen_aarch64_trn2v16qi; break;
+ case V8QImode: gen = gen_aarch64_trn2v8qi; break;
+ case V8HImode: gen = gen_aarch64_trn2v8hi; break;
+ case V4HImode: gen = gen_aarch64_trn2v4hi; break;
+ case V4SImode: gen = gen_aarch64_trn2v4si; break;
+ case V2SImode: gen = gen_aarch64_trn2v2si; break;
+ case V2DImode: gen = gen_aarch64_trn2v2di; break;
+ case V4SFmode: gen = gen_aarch64_trn2v4sf; break;
+ case V2SFmode: gen = gen_aarch64_trn2v2sf; break;
+ case V2DFmode: gen = gen_aarch64_trn2v2df; break;
+ default:
+ return false;
+ }
+ }
+ else
+ {
+ switch (vmode)
+ {
+ case V16QImode: gen = gen_aarch64_trn1v16qi; break;
+ case V8QImode: gen = gen_aarch64_trn1v8qi; break;
+ case V8HImode: gen = gen_aarch64_trn1v8hi; break;
+ case V4HImode: gen = gen_aarch64_trn1v4hi; break;
+ case V4SImode: gen = gen_aarch64_trn1v4si; break;
+ case V2SImode: gen = gen_aarch64_trn1v2si; break;
+ case V2DImode: gen = gen_aarch64_trn1v2di; break;
+ case V4SFmode: gen = gen_aarch64_trn1v4sf; break;
+ case V2SFmode: gen = gen_aarch64_trn1v2sf; break;
+ case V2DFmode: gen = gen_aarch64_trn1v2df; break;
+ default:
+ return false;
+ }
+ }
+
+ emit_insn (gen (out, in0, in1));
+ return true;
+}
+
+/* Recognize patterns suitable for the UZP instructions. */
+static bool
+aarch64_evpc_uzp (struct expand_vec_perm_d *d)
+{
+ unsigned int i, odd, mask, nelt = d->nelt;
+ rtx out, in0, in1, x;
+ rtx (*gen) (rtx, rtx, rtx);
+ enum machine_mode vmode = d->vmode;
+
+ if (GET_MODE_UNIT_SIZE (vmode) > 8)
+ return false;
+
+ /* Note that these are little-endian tests.
+ We correct for big-endian later. */
+ if (d->perm[0] == 0)
+ odd = 0;
+ else if (d->perm[0] == 1)
+ odd = 1;
+ else
+ return false;
+ mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
+
+ for (i = 0; i < nelt; i++)
+ {
+ unsigned elt = (i * 2 + odd) & mask;
+ if (d->perm[i] != elt)
+ return false;
+ }
+
+ /* Success! */
+ if (d->testing_p)
+ return true;
+
+ in0 = d->op0;
+ in1 = d->op1;
+ if (BYTES_BIG_ENDIAN)
+ {
+ x = in0, in0 = in1, in1 = x;
+ odd = !odd;
+ }
+ out = d->target;
+
+ if (odd)
+ {
+ switch (vmode)
+ {
+ case V16QImode: gen = gen_aarch64_uzp2v16qi; break;
+ case V8QImode: gen = gen_aarch64_uzp2v8qi; break;
+ case V8HImode: gen = gen_aarch64_uzp2v8hi; break;
+ case V4HImode: gen = gen_aarch64_uzp2v4hi; break;
+ case V4SImode: gen = gen_aarch64_uzp2v4si; break;
+ case V2SImode: gen = gen_aarch64_uzp2v2si; break;
+ case V2DImode: gen = gen_aarch64_uzp2v2di; break;
+ case V4SFmode: gen = gen_aarch64_uzp2v4sf; break;
+ case V2SFmode: gen = gen_aarch64_uzp2v2sf; break;
+ case V2DFmode: gen = gen_aarch64_uzp2v2df; break;
+ default:
+ return false;
+ }
+ }
+ else
+ {
+ switch (vmode)
+ {
+ case V16QImode: gen = gen_aarch64_uzp1v16qi; break;
+ case V8QImode: gen = gen_aarch64_uzp1v8qi; break;
+ case V8HImode: gen = gen_aarch64_uzp1v8hi; break;
+ case V4HImode: gen = gen_aarch64_uzp1v4hi; break;
+ case V4SImode: gen = gen_aarch64_uzp1v4si; break;
+ case V2SImode: gen = gen_aarch64_uzp1v2si; break;
+ case V2DImode: gen = gen_aarch64_uzp1v2di; break;
+ case V4SFmode: gen = gen_aarch64_uzp1v4sf; break;
+ case V2SFmode: gen = gen_aarch64_uzp1v2sf; break;
+ case V2DFmode: gen = gen_aarch64_uzp1v2df; break;
+ default:
+ return false;
+ }
+ }
+
+ emit_insn (gen (out, in0, in1));
+ return true;
+}
+
+/* Recognize patterns suitable for the ZIP instructions. */
+static bool
+aarch64_evpc_zip (struct expand_vec_perm_d *d)
+{
+ unsigned int i, high, mask, nelt = d->nelt;
+ rtx out, in0, in1, x;
+ rtx (*gen) (rtx, rtx, rtx);
+ enum machine_mode vmode = d->vmode;
+
+ if (GET_MODE_UNIT_SIZE (vmode) > 8)
+ return false;
+
+ /* Note that these are little-endian tests.
+ We correct for big-endian later. */
+ high = nelt / 2;
+ if (d->perm[0] == high)
+ /* Do Nothing. */
+ ;
+ else if (d->perm[0] == 0)
+ high = 0;
+ else
+ return false;
+ mask = (d->one_vector_p ? nelt - 1 : 2 * nelt - 1);
+
+ for (i = 0; i < nelt / 2; i++)
+ {
+ unsigned elt = (i + high) & mask;
+ if (d->perm[i * 2] != elt)
+ return false;
+ elt = (elt + nelt) & mask;
+ if (d->perm[i * 2 + 1] != elt)
+ return false;
+ }
+
+ /* Success! */
+ if (d->testing_p)
+ return true;
+
+ in0 = d->op0;
+ in1 = d->op1;
+ if (BYTES_BIG_ENDIAN)
+ {
+ x = in0, in0 = in1, in1 = x;
+ high = !high;
+ }
+ out = d->target;
+
+ if (high)
+ {
+ switch (vmode)
+ {
+ case V16QImode: gen = gen_aarch64_zip2v16qi; break;
+ case V8QImode: gen = gen_aarch64_zip2v8qi; break;
+ case V8HImode: gen = gen_aarch64_zip2v8hi; break;
+ case V4HImode: gen = gen_aarch64_zip2v4hi; break;
+ case V4SImode: gen = gen_aarch64_zip2v4si; break;
+ case V2SImode: gen = gen_aarch64_zip2v2si; break;
+ case V2DImode: gen = gen_aarch64_zip2v2di; break;
+ case V4SFmode: gen = gen_aarch64_zip2v4sf; break;
+ case V2SFmode: gen = gen_aarch64_zip2v2sf; break;
+ case V2DFmode: gen = gen_aarch64_zip2v2df; break;
+ default:
+ return false;
+ }
+ }
+ else
+ {
+ switch (vmode)
+ {
+ case V16QImode: gen = gen_aarch64_zip1v16qi; break;
+ case V8QImode: gen = gen_aarch64_zip1v8qi; break;
+ case V8HImode: gen = gen_aarch64_zip1v8hi; break;
+ case V4HImode: gen = gen_aarch64_zip1v4hi; break;
+ case V4SImode: gen = gen_aarch64_zip1v4si; break;
+ case V2SImode: gen = gen_aarch64_zip1v2si; break;
+ case V2DImode: gen = gen_aarch64_zip1v2di; break;
+ case V4SFmode: gen = gen_aarch64_zip1v4sf; break;
+ case V2SFmode: gen = gen_aarch64_zip1v2sf; break;
+ case V2DFmode: gen = gen_aarch64_zip1v2df; break;
+ default:
+ return false;
+ }
+ }
+
+ emit_insn (gen (out, in0, in1));
+ return true;
+}
+
static bool
aarch64_evpc_tbl (struct expand_vec_perm_d *d)
{
@@ -6865,7 +7120,15 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
}
if (TARGET_SIMD)
- return aarch64_evpc_tbl (d);
+ {
+ if (aarch64_evpc_zip (d))
+ return true;
+ else if (aarch64_evpc_uzp (d))
+ return true;
+ else if (aarch64_evpc_trn (d))
+ return true;
+ return aarch64_evpc_tbl (d);
+ }
return false;
}
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 9ea5e0c..d710ea0 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -230,6 +230,12 @@
UNSPEC_BSL ; Used in aarch64-simd.md.
UNSPEC_TBL ; Used in vector permute patterns.
UNSPEC_CONCAT ; Used in vector permute patterns.
+ UNSPEC_ZIP1 ; Used in vector permute patterns.
+ UNSPEC_ZIP2 ; Used in vector permute patterns.
+ UNSPEC_UZP1 ; Used in vector permute patterns.
+ UNSPEC_UZP2 ; Used in vector permute patterns.
+ UNSPEC_TRN1 ; Used in vector permute patterns.
+ UNSPEC_TRN2 ; Used in vector permute patterns.
])
;; -------------------------------------------------------------------
@@ -649,6 +655,9 @@
(define_int_iterator VCMP_U [UNSPEC_CMHS UNSPEC_CMHI UNSPEC_CMTST])
+(define_int_iterator PERMUTE [UNSPEC_ZIP1 UNSPEC_ZIP2
+ UNSPEC_TRN1 UNSPEC_TRN2
+ UNSPEC_UZP1 UNSPEC_UZP2])
;; -------------------------------------------------------------------
;; Int Iterators Attributes.
@@ -732,3 +741,10 @@
(define_int_attr offsetlr [(UNSPEC_SSLI "1") (UNSPEC_USLI "1")
(UNSPEC_SSRI "0") (UNSPEC_USRI "0")])
+(define_int_attr perm_insn [(UNSPEC_ZIP1 "zip") (UNSPEC_ZIP2 "zip")
+ (UNSPEC_TRN1 "trn") (UNSPEC_TRN2 "trn")
+ (UNSPEC_UZP1 "uzp") (UNSPEC_UZP2 "uzp")])
+
+(define_int_attr perm_hilo [(UNSPEC_ZIP1 "1") (UNSPEC_ZIP2 "2")
+ (UNSPEC_TRN1 "1") (UNSPEC_TRN2 "2")
+ (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")])
* Re: [Patch AArch64] Add zip{1, 2}, uzp{1, 2}, trn{1, 2} support for vector permute.
2012-12-04 10:36 ` [Patch AArch64] Add zip{1, 2}, uzp{1, 2}, trn{1, 2} support for vector permute James Greenhalgh
@ 2012-12-04 22:45 ` Marcus Shawcroft
0 siblings, 0 replies; 16+ messages in thread
From: Marcus Shawcroft @ 2012-12-04 22:45 UTC (permalink / raw)
To: James Greenhalgh; +Cc: gcc-patches, marcus.shawcroft
OK
/Marcus
On 4 December 2012 10:36, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This patch improves our code generation for some cases of
> constant vector permutation. In particular, we are able to
> generate better code for patterns which match the output
> of the zip, uzp and trn instructions.
>
> This patch adds support for these cases.
>
> This patch has been tested with no regressions on
> aarch64-none-elf.
>
> OK to commit?
>
> Thanks,
> James Greenhalgh
>
> ---
> gcc/
> 2012-12-04 James Greenhalgh <james.greenhalgh@arm.com>
>
> * config/aarch64/aarch64-simd-builtins.def: Add new builtins.
> * config/aarch64/aarch64-simd.md (simd_type): Add uzp.
> (aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>): New.
> * config/aarch64/aarch64.c (aarch64_evpc_trn): New.
> (aarch64_evpc_uzp): Likewise.
> (aarch64_evpc_zip): Likewise.
> (aarch64_expand_vec_perm_const_1): Check for trn, zip, uzp patterns.
> * config/aarch64/iterators.md (unspec): Add necessary unspecs.
> (PERMUTE): New.
> (perm_insn): Likewise.
> (perm_hilo): Likewise.
* Re: [Patch AArch64] Implement Vector Permute Support
2012-12-04 10:31 [Patch AArch64] Implement Vector Permute Support James Greenhalgh
2012-12-04 10:36 ` [Patch AArch64] Add zip{1, 2}, uzp{1, 2}, trn{1, 2} support for vector permute James Greenhalgh
@ 2012-12-04 22:44 ` Marcus Shawcroft
2012-12-06 16:25 ` James Greenhalgh
2014-01-07 23:10 ` Andrew Pinski
2 siblings, 1 reply; 16+ messages in thread
From: Marcus Shawcroft @ 2012-12-04 22:44 UTC (permalink / raw)
To: James Greenhalgh; +Cc: gcc-patches, marcus.shawcroft
OK
/Marcus
On 4 December 2012 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This patch adds support for vector-shuffle-style operations by
> implementing the TARGET_VECTORIZE_VEC_PERM_CONST_OK hook and
> the vec_perm and vec_perm_const standard patterns.
>
> In this patch we add the framework and support for the
> generic tbl instruction. This can be used to handle any
> vector permute operation, but we can do a better job for
> some special cases. The second patch of this series does
> that better job for the ZIP, UZP and TRN instructions.
>
> Is this OK to commit?
>
> Thanks,
> James Greenhalgh
>
> ---
> gcc/
>
> 2012-12-04 James Greenhalgh <james.greenhalgh@arm.com>
>
> * config/aarch64/aarch64-protos.h
> (aarch64_split_combinev16qi): New.
> (aarch64_expand_vec_perm): Likewise.
> (aarch64_expand_vec_perm_const): Likewise.
> * config/aarch64/aarch64-simd.md (vec_perm_const<mode>): New.
> (vec_perm<mode>): Likewise.
> (aarch64_tbl1<mode>): Likewise.
> (aarch64_tbl2v16qi): Likewise.
> (aarch64_combinev16qi): New.
> * config/aarch64/aarch64.c
> (aarch64_vectorize_vec_perm_const_ok): New.
> (aarch64_split_combinev16qi): Likewise.
> (MAX_VECT_LEN): Define.
> (expand_vec_perm_d): New.
> (aarch64_expand_vec_perm_1): Likewise.
> (aarch64_expand_vec_perm): Likewise.
> (aarch64_evpc_tbl): Likewise.
> (aarch64_expand_vec_perm_const_1): Likewise.
> (aarch64_expand_vec_perm_const): Likewise.
> (aarch64_vectorize_vec_perm_const_ok): Likewise.
> (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Likewise.
> * config/aarch64/iterators.md
> (unspec): Add UNSPEC_TBL, UNSPEC_CONCAT.
> (V_cmp_result): Add mapping for V2DF.
>
> gcc/testsuite/
>
> 2012-12-04 James Greenhalgh <james.greenhalgh@arm.com>
>
> * lib/target-supports.exp
> (check_effective_target_vect_perm): Allow aarch64*-*-*.
> (check_effective_target_vect_perm_byte): Likewise.
> (check_effective_target_vect_perm_short): Likewise.
> (check_effective_target_vect_char_mult): Likewise.
> (check_effective_target_vect_extract_even_odd): Likewise.
> (check_effective_target_vect_interleave): Likewise.
* RE: [Patch AArch64] Implement Vector Permute Support
2012-12-04 22:44 ` [Patch AArch64] Implement Vector Permute Support Marcus Shawcroft
@ 2012-12-06 16:25 ` James Greenhalgh
0 siblings, 0 replies; 16+ messages in thread
From: James Greenhalgh @ 2012-12-06 16:25 UTC (permalink / raw)
To: Marcus Shawcroft; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 860 bytes --]
> OK
>
> /Marcus
Thanks Marcus,
I've committed this and the follow-up patch to trunk and
back-ported them to AArch64-4.7-branch.
The back-port required back-porting the attached patch,
which fixes up the expected behaviour of
gcc/testsuite/gcc.dg/vect/slp-perm-8.c.
After committing this as a prerequisite, the patch series
tests clean, with no regressions, on aarch64-none-elf.
Thanks,
James Greenhalgh
---
gcc/testsuite
2012-12-06 James Greenhalgh <james.greenhalgh@arm.com>
Backport from mainline.
2012-05-31 Greta Yorsh <Greta.Yorsh@arm.com>
* lib/target-supports.exp (check_effective_target_vect_char_mult):
Add arm32 to targets.
* gcc.dg/vect/slp-perm-8.c (main): Prevent vectorization
of the initialization loop.
(dg-final): Adjust the expected number of vectorized loops depending
on vect_char_mult target selector.
[-- Attachment #2: 0001-aarch64-4.7-Backport-fix-to-gcc.dg-vect-slp-perm-8.c.patch --]
[-- Type: application/octet-stream, Size: 1605 bytes --]
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
index d211ef9..c4854d5 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-8.c
@@ -32,8 +32,7 @@ int main (int argc, const char* argv[])
{
input[i] = i;
output[i] = 0;
- if (input[i] > 256)
- abort ();
+ __asm__ volatile ("");
}
for (i = 0; i < N / 3; i++)
@@ -52,7 +51,8 @@ int main (int argc, const char* argv[])
return 0;
}
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target vect_perm_byte } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_perm_byte && vect_char_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_perm_byte && {! vect_char_mult } } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_perm_byte } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ccd3966..d7836eb 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3555,7 +3555,8 @@ proc check_effective_target_vect_char_mult { } {
set et_vect_char_mult_saved 0
if { [istarget ia64-*-*]
|| [istarget i?86-*-*]
- || [istarget x86_64-*-*] } {
+ || [istarget x86_64-*-*]
+ || [check_effective_target_arm32] } {
set et_vect_char_mult_saved 1
}
}
* Re: [Patch AArch64] Implement Vector Permute Support
2012-12-04 10:31 [Patch AArch64] Implement Vector Permute Support James Greenhalgh
2012-12-04 10:36 ` [Patch AArch64] Add zip{1, 2}, uzp{1, 2}, trn{1, 2} support for vector permute James Greenhalgh
2012-12-04 22:44 ` [Patch AArch64] Implement Vector Permute Support Marcus Shawcroft
@ 2014-01-07 23:10 ` Andrew Pinski
[not found] ` <72A61951-68B2-4776-A2B8-05DC4E1F53A7@arm.com>
2 siblings, 1 reply; 16+ messages in thread
From: Andrew Pinski @ 2014-01-07 23:10 UTC (permalink / raw)
To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft
On Tue, Dec 4, 2012 at 2:31 AM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This patch adds support for Vector Shuffle style operations
> through support for TARGET_VECTORIZE_VEC_PERM_CONST_OK and
> the vec_perm and vec_perm_const standard patterns.
>
> In this patch we add the framework and support for the
> generic tbl instruction. This can be used to handle any
> vector permute operation, but we can do a better job for
> some special cases. The second patch of this series does
> that better job for the ZIP, UZP and TRN instructions.
>
> Is this OK to commit?
This breaks big-endian aarch64 in a very bad way. vec_perm<mode> is
enabled for big-endian but aarch64_expand_vec_perm will ICE right
away. Can you please test big-endian also next time?
Here is the shortest testcase which fails at -O3:
void fill_window(unsigned short *p, unsigned wsize)
{
unsigned n, m;
do {
m = *--p;
*p = (unsigned short)(m >= wsize ? m-wsize : 0);
} while (--n);
}
This comes from zlib and it blocks my building of the trunk.
Thanks,
Andrew Pinski
>
> Thanks,
> James Greenhalgh
>
> ---
> gcc/
>
> 2012-12-04 James Greenhalgh <james.greenhalgh@arm.com>
>
> * config/aarch64/aarch64-protos.h
> (aarch64_split_combinev16qi): New.
> (aarch64_expand_vec_perm): Likewise.
> (aarch64_expand_vec_perm_const): Likewise.
> * config/aarch64/aarch64-simd.md (vec_perm_const<mode>): New.
> (vec_perm<mode>): Likewise.
> (aarch64_tbl1<mode>): Likewise.
> (aarch64_tbl2v16qi): Likewise.
> (aarch64_combinev16qi): New.
> * config/aarch64/aarch64.c
> (aarch64_vectorize_vec_perm_const_ok): New.
> (aarch64_split_combinev16qi): Likewise.
> (MAX_VECT_LEN): Define.
> (expand_vec_perm_d): New.
> (aarch64_expand_vec_perm_1): Likewise.
> (aarch64_expand_vec_perm): Likewise.
> (aarch64_evpc_tbl): Likewise.
> (aarch64_expand_vec_perm_const_1): Likewise.
> (aarch64_expand_vec_perm_const): Likewise.
> (aarch64_vectorize_vec_perm_const_ok): Likewise.
> (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Likewise.
> * config/aarch64/iterators.md
> (unspec): Add UNSPEC_TBL, UNSPEC_CONCAT.
> (V_cmp_result): Add mapping for V2DF.
>
> gcc/testsuite/
>
> 2012-12-04 James Greenhalgh <james.greenhalgh@arm.com>
>
> * lib/target-supports.exp
> (check_effective_target_vect_perm): Allow aarch64*-*-*.
> (check_effective_target_vect_perm_byte): Likewise.
> (check_effective_target_vect_perm_short): Likewise.
> (check_effective_target_vect_char_mult): Likewise.
> (check_effective_target_vect_extract_even_odd): Likewise.
> (check_effective_target_vect_interleave): Likewise.
end of thread, other threads:[~2014-01-20 18:36 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-04 10:31 [Patch AArch64] Implement Vector Permute Support James Greenhalgh
2012-12-04 10:36 ` [Patch AArch64] Add zip{1, 2}, uzp{1, 2}, trn{1, 2} support for vector permute James Greenhalgh
2012-12-04 22:45 ` Marcus Shawcroft
2012-12-04 22:44 ` [Patch AArch64] Implement Vector Permute Support Marcus Shawcroft
2012-12-06 16:25 ` James Greenhalgh
2014-01-07 23:10 ` Andrew Pinski
[not found] ` <72A61951-68B2-4776-A2B8-05DC4E1F53A7@arm.com>
2014-01-08 0:10 ` Andrew Pinski
2014-01-08 11:00 ` James Greenhalgh
2014-01-14 15:19 ` Alex Velenko
2014-01-14 15:51 ` pinskia
2014-01-16 14:43 ` Alex Velenko
2014-01-17 15:55 ` Richard Earnshaw
2014-01-20 11:15 ` Alex Velenko
2014-01-20 11:17 ` Richard Earnshaw
2014-01-20 17:33 ` Alex Velenko
2014-01-20 18:36 ` Marcus Shawcroft