* [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors
@ 2016-06-07 16:57 Kyrill Tkachov
2016-06-14 9:37 ` James Greenhalgh
2016-06-30 14:51 ` Richard Earnshaw (lists)
0 siblings, 2 replies; 5+ messages in thread
From: Kyrill Tkachov @ 2016-06-07 16:57 UTC (permalink / raw)
To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
[-- Attachment #1: Type: text/plain, Size: 1626 bytes --]
Hi all,
This patch addresses a deficiency we have in handling vector lane-to-lane moves in the AArch64 backend.
Generally we can use the INS (element) instruction but, as a user complains in https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
we don't. James had a patch adding an appropriate combine pattern some time ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html)
but it never got applied.
This patch is a rebase of that patch that adds the necessary vec_merge+vec_duplicate+vec_select combine pattern.
I chose to use a define_insn rather than the define_insn_and_split in that patch, which just deletes the instruction when
the source and destination registers are the same, as I think it's not the combine pattern's job to delete the redundant instruction
but rather some other pass's job. Also, I was not able to create a testcase where it would make a difference.
Also, this patch doesn't reimplement the vcopy*lane* intrinsics from inline assembly as a vget_lane+vset_lane combo.
That can be done as a separate patch on top of this one.
Bootstrapped and tested on aarch64-none-linux-gnu.
Also tested on aarch64_be-none-elf.
Ok for trunk?
Thanks,
Kyrill
2016-06-07  James Greenhalgh  <james.greenhalgh@arm.com>
	    Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
	New define_insn.
	(*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>): Likewise.

2016-06-07  James Greenhalgh  <james.greenhalgh@arm.com>
	    Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* gcc.target/aarch64/vget_set_lane_1.c: New test.
[-- Attachment #2: aarch64-ins-vec.patch --]
[-- Type: text/x-patch, Size: 6210 bytes --]
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6ea35bf487eaa47dd78742e3eae7507b6875ba1a..5600e5bd0a94fd7efd704a4b13d95d993fd5b62f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -555,6 +555,49 @@ (define_insn "aarch64_simd_vec_set<mode>"
[(set_attr "type" "neon_from_gp<q>, neon_ins<q>, neon_load1_1reg<q>")]
)
+(define_insn "*aarch64_simd_vec_copy_lane<mode>"
+ [(set (match_operand:VALL 0 "register_operand" "=w")
+ (vec_merge:VALL
+ (vec_duplicate:VALL
+ (vec_select:<VEL>
+ (match_operand:VALL 3 "register_operand" "w")
+ (parallel
+ [(match_operand:SI 4 "immediate_operand" "i")])))
+ (match_operand:VALL 1 "register_operand" "0")
+ (match_operand:SI 2 "immediate_operand" "i")))]
+ "TARGET_SIMD"
+ {
+ int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
+ operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
+ operands[4] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[4])));
+
+ return "ins\t%0.<Vetype>[%p2], %3.<Vetype>[%4]";
+ }
+ [(set_attr "type" "neon_ins<q>")]
+)
+
+(define_insn "*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>"
+ [(set (match_operand:VALL 0 "register_operand" "=w")
+ (vec_merge:VALL
+ (vec_duplicate:VALL
+ (vec_select:<VEL>
+ (match_operand:<VSWAP_WIDTH> 3 "register_operand" "w")
+ (parallel
+ [(match_operand:SI 4 "immediate_operand" "i")])))
+ (match_operand:VALL 1 "register_operand" "0")
+ (match_operand:SI 2 "immediate_operand" "i")))]
+ "TARGET_SIMD"
+ {
+ int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
+ operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt);
+ operands[4] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
+ INTVAL (operands[4])));
+
+ return "ins\t%0.<Vetype>[%p2], %3.<Vetype>[%4]";
+ }
+ [(set_attr "type" "neon_ins<q>")]
+)
+
(define_insn "aarch64_simd_lshr<mode>"
[(set (match_operand:VDQ_I 0 "register_operand" "=w")
(lshiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
diff --git a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..07a77de319206c5c6dad1c0d2d9bcc998583f9c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
@@ -0,0 +1,72 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include "arm_neon.h"
+
+#define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2) \
+TYPE1 __attribute__((noinline,noclone)) \
+test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b) \
+{ \
+ return vset##Q1##_lane_##SUFFIX (vget##Q2##_lane_##SUFFIX (b, INDEX2),\
+ a, INDEX1); \
+}
+
+BUILD_TEST (poly8x8_t, poly8x8_t, , , p8, 7, 6)
+BUILD_TEST (int8x8_t, int8x8_t, , , s8, 7, 6)
+BUILD_TEST (uint8x8_t, uint8x8_t, , , u8, 7, 6)
+/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], v1.b\\\[6\\\]" 3 } } */
+BUILD_TEST (poly16x4_t, poly16x4_t, , , p16, 3, 2)
+BUILD_TEST (int16x4_t, int16x4_t, , , s16, 3, 2)
+BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
+/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], v1.h\\\[2\\\]" 3 } } */
+BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
+BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
+BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
+/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
+
+BUILD_TEST (poly8x8_t, poly8x16_t, , q, p8, 7, 15)
+BUILD_TEST (int8x8_t, int8x16_t, , q, s8, 7, 15)
+BUILD_TEST (uint8x8_t, uint8x16_t, , q, u8, 7, 15)
+/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], v1.b\\\[15\\\]" 3 } } */
+BUILD_TEST (poly16x4_t, poly16x8_t, , q, p16, 3, 7)
+BUILD_TEST (int16x4_t, int16x8_t, , q, s16, 3, 7)
+BUILD_TEST (uint16x4_t, uint16x8_t, , q, u16, 3, 7)
+/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], v1.h\\\[7\\\]" 3 } } */
+BUILD_TEST (float32x2_t, float32x4_t, , q, f32, 1, 3)
+BUILD_TEST (int32x2_t, int32x4_t, , q, s32, 1, 3)
+BUILD_TEST (uint32x2_t, uint32x4_t, , q, u32, 1, 3)
+/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[3\\\]" 3 } } */
+
+BUILD_TEST (poly8x16_t, poly8x8_t, q, , p8, 15, 7)
+BUILD_TEST (int8x16_t, int8x8_t, q, , s8, 15, 7)
+BUILD_TEST (uint8x16_t, uint8x8_t, q, , u8, 15, 7)
+/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[15\\\], v1.b\\\[7\\\]" 3 } } */
+BUILD_TEST (poly16x8_t, poly16x4_t, q, , p16, 7, 3)
+BUILD_TEST (int16x8_t, int16x4_t, q, , s16, 7, 3)
+BUILD_TEST (uint16x8_t, uint16x4_t, q, , u16, 7, 3)
+/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[7\\\], v1.h\\\[3\\\]" 3 } } */
+BUILD_TEST (float32x4_t, float32x2_t, q, , f32, 3, 1)
+BUILD_TEST (int32x4_t, int32x2_t, q, , s32, 3, 1)
+BUILD_TEST (uint32x4_t, uint32x2_t, q, , u32, 3, 1)
+/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[3\\\], v1.s\\\[1\\\]" 3 } } */
+BUILD_TEST (float64x2_t, float64x1_t, q, , f64, 1, 0)
+BUILD_TEST (int64x2_t, int64x1_t, q, , s64, 1, 0)
+BUILD_TEST (uint64x2_t, uint64x1_t, q, , u64, 1, 0)
+/* { dg-final { scan-assembler-times "ins\\tv0.d\\\[1\\\], v1.d\\\[0\\\]" 3 } } */
+
+BUILD_TEST (poly8x16_t, poly8x16_t, q, q, p8, 14, 15)
+BUILD_TEST (int8x16_t, int8x16_t, q, q, s8, 14, 15)
+BUILD_TEST (uint8x16_t, uint8x16_t, q, q, u8, 14, 15)
+/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[14\\\], v1.b\\\[15\\\]" 3 } } */
+BUILD_TEST (poly16x8_t, poly16x8_t, q, q, p16, 6, 7)
+BUILD_TEST (int16x8_t, int16x8_t, q, q, s16, 6, 7)
+BUILD_TEST (uint16x8_t, uint16x8_t, q, q, u16, 6, 7)
+/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[6\\\], v1.h\\\[7\\\]" 3 } } */
+BUILD_TEST (float32x4_t, float32x4_t, q, q, f32, 2, 3)
+BUILD_TEST (int32x4_t, int32x4_t, q, q, s32, 2, 3)
+BUILD_TEST (uint32x4_t, uint32x4_t, q, q, u32, 2, 3)
+/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[2\\\], v1.s\\\[3\\\]" 3 } } */
+BUILD_TEST (float64x2_t, float64x2_t, q, q, f64, 1, 1)
+BUILD_TEST (int64x2_t, int64x2_t, q, q, s64, 1, 1)
+BUILD_TEST (uint64x2_t, uint64x2_t, q, q, u64, 1, 1)
+/* { dg-final { scan-assembler-times "ins\\tv0.d\\\[1\\\], v1.d\\\[1\\\]" 3 } } */
* Re: [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors
2016-06-07 16:57 [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors Kyrill Tkachov
@ 2016-06-14 9:37 ` James Greenhalgh
2016-06-22 10:08 ` Kyrill Tkachov
2016-06-30 14:51 ` Richard Earnshaw (lists)
1 sibling, 1 reply; 5+ messages in thread
From: James Greenhalgh @ 2016-06-14 9:37 UTC (permalink / raw)
To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, nd
On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:
> Hi all,
>
> This patch addresses an deficiency we have in handling vector lane-to-lane
> moves in the AArch64 backend. Generally we can use the INS (element)
> instruction but, as a user complains in
> https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
> we don't. James had a patch adding an appropriate combine pattern some time
> ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it never
> got applied.
>
> This patch is a rebase of that patch that adds necessary
> vec_merge+vec_duplicate+vec_select combine pattern. I chose to use a
> define_insn rather than the define_insn_and_split in that patch that just
> deletes the instruction when the source and destination registers are the
> same, as I think that's not he combine patterns job to delete the redundant
> instruction but rather some other passes job. Also, I was not able to create
> a testcase where it would make a difference.
>
> Also, this patch doesn't reimplement that vcopy*lane* intrinsics from inline
> assembly to a vget_lane+vset_lane combo. This can be done as a separate
> patch on top of this one.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Also tested on aarch64_be-none-elf.
>
> Ok for trunk?
This looks OK to me, but as it is based on my code I probably can't
approve it within the spirit of the write access policies (I only have
localized review permission).
Best wait for Richard/Marcus or a global reviewer to take a look.
>
> Thanks,
> Kyrill
>
> 2016-06-07 James Greenhalgh <james.greenhalgh@arm.com>
> Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
> New define_insn.
> (*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>): Likewise.
Watch your ChangeLog formatting.
Thanks,
James
>
> 2016-06-07 James Greenhalgh <james.greenhalgh@arm.com>
> Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/vget_set_lane_1.c: New test.
* Re: [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors
2016-06-14 9:37 ` James Greenhalgh
@ 2016-06-22 10:08 ` Kyrill Tkachov
2016-06-30 14:28 ` Kyrill Tkachov
0 siblings, 1 reply; 5+ messages in thread
From: Kyrill Tkachov @ 2016-06-22 10:08 UTC (permalink / raw)
To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, nd
Ping.
Richard, Marcus, do you have any feedback on this?
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00502.html
Thanks,
Kyrill
On 14/06/16 10:36, James Greenhalgh wrote:
> On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This patch addresses an deficiency we have in handling vector lane-to-lane
>> moves in the AArch64 backend. Generally we can use the INS (element)
>> instruction but, as a user complains in
>> https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
>> we don't. James had a patch adding an appropriate combine pattern some time
>> ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it never
>> got applied.
>>
>> This patch is a rebase of that patch that adds necessary
>> vec_merge+vec_duplicate+vec_select combine pattern. I chose to use a
>> define_insn rather than the define_insn_and_split in that patch that just
>> deletes the instruction when the source and destination registers are the
>> same, as I think that's not he combine patterns job to delete the redundant
>> instruction but rather some other passes job. Also, I was not able to create
>> a testcase where it would make a difference.
>>
>> Also, this patch doesn't reimplement that vcopy*lane* intrinsics from inline
>> assembly to a vget_lane+vset_lane combo. This can be done as a separate
>> patch on top of this one.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> Also tested on aarch64_be-none-elf.
>>
>> Ok for trunk?
> This looks OK to me, but as it is based on my code I probably can't
> approve it within the spirit of the write access policies (I only have
> localized review permission).
>
> Best wait for Richard/Marcus or a global reviewer to take a look.
>
>> Thanks,
>> Kyrill
>>
>> 2016-06-07 James Greenhalgh <james.greenhalgh@arm.com>
>> Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>> * config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
>> New define_insn.
>> (*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>): Likewise.
> Watch your ChangeLog formatting.
>
> Thanks,
> James
>
>> 2016-06-07 James Greenhalgh <james.greenhalgh@arm.com>
>> Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>
>> * gcc.target/aarch64/vget_set_lane_1.c: New test.
>
* Re: [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors
2016-06-22 10:08 ` Kyrill Tkachov
@ 2016-06-30 14:28 ` Kyrill Tkachov
0 siblings, 0 replies; 5+ messages in thread
From: Kyrill Tkachov @ 2016-06-30 14:28 UTC (permalink / raw)
To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, nd
Ping.
Thanks,
Kyrill
On 22/06/16 11:07, Kyrill Tkachov wrote:
> Ping.
> Richard, Marcus, do you have any feedback on this?
>
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00502.html
>
> Thanks,
> Kyrill
> On 14/06/16 10:36, James Greenhalgh wrote:
>> On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> This patch addresses an deficiency we have in handling vector lane-to-lane
>>> moves in the AArch64 backend. Generally we can use the INS (element)
>>> instruction but, as a user complains in
>>> https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
>>> we don't. James had a patch adding an appropriate combine pattern some time
>>> ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it never
>>> got applied.
>>>
>>> This patch is a rebase of that patch that adds necessary
>>> vec_merge+vec_duplicate+vec_select combine pattern. I chose to use a
>>> define_insn rather than the define_insn_and_split in that patch that just
>>> deletes the instruction when the source and destination registers are the
>>> same, as I think that's not he combine patterns job to delete the redundant
>>> instruction but rather some other passes job. Also, I was not able to create
>>> a testcase where it would make a difference.
>>>
>>> Also, this patch doesn't reimplement that vcopy*lane* intrinsics from inline
>>> assembly to a vget_lane+vset_lane combo. This can be done as a separate
>>> patch on top of this one.
>>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>> Also tested on aarch64_be-none-elf.
>>>
>>> Ok for trunk?
>> This looks OK to me, but as it is based on my code I probably can't
>> approve it within the spirit of the write access policies (I only have
>> localized review permission).
>>
>> Best wait for Richard/Marcus or a global reviewer to take a look.
>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2016-06-07 James Greenhalgh <james.greenhalgh@arm.com>
>>> Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>
>>> * config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
>>> New define_insn.
>>> (*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>): Likewise.
>> Watch your ChangeLog formatting.
>>
>> Thanks,
>> James
>>
>>> 2016-06-07 James Greenhalgh <james.greenhalgh@arm.com>
>>> Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>>>
>>> * gcc.target/aarch64/vget_set_lane_1.c: New test.
>>
>
* Re: [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors
2016-06-07 16:57 [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors Kyrill Tkachov
2016-06-14 9:37 ` James Greenhalgh
@ 2016-06-30 14:51 ` Richard Earnshaw (lists)
1 sibling, 0 replies; 5+ messages in thread
From: Richard Earnshaw (lists) @ 2016-06-30 14:51 UTC (permalink / raw)
To: Kyrill Tkachov, GCC Patches; +Cc: Marcus Shawcroft, James Greenhalgh
On 07/06/16 17:56, Kyrill Tkachov wrote:
> Hi all,
>
> This patch addresses an deficiency we have in handling vector
> lane-to-lane moves in the AArch64 backend.
> Generally we can use the INS (element) instruction but, as a user
> complains in https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
> we don't. James had a patch adding an appropriate combine pattern some
> time ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html)
> but it never got applied.
>
> This patch is a rebase of that patch that adds necessary
> vec_merge+vec_duplicate+vec_select combine pattern.
> I chose to use a define_insn rather than the define_insn_and_split in
> that patch that just deletes the instruction when
> the source and destination registers are the same, as I think that's not
> he combine patterns job to delete the redundant instruction
> but rather some other passes job. Also, I was not able to create a
> testcase where it would make a difference.
>
> Also, this patch doesn't reimplement that vcopy*lane* intrinsics from
> inline assembly to a vget_lane+vset_lane combo.
> This can be done as a separate patch on top of this one.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Also tested on aarch64_be-none-elf.
>
> Ok for trunk?
>
OK.
R.
> Thanks,
> Kyrill
>
> 2016-06-07 James Greenhalgh <james.greenhalgh@arm.com>
> Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane<mode>):
> New define_insn.
> (*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>): Likewise.
>
> 2016-06-07 James Greenhalgh <james.greenhalgh@arm.com>
> Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> * gcc.target/aarch64/vget_set_lane_1.c: New test.
>