public inbox for gcc-patches@gcc.gnu.org
* [PATCH][AArch64] Add combine pattern for storing lane zero of a vector
@ 2017-04-21  8:59 Kyrill Tkachov
  2017-05-11 10:39 ` Kyrill Tkachov
  2017-06-02 13:52 ` James Greenhalgh
  0 siblings, 2 replies; 4+ messages in thread
From: Kyrill Tkachov @ 2017-04-21  8:59 UTC (permalink / raw)
  To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

[-- Attachment #1: Type: text/plain, Size: 1625 bytes --]

Hi all,

Consider the code:
typedef long long v2di __attribute__ ((vector_size (16)));
  
void
store_laned (v2di x, long long *y)
{
   y[0] = x[1];
   y[3] = x[0];
}

AArch64 GCC will generate:
store_laned:
         umov    x1, v0.d[0]
         st1     {v0.d}[1], [x0]
         str     x1, [x0, 24]
         ret

It moves lane zero into a core register and performs a scalar integer store, when it could instead have used a scalar FP store
that supports the required addressing mode:
store_laned:
         st1     {v0.d}[1], [x0]
         str     d0, [x0, 24]
         ret
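
The difference comes down to addressing modes: the lane-wise ST1 only accepts a plain base register (or a post-indexed
form), so storing a lane to an offset address needs a separate address computation, roughly like this illustrative
sequence (not taken from actual compiler output):

         add     x1, x0, 24
         st1     {v0.d}[0], [x1]

whereas the scalar STR takes an immediate offset directly, as in the str d0, [x0, 24] above.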

Combine already tries to match this pattern:

Trying 10 -> 11:
Failed to match this instruction:
(set (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
             (const_int 24 [0x18])) [1 MEM[(long long int *)y_4(D) + 24B]+0 S8 A64])
     (vec_select:DI (reg/v:V2DI 75 [ x ])
         (parallel [
                 (const_int 0 [0])
             ])))

but we don't match it in the backend. It's not hard to add, so this patch does that for all the relevant vector modes.
With this patch we generate the second sequence above. On SPEC2006 this eliminates some address computation instructions,
because we can use the more flexible STR instead of ST1, and it eliminates some moves to the integer registers, because
we can store the D-reg directly.
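
As an illustration of the effect on the other modes, take a hypothetical single-precision example (not part of the patch
or its testsuite, just a sketch of the expected codegen on a little-endian target):

typedef float v4sf __attribute__ ((vector_size (16)));

void
store_lane_f (v4sf x, float *y)
{
  y[3] = x[0];
}

With the new pattern combine can merge the lane extract into the store, so this should assemble to something along the
lines of:

store_lane_f:
         str     s0, [x0, 12]
         ret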

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* config/aarch64/aarch64-simd.md (aarch64_store_lane0<mode>):
	New pattern.

2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* gcc.target/aarch64/store_lane0_str_1.c: New test.


[-- Attachment #2: aarch64-store-lane0.patch --]
[-- Type: text/x-patch, Size: 3057 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f36665e27bb66e0f1fb42443ce7b506bd2bf6914..bf13f0753a856a13ae92ceeb44291df3dc379a13 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -153,6 +153,19 @@ (define_insn "*aarch64_simd_mov<mode>"
    (set_attr "length" "4,4,4,8,8,8,4")]
 )
 
+;; When storing lane zero we can use the normal STR and its more permissive
+;; addressing modes.
+
+(define_insn "aarch64_store_lane0<mode>"
+  [(set (match_operand:<VEL> 0 "memory_operand" "=m")
+	(vec_select:<VEL> (match_operand:VALL_F16 1 "register_operand" "w")
+			(parallel [(match_operand 2 "const_int_operand" "n")])))]
+  "TARGET_SIMD
+   && ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])) == 0"
+  "str\\t%<Vetype>1, %0"
+  [(set_attr "type" "neon_store1_1reg<q>")]
+)
+
 (define_insn "load_pair<mode>"
   [(set (match_operand:VD 0 "register_operand" "=w")
 	(match_operand:VD 1 "aarch64_mem_pair_operand" "Ump"))
diff --git a/gcc/testsuite/gcc.target/aarch64/store_lane0_str_1.c b/gcc/testsuite/gcc.target/aarch64/store_lane0_str_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..4464fec2c1f24c212be4fc6c94b509843fd0058e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/store_lane0_str_1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int v2si __attribute__ ((vector_size (8)));
+typedef float v2sf __attribute__ ((vector_size (8)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef __fp16 v4hf __attribute__ ((vector_size (8)));
+typedef char v8qi __attribute__ ((vector_size (8)));
+
+typedef int v4si __attribute__ ((vector_size (16)));
+typedef float v4sf __attribute__ ((vector_size (16)));
+typedef short v8hi __attribute__ ((vector_size (16)));
+typedef __fp16 v8hf __attribute__ ((vector_size (16)));
+typedef char v16qi __attribute__ ((vector_size (16)));
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define LANE(N) (N - 1)
+#else
+#define LANE(N) 0
+#endif
+
+#define FUNC(T, E, N)			\
+void					\
+store_lane_##T (T x, E *y)		\
+{					\
+  y[0] = x[N - 1 - LANE (N)];		\
+  y[3] = x[LANE (N)];			\
+}
+
+FUNC (v2si, int, 2)
+FUNC (v2sf, float, 2)
+FUNC (v4hi, short, 4)
+FUNC (v4hf, __fp16, 4)
+FUNC (v8qi, char, 8)
+
+FUNC (v4si, int, 4)
+FUNC (v4sf, float, 4)
+FUNC (v8hi, short, 8)
+FUNC (v8hf, __fp16, 8)
+FUNC (v16qi, char, 16)
+FUNC (v2di, long long, 2)
+FUNC (v2df, double, 2)
+
+/* When storing lane zero of a vector we can use the scalar STR instruction
+   that supports more addressing modes.  */
+
+/* { dg-final { scan-assembler-times "str\ts\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "str\tb\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "str\th\[0-9\]+" 4 } } */
+/* { dg-final { scan-assembler-times "str\td\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-not "umov" } } */
+/* { dg-final { scan-assembler-not "dup" } } */


* Re: [PATCH][AArch64] Add combine pattern for storing lane zero of a vector
  2017-04-21  8:59 [PATCH][AArch64] Add combine pattern for storing lane zero of a vector Kyrill Tkachov
@ 2017-05-11 10:39 ` Kyrill Tkachov
  2017-06-02 10:39   ` Kyrill Tkachov
  2017-06-02 13:52 ` James Greenhalgh
  1 sibling, 1 reply; 4+ messages in thread
From: Kyrill Tkachov @ 2017-05-11 10:39 UTC (permalink / raw)
  To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

Ping.

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00934.html

Thanks,
Kyrill

On 21/04/17 09:39, Kyrill Tkachov wrote:
> Hi all,
>
> Consider the code:
> typedef long long v2di __attribute__ ((vector_size (16)));
>
> void
> store_laned (v2di x, long long *y)
> {
>   y[0] = x[1];
>   y[3] = x[0];
> }
>
> AArch64 GCC will generate:
> store_laned:
>         umov    x1, v0.d[0]
>         st1     {v0.d}[1], [x0]
>         str     x1, [x0, 24]
>         ret
>
> It moves lane zero into a core register and performs a scalar integer store, when it could instead have used a scalar FP store
> that supports the required addressing mode:
> store_laned:
>         st1     {v0.d}[1], [x0]
>         str     d0, [x0, 24]
>         ret
>
> Combine already tries to match this pattern:
>
> Trying 10 -> 11:
> Failed to match this instruction:
> (set (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
>             (const_int 24 [0x18])) [1 MEM[(long long int *)y_4(D) + 24B]+0 S8 A64])
>     (vec_select:DI (reg/v:V2DI 75 [ x ])
>         (parallel [
>                 (const_int 0 [0])
>             ])))
>
> but we don't match it in the backend. It's not hard to add, so this patch does that for all the relevant vector modes.
> With this patch we generate the second sequence above. On SPEC2006 this eliminates some address computation instructions,
> because we can use the more flexible STR instead of ST1, and it eliminates some moves to the integer registers, because
> we can store the D-reg directly.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/aarch64-simd.md (aarch64_store_lane0<mode>):
>     New pattern.
>
> 2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/store_lane0_str_1.c: New test.
>


* Re: [PATCH][AArch64] Add combine pattern for storing lane zero of a vector
  2017-05-11 10:39 ` Kyrill Tkachov
@ 2017-06-02 10:39   ` Kyrill Tkachov
  0 siblings, 0 replies; 4+ messages in thread
From: Kyrill Tkachov @ 2017-06-02 10:39 UTC (permalink / raw)
  To: GCC Patches; +Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

Ping.

Thanks,
Kyrill

On 11/05/17 11:15, Kyrill Tkachov wrote:
> Ping.
>
> https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00934.html
>
> Thanks,
> Kyrill
>
> On 21/04/17 09:39, Kyrill Tkachov wrote:
>> Hi all,
>>
>> Consider the code:
>> typedef long long v2di __attribute__ ((vector_size (16)));
>>
>> void
>> store_laned (v2di x, long long *y)
>> {
>>   y[0] = x[1];
>>   y[3] = x[0];
>> }
>>
>> AArch64 GCC will generate:
>> store_laned:
>>         umov    x1, v0.d[0]
>>         st1     {v0.d}[1], [x0]
>>         str     x1, [x0, 24]
>>         ret
>>
>> It moves lane zero into a core register and performs a scalar integer store, when it could instead have used a scalar FP store
>> that supports the required addressing mode:
>> store_laned:
>>         st1     {v0.d}[1], [x0]
>>         str     d0, [x0, 24]
>>         ret
>>
>> Combine already tries to match this pattern:
>>
>> Trying 10 -> 11:
>> Failed to match this instruction:
>> (set (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
>>             (const_int 24 [0x18])) [1 MEM[(long long int *)y_4(D) + 24B]+0 S8 A64])
>>     (vec_select:DI (reg/v:V2DI 75 [ x ])
>>         (parallel [
>>                 (const_int 0 [0])
>>             ])))
>>
>> but we don't match it in the backend. It's not hard to add, so this patch does that for all the relevant vector modes.
>> With this patch we generate the second sequence above. On SPEC2006 this eliminates some address computation instructions,
>> because we can use the more flexible STR instead of ST1, and it eliminates some moves to the integer registers, because
>> we can store the D-reg directly.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>> 2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>     * config/aarch64/aarch64-simd.md (aarch64_store_lane0<mode>):
>>     New pattern.
>>
>> 2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>     * gcc.target/aarch64/store_lane0_str_1.c: New test.
>>
>


* Re: [PATCH][AArch64] Add combine pattern for storing lane zero of a vector
  2017-04-21  8:59 [PATCH][AArch64] Add combine pattern for storing lane zero of a vector Kyrill Tkachov
  2017-05-11 10:39 ` Kyrill Tkachov
@ 2017-06-02 13:52 ` James Greenhalgh
  1 sibling, 0 replies; 4+ messages in thread
From: James Greenhalgh @ 2017-06-02 13:52 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw, nd

On Fri, Apr 21, 2017 at 09:39:29AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> Consider the code:
> typedef long long v2di __attribute__ ((vector_size (16)));
> void
> store_laned (v2di x, long long *y)
> {
>   y[0] = x[1];
>   y[3] = x[0];
> }
> 
> AArch64 GCC will generate:
> store_laned:
>         umov    x1, v0.d[0]
>         st1     {v0.d}[1], [x0]
>         str     x1, [x0, 24]
>         ret
> 
> It moves lane zero into a core register and performs a scalar integer store, when it could instead have used a scalar FP store
> that supports the required addressing mode:
> store_laned:
>         st1     {v0.d}[1], [x0]
>         str     d0, [x0, 24]
>         ret
> 
> Combine already tries to match this pattern:
> 
> Trying 10 -> 11:
> Failed to match this instruction:
> (set (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
>             (const_int 24 [0x18])) [1 MEM[(long long int *)y_4(D) + 24B]+0 S8 A64])
>     (vec_select:DI (reg/v:V2DI 75 [ x ])
>         (parallel [
>                 (const_int 0 [0])
>             ])))
> 
> but we don't match it in the backend. It's not hard to add, so this patch does that for all the relevant vector modes.
> With this patch we generate the second sequence above. On SPEC2006 this eliminates some address computation instructions,
> because we can use the more flexible STR instead of ST1, and it eliminates some moves to the integer registers, because
> we can store the D-reg directly.

Good spot!

> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?

OK.

Thanks,
James

> 2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
> 	* config/aarch64/aarch64-simd.md (aarch64_store_lane0<mode>):
> 	New pattern.
> 
> 2017-04-21  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
> 	* gcc.target/aarch64/store_lane0_str_1.c: New test.
> 


