public inbox for gcc-patches@gcc.gnu.org
* [PATCH] vec_merge + vec_duplicate + vec_concat simplification
@ 2017-06-06  8:35 Kyrill Tkachov
  2017-06-06 21:40 ` Marc Glisse
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Kyrill Tkachov @ 2017-06-06  8:35 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2579 bytes --]

Hi all,

Another vec_merge simplification that's missing is transforming:
(vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
into
(vec_concat x z) if N == 1 (0b01) or
(vec_concat y x) if N == 2 (0b10)

For the testcase in this patch on aarch64, this allows combine to try matching the pattern:
(set (reg:V2DI 78 [ x ])
     (vec_concat:V2DI
         (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])
         (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
                 (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64])))

rather than the more complex:
(set (reg:V2DI 78 [ x ])
     (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
                     (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64]))
         (vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64]))
         (const_int 2 [0x2])))

We don't actually have an aarch64 pattern for the simplified version above, but it's a simple enough
form to add, so this patch adds such a pattern that performs a concatenated load of two 64-bit vectors
in adjacent memory locations as a single Q-register LDR. The new aarch64 pattern is needed to demonstrate
the effectiveness of the simplify-rtx change, so I've kept them together as one patch.

Now for the testcase in the patch we can generate:
construct_lanedi:
         ldr     q0, [x0]
         ret

construct_lanedf:
         ldr     q0, [x0]
         ret

instead of:
construct_lanedi:
         ld1r    {v0.2d}, [x0]
         ldr     x0, [x0, 8]
         ins     v0.d[1], x0
         ret

construct_lanedf:
         ld1r    {v0.2d}, [x0]
         ldr     d1, [x0, 8]
         ins     v0.d[1], v1.d[0]
         ret

The new memory constraint Utq is needed to allow only the Q-register addressing modes: the MEM expressions
in the RTL pattern have 64-bit vector modes, and if left unconstrained they would allow the D-register
addressing modes during register allocation/address mode selection, which would produce invalid assembly.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?

Thanks,
Kyrill

2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
     Simplify vec_merge of vec_duplicate and vec_concat.
     * config/aarch64/constraints.md (Utq): New constraint.
     * config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): New
     define_insn.

2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/load_v2vec_lanes_1.c: New test.

[-- Attachment #2: vec-merge-vec-dup-2.patch --]
[-- Type: text/x-patch, Size: 3919 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 77a3a7d6534e5fd3575e33d5a7c607713abd614b..b78affe9b06ffc973888822a4fcf1ec8e80ecdf6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2803,6 +2803,20 @@ (define_insn "aarch64_get_lane<mode>"
   [(set_attr "type" "neon_to_gp<q>, neon_dup<q>, neon_store1_one_lane<q>")]
 )
 
+(define_insn "load_pair_lanes<mode>"
+  [(set (match_operand:<VDBL> 0 "register_operand" "=w")
+	(vec_concat:<VDBL>
+	   (match_operand:VDC 1 "memory_operand" "Utq")
+	   (match_operand:VDC 2 "memory_operand" "m")))]
+  "TARGET_SIMD && !STRICT_ALIGNMENT
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode,
+				  XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))"
+  "ldr\\t%q0, %1"
+  [(set_attr "type" "neon_load1_1reg_q")]
+)
+
 ;; In this insn, operand 1 should be low, and operand 2 the high part of the
 ;; dest vector.
 
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index b8293376fde7e03c4cfc2a6ad6268201f487eb92..ab607b9f7488e903a14fe93e88d4c4e1fad762b3 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -161,6 +161,13 @@ (define_memory_constraint "Utv"
   (and (match_code "mem")
        (match_test "aarch64_simd_mem_operand_p (op)")))
 
+(define_memory_constraint "Utq"
+  "@internal
+   An address valid for loading or storing a 128-bit AdvSIMD register"
+  (and (match_code "mem")
+       (match_test "aarch64_legitimate_address_p (V2DImode, XEXP (op, 0),
+						  MEM, 1)")))
+
 (define_constraint "Ufc"
   "A floating point constant which can be used with an\
    FMOV immediate operation."
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 42824b6c61af37f6b005de75bd1e5ebe7522bdba..a4aebae68afc14a69870e1fd280d28251aa5f398 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5701,6 +5701,25 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 		std::swap (newop0, newop1);
 	      return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
 	    }
+	  /* Replace (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
+	     with (vec_concat x z) if N == 1, or (vec_concat y x) if N == 2.
+	     Only applies for vectors of two elements.  */
+	  if (GET_CODE (op0) == VEC_DUPLICATE
+	      && GET_CODE (op1) == VEC_CONCAT
+	      && GET_MODE_NUNITS (GET_MODE (op0)) == 2
+	      && GET_MODE_NUNITS (GET_MODE (op1)) == 2
+	      && IN_RANGE (sel, 1, 2))
+	    {
+	      rtx newop0 = XEXP (op0, 0);
+	      rtx newop1 = XEXP (op1, 2 - sel);
+	      rtx otherop = XEXP (op1, sel - 1);
+	      if (sel == 2)
+		std::swap (newop0, newop1);
+	      /* Don't want to throw away the other part of the vec_concat if
+		 it has side-effects.  */
+	      if (!side_effects_p (otherop))
+		return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
+	    }
 	}
 
       if (rtx_equal_p (op0, op1)
diff --git a/gcc/testsuite/gcc.target/aarch64/load_v2vec_lanes_1.c b/gcc/testsuite/gcc.target/aarch64/load_v2vec_lanes_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c31b340154b5469fca858a579e9a6ab90ee0d22
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/load_v2vec_lanes_1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+v2di
+construct_lanedi (long long *y)
+{
+  v2di x = { y[0], y[1] };
+  return x;
+}
+
+v2df
+construct_lanedf (double *y)
+{
+  v2df x = { y[0], y[1] };
+  return x;
+}
+
+/* We can use the load_pair_lanes<mode> pattern to vec_concat two DI/DF
+   values from consecutive memory into a 2-element vector by using
+   a Q-reg LDR.  */
+
+/* { dg-final { scan-assembler-times "ldr\tq\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-not "ins\t" } } */

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] vec_merge + vec_duplicate + vec_concat simplification
  2017-06-06  8:35 [PATCH] vec_merge + vec_duplicate + vec_concat simplification Kyrill Tkachov
@ 2017-06-06 21:40 ` Marc Glisse
  2017-06-27 22:28 ` Jeff Law
  2017-07-24 10:46 ` James Greenhalgh
  2 siblings, 0 replies; 6+ messages in thread
From: Marc Glisse @ 2017-06-06 21:40 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

On Tue, 6 Jun 2017, Kyrill Tkachov wrote:

> Another vec_merge simplification that's missing is transforming:
> (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
> into
> (vec_concat x z) if N == 1 (0b01) or
> (vec_concat y x) if N == 2 (0b10)

Do we have a canonicalization somewhere that guarantees we cannot get
(vec_merge (vec_concat (y) (z)) (vec_duplicate x) (const_int N))
?

I was wondering if it would be possible to merge the transformations for 
concat and duplicate into a single one, but maybe it becomes too 
unreadable.

-- 
Marc Glisse


* Re: [PATCH] vec_merge + vec_duplicate + vec_concat simplification
  2017-06-06  8:35 [PATCH] vec_merge + vec_duplicate + vec_concat simplification Kyrill Tkachov
  2017-06-06 21:40 ` Marc Glisse
@ 2017-06-27 22:28 ` Jeff Law
  2017-07-05 15:15   ` Kyrill Tkachov
  2017-07-24 10:46 ` James Greenhalgh
  2 siblings, 1 reply; 6+ messages in thread
From: Jeff Law @ 2017-06-27 22:28 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

On 06/06/2017 02:35 AM, Kyrill Tkachov wrote:
> [... full patch description snipped ...]
> 
> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
>     Simplify vec_merge of vec_duplicate and vec_concat.
>     * config/aarch64/constraints.md (Utq): New constraint.
>     * config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): New
>     define_insn.
> 
> [... testsuite ChangeLog snipped ...]
OK for the simplify-rtx bits.

jeff


* Re: [PATCH] vec_merge + vec_duplicate + vec_concat simplification
  2017-06-27 22:28 ` Jeff Law
@ 2017-07-05 15:15   ` Kyrill Tkachov
  2017-07-18  8:54     ` Kyrill Tkachov
  0 siblings, 1 reply; 6+ messages in thread
From: Kyrill Tkachov @ 2017-07-05 15:15 UTC (permalink / raw)
  To: Jeff Law, GCC Patches
  Cc: James Greenhalgh, Richard Earnshaw, Marcus Shawcroft


On 27/06/17 23:28, Jeff Law wrote:
> On 06/06/2017 02:35 AM, Kyrill Tkachov wrote:
>> [... full patch description and ChangeLogs snipped ...]
> OK for the simplify-rtx bits.

Thanks Jeff.
I'd like to ping the aarch64 bits:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00273.html

I've re-bootstrapped and re-tested these patches on aarch64 with today's trunk.

Kyrill

> jeff
>


* Re: [PATCH] vec_merge + vec_duplicate + vec_concat simplification
  2017-07-05 15:15   ` Kyrill Tkachov
@ 2017-07-18  8:54     ` Kyrill Tkachov
  0 siblings, 0 replies; 6+ messages in thread
From: Kyrill Tkachov @ 2017-07-18  8:54 UTC (permalink / raw)
  To: Jeff Law, GCC Patches
  Cc: James Greenhalgh, Richard Earnshaw, Marcus Shawcroft


On 05/07/17 16:14, Kyrill Tkachov wrote:
>
> On 27/06/17 23:28, Jeff Law wrote:
>> On 06/06/2017 02:35 AM, Kyrill Tkachov wrote:
>>> [... full patch description and ChangeLogs snipped ...]
>> OK for the simplify-rtx bits.
>
> Thanks Jeff.
> I'd like to ping the aarch64 bits:
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00273.html
>

Ping.

Thanks,
Kyrill

> I've re-bootstrapped and re-tested these patches on aarch64 with today's trunk.
>
> Kyrill
>
>> jeff
>>
>


* Re: [PATCH] vec_merge + vec_duplicate + vec_concat simplification
  2017-06-06  8:35 [PATCH] vec_merge + vec_duplicate + vec_concat simplification Kyrill Tkachov
  2017-06-06 21:40 ` Marc Glisse
  2017-06-27 22:28 ` Jeff Law
@ 2017-07-24 10:46 ` James Greenhalgh
  2 siblings, 0 replies; 6+ messages in thread
From: James Greenhalgh @ 2017-07-24 10:46 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, nd

On Tue, Jun 06, 2017 at 09:35:10AM +0100, Kyrill Tkachov wrote:
> [... full patch description and ChangeLogs snipped ...]
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?

I'd appreciate a more thorough set of tests, looking at some of the vector
mode combines that you now permit. I'm (only a little) nervous that you only
test here for the DI/DFmode cases and that there is no testing for
V2SI etc., nor tests for strict alignment, nor tests for when the addresses
must block the transformation.

The patch itself looks OK, but it could do with better tests.

Thanks,
James


Thread overview: 6+ messages
2017-06-06  8:35 [PATCH] vec_merge + vec_duplicate + vec_concat simplification Kyrill Tkachov
2017-06-06 21:40 ` Marc Glisse
2017-06-27 22:28 ` Jeff Law
2017-07-05 15:15   ` Kyrill Tkachov
2017-07-18  8:54     ` Kyrill Tkachov
2017-07-24 10:46 ` James Greenhalgh
