public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Simplify vec_merge of vec_duplicate with const_vector
@ 2017-06-06  8:25 Kyrill Tkachov
  2017-06-27 22:29 ` Jeff Law
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Kyrill Tkachov @ 2017-06-06  8:25 UTC (permalink / raw)
  To: GCC Patches

Hi all,

I'm trying to improve some of the RTL-level handling of vector lane operations on aarch64, and that
involves dealing with a lot of vec_merge operations. One simplification that I noticed is missing
from simplify-rtx is the combination of vec_merge with vec_duplicate.
In particular, an expression of the form:
(vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))

which can be replaced with

(vec_concat (X) (B)) if N == 1 (0b01) or
(vec_concat (A) (X)) if N == 2 (0b10).
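
As a rough illustration of why this rewrite is valid, here is a minimal C sketch that models
two-lane vectors as plain structs. It assumes the usual vec_merge convention that bit i of the
mask picks lane i from the first operand and a clear bit picks it from the second. The type vec2
and the helpers vec_duplicate2, vec_merge2, vec_concat2 and vec2_equal are made up for this
example and are not GCC APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Toy two-lane vector; lane 0 corresponds to mask bit 0 (assumption).  */
typedef struct { int64_t lane[2]; } vec2;

/* Model of vec_duplicate: broadcast a scalar to both lanes.  */
static vec2 vec_duplicate2 (int64_t x)
{
  vec2 r = { { x, x } };
  return r;
}

/* Model of vec_merge: lane i comes from a when bit i of mask is set,
   otherwise from b.  */
static vec2 vec_merge2 (vec2 a, vec2 b, unsigned mask)
{
  vec2 r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = ((mask >> i) & 1) ? a.lane[i] : b.lane[i];
  return r;
}

/* Model of vec_concat on two scalars: the first operand becomes lane 0.  */
static vec2 vec_concat2 (int64_t first, int64_t second)
{
  vec2 r = { { first, second } };
  return r;
}

/* Lane-wise equality of two toy vectors.  */
static int vec2_equal (vec2 a, vec2 b)
{
  return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1];
}
```

With X = 7 and a constant vector [10, 20], vec_merge2 (vec_duplicate2 (7), c, 1) should compare
equal to vec_concat2 (7, 20), and the same call with mask 2 should compare equal to
vec_concat2 (10, 7), matching the two cases above.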

For the aarch64 testcase in this patch, this simplification allows us to try to combine:
(set (reg:V2DI 77 [ x ])
     (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
         (const_int 0 [0])))

instead of the more complex:
(set (reg:V2DI 77 [ x ])
     (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64]))
         (const_vector:V2DI [
                 (const_int 0 [0])
                 (const_int 0 [0])
             ])
         (const_int 1 [0x1])))


For the simplified form above we already have an aarch64 pattern, *aarch64_combinez<mode>, which
is missing a DI/DFmode version due to an oversight, so this patch also extends that pattern to
use the VDC mode iterator, which includes DI and DFmode (as well as V2HF, which VD_BHSI was missing).
The aarch64 hunk is needed to see the benefit of the simplify-rtx.c hunk, so I didn't split them
into separate patches.

Before this patch, for the testcase we'd generate:
construct_lanedi:
         movi    v0.4s, 0
         ldr     x0, [x0]
         ins     v0.d[0], x0
         ret

construct_lanedf:
         movi    v0.2d, 0
         ldr     d1, [x0]
         ins     v0.d[0], v1.d[0]
         ret

but now we can generate:
construct_lanedi:
         ldr     d0, [x0]
         ret

construct_lanedf:
         ldr     d0, [x0]
         ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
     Simplify vec_merge of vec_duplicate and const_vector.
     * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
     New predicate.
     * config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Use VDC
     mode iterator.  Update predicate on operand 1 to
     handle non-const_vec constants.  Delete constraints.
     (*aarch64_combinez_be<mode>): Likewise for operand 2.

2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/construct_lane_zero_1.c: New test.

[-- Attachment #2: vec-merge-vec-dup-1.patch --]
[-- Type: text/x-patch, Size: 3952 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d31857db385e1695d37906f8339620c9c878b50c..77a3a7d6534e5fd3575e33d5a7c607713abd614b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2808,9 +2808,9 @@ (define_insn "aarch64_get_lane<mode>"
 
 (define_insn "*aarch64_combinez<mode>"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
-        (vec_concat:<VDBL>
-	   (match_operand:VD_BHSI 1 "general_operand" "w,?r,m")
-	   (match_operand:VD_BHSI 2 "aarch64_simd_imm_zero" "Dz,Dz,Dz")))]
+	(vec_concat:<VDBL>
+	  (match_operand:VDC 1 "general_operand" "w,?r,m")
+	  (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")))]
   "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "@
    mov\\t%0.8b, %1.8b
@@ -2824,8 +2824,8 @@ (define_insn "*aarch64_combinez<mode>"
 (define_insn "*aarch64_combinez_be<mode>"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
         (vec_concat:<VDBL>
-	   (match_operand:VD_BHSI 2 "aarch64_simd_imm_zero" "Dz,Dz,Dz")
-	   (match_operand:VD_BHSI 1 "general_operand" "w,?r,m")))]
+	  (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")
+	  (match_operand:VDC 1 "general_operand" "w,?r,m")))]
   "TARGET_SIMD && BYTES_BIG_ENDIAN"
   "@
    mov\\t%0.8b, %1.8b
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 4bd8f45562c017bca736ae466ede6b9e4de0d17a..16e864765cde3bf8f54a11e5fc8db5a53606db30 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -348,6 +348,9 @@ (define_special_predicate "aarch64_simd_imm_zero"
   return aarch64_simd_imm_zero_p (op, mode);
 })
 
+(define_special_predicate "aarch64_simd_or_scalar_imm_zero"
+  (match_test "aarch64_simd_imm_zero_p (op, mode)"))
+
 (define_special_predicate "aarch64_simd_imm_minus_one"
   (match_code "const_vector")
 {
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 51ffcbc9b5fbf71298b186c43c4a50913cc5e792..42824b6c61af37f6b005de75bd1e5ebe7522bdba 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5685,6 +5685,22 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 		    return op1;
 		}
 	    }
+	  /* Replace (vec_merge (vec_duplicate (X)) (const_vector [A, B])
+	     (const_int N))
+	     with (vec_concat (X) (B)) if N == 1 or
+	     (vec_concat (A) (X)) if N == 2.  */
+	  if (GET_CODE (op0) == VEC_DUPLICATE
+	      && GET_CODE (op1) == CONST_VECTOR
+	      && CONST_VECTOR_NUNITS (op1) == 2
+	      && GET_MODE_NUNITS (GET_MODE (op0)) == 2
+	      && IN_RANGE (sel, 1, 2))
+	    {
+	      rtx newop0 = XEXP (op0, 0);
+	      rtx newop1 = CONST_VECTOR_ELT (op1, 2 - sel);
+	      if (sel == 2)
+		std::swap (newop0, newop1);
+	      return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
+	    }
 	}
 
       if (rtx_equal_p (op0, op1)
diff --git a/gcc/testsuite/gcc.target/aarch64/construct_lane_zero_1.c b/gcc/testsuite/gcc.target/aarch64/construct_lane_zero_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..d87f32908280cf7f6ad89d129f0005510ba7cced
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/construct_lane_zero_1.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef long long v2di __attribute__ ((vector_size (16)));
+typedef double v2df __attribute__ ((vector_size (16)));
+
+v2di
+construct_lanedi (long long *y)
+{
+  v2di x =
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  { 0, y[0] }
+#else
+  { y[0], 0 }
+#endif
+  ;
+  return x;
+}
+
+v2df
+construct_lanedf (double *y)
+{
+  v2df x =
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  { 0.0, y[0] }
+#else
+  { y[0], 0.0 }
+#endif
+  ;
+  return x;
+}
+
+/* Check that creating V2DI and V2DF vectors from a lane with a zero
+   makes use of the D-reg LDR rather than doing explicit lane inserts.  */
+
+/* { dg-final { scan-assembler-times "ldr\td\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-not "ins\t" } } */
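
The operand selection in the simplify-rtx.c hunk (index 2 - sel into the constant vector, then
a swap when sel == 2) can be sketched outside of GCC as below. This is only an illustration:
concat_ops and pick_concat_operands are hypothetical names, and plain long values stand in for
the rtx operands.

```c
#include <assert.h>

/* Hypothetical stand-in for the two operands of the resulting vec_concat.  */
typedef struct { long first, second; } concat_ops;

/* Mirrors the hunk's logic: take element (2 - sel) of the constant vector,
   then swap so that X ends up in the lane named by sel.
   The caller must pass sel == 1 or sel == 2.  */
static concat_ops pick_concat_operands (long x, const long cv[2], int sel)
{
  concat_ops r = { x, cv[2 - sel] };
  if (sel == 2)
    {
      long tmp = r.first;
      r.first = r.second;
      r.second = tmp;
    }
  return r;
}
```

For X = 7 and a constant vector [10, 20], sel == 1 should give the operand pair (7, 20) and
sel == 2 should give (10, 7).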


* Re: [PATCH] Simplify vec_merge of vec_duplicate with const_vector
  2017-06-06  8:25 [PATCH] Simplify vec_merge of vec_duplicate with const_vector Kyrill Tkachov
@ 2017-06-27 22:29 ` Jeff Law
  2017-07-05 15:15   ` Kyrill Tkachov
  2017-07-24 10:10 ` James Greenhalgh
  2018-03-27 14:37 ` H.J. Lu
  2 siblings, 1 reply; 6+ messages in thread
From: Jeff Law @ 2017-06-27 22:29 UTC (permalink / raw)
  To: Kyrill Tkachov, GCC Patches

On 06/06/2017 02:25 AM, Kyrill Tkachov wrote:
> Hi all,
> 
> I'm trying to improve some of the RTL-level handling of vector lane
> operations on aarch64, and that involves dealing with a lot of vec_merge
> operations. One simplification that I noticed is missing from simplify-rtx
> is the combination of vec_merge with vec_duplicate.
> In particular, an expression of the form:
> (vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))
> 
> which can be replaced with
> 
> (vec_concat (X) (B)) if N == 1 (0b01) or
> (vec_concat (A) (X)) if N == 2 (0b10).
> 
> For the aarch64 testcase in this patch, this simplification allows us to
> try to combine:
> (set (reg:V2DI 77 [ x ])
>     (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
>         (const_int 0 [0])))
> 
> instead of the more complex:
> (set (reg:V2DI 77 [ x ])
>     (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64]))
>         (const_vector:V2DI [
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>             ])
>         (const_int 1 [0x1])))
> 
> 
> For the simplified form above we already have an aarch64 pattern:
> *aarch64_combinez<mode> which
> is missing a DI/DFmode version due to an oversight, so this patch
> extends that pattern as well to
> use the VDC mode iterator that includes DI and DFmode (as well as V2HF
> which VD_BHSI was missing).
> The aarch64 hunk is needed to see the benefit of the simplify-rtx.c
> hunk, so I didn't split them
> into separate patches.
> 
> Before this for the testcase we'd generate:
> construct_lanedi:
>         movi    v0.4s, 0
>         ldr     x0, [x0]
>         ins     v0.d[0], x0
>         ret
> 
> construct_lanedf:
>         movi    v0.2d, 0
>         ldr     d1, [x0]
>         ins     v0.d[0], v1.d[0]
>         ret
> 
> but now we can generate:
> construct_lanedi:
>         ldr     d0, [x0]
>         ret
> 
> construct_lanedf:
>         ldr     d0, [x0]
>         ret
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
>     Simplify vec_merge of vec_duplicate and const_vector.
>     * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
>     New predicate.
>     * config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Use VDC
>     mode iterator.  Update predicate on operand 1 to
>     handle non-const_vec constants.  Delete constraints.
>     (*aarch64_combinez_be<mode>): Likewise for operand 2.
> 
> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * gcc.target/aarch64/construct_lane_zero_1.c: New test.
OK for the simplify-rtx parts.

jeff


* Re: [PATCH] Simplify vec_merge of vec_duplicate with const_vector
  2017-06-27 22:29 ` Jeff Law
@ 2017-07-05 15:15   ` Kyrill Tkachov
  2017-07-18  8:54     ` Kyrill Tkachov
  0 siblings, 1 reply; 6+ messages in thread
From: Kyrill Tkachov @ 2017-07-05 15:15 UTC (permalink / raw)
  To: Jeff Law, GCC Patches
  Cc: James Greenhalgh, Richard Earnshaw, Marcus Shawcroft


On 27/06/17 23:29, Jeff Law wrote:
> On 06/06/2017 02:25 AM, Kyrill Tkachov wrote:
>> Hi all,
>>
>> I'm trying to improve some of the RTL-level handling of vector lane
>> operations on aarch64, and that involves dealing with a lot of vec_merge
>> operations. One simplification that I noticed is missing from simplify-rtx
>> is the combination of vec_merge with vec_duplicate.
>> In particular, an expression of the form:
>> (vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))
>>
>> which can be replaced with
>>
>> (vec_concat (X) (B)) if N == 1 (0b01) or
>> (vec_concat (A) (X)) if N == 2 (0b10).
>>
>> For the aarch64 testcase in this patch, this simplification allows us to
>> try to combine:
>> (set (reg:V2DI 77 [ x ])
>>      (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
>>          (const_int 0 [0])))
>>
>> instead of the more complex:
>> (set (reg:V2DI 77 [ x ])
>>      (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64]))
>>          (const_vector:V2DI [
>>                  (const_int 0 [0])
>>                  (const_int 0 [0])
>>              ])
>>          (const_int 1 [0x1])))
>>
>>
>> For the simplified form above we already have an aarch64 pattern:
>> *aarch64_combinez<mode> which
>> is missing a DI/DFmode version due to an oversight, so this patch
>> extends that pattern as well to
>> use the VDC mode iterator that includes DI and DFmode (as well as V2HF
>> which VD_BHSI was missing).
>> The aarch64 hunk is needed to see the benefit of the simplify-rtx.c
>> hunk, so I didn't split them
>> into separate patches.
>>
>> Before this for the testcase we'd generate:
>> construct_lanedi:
>>          movi    v0.4s, 0
>>          ldr     x0, [x0]
>>          ins     v0.d[0], x0
>>          ret
>>
>> construct_lanedf:
>>          movi    v0.2d, 0
>>          ldr     d1, [x0]
>>          ins     v0.d[0], v1.d[0]
>>          ret
>>
>> but now we can generate:
>> construct_lanedi:
>>          ldr     d0, [x0]
>>          ret
>>
>> construct_lanedf:
>>          ldr     d0, [x0]
>>          ret
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
>>      Simplify vec_merge of vec_duplicate and const_vector.
>>      * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
>>      New predicate.
>>      * config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Use VDC
>>      mode iterator.  Update predicate on operand 1 to
>>      handle non-const_vec constants.  Delete constraints.
>>      (*aarch64_combinez_be<mode>): Likewise for operand 2.
>>
>> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * gcc.target/aarch64/construct_lane_zero_1.c: New test.
> OK for the simplify-rtx parts.

Thanks Jeff.
Pinging the aarch64 parts at:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00272.html

I've re-bootstrapped and re-tested the patches on top of current trunk.

Kyrill

> jeff
>


* Re: [PATCH] Simplify vec_merge of vec_duplicate with const_vector
  2017-07-05 15:15   ` Kyrill Tkachov
@ 2017-07-18  8:54     ` Kyrill Tkachov
  0 siblings, 0 replies; 6+ messages in thread
From: Kyrill Tkachov @ 2017-07-18  8:54 UTC (permalink / raw)
  To: Jeff Law, GCC Patches
  Cc: James Greenhalgh, Richard Earnshaw, Marcus Shawcroft


On 05/07/17 16:14, Kyrill Tkachov wrote:
>
> On 27/06/17 23:29, Jeff Law wrote:
>> On 06/06/2017 02:25 AM, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> I'm trying to improve some of the RTL-level handling of vector lane
>>> operations on aarch64, and that involves dealing with a lot of vec_merge
>>> operations. One simplification that I noticed is missing from simplify-rtx
>>> is the combination of vec_merge with vec_duplicate.
>>> In particular, an expression of the form:
>>> (vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))
>>>
>>> which can be replaced with
>>>
>>> (vec_concat (X) (B)) if N == 1 (0b01) or
>>> (vec_concat (A) (X)) if N == 2 (0b10).
>>>
>>> For the aarch64 testcase in this patch, this simplification allows us to
>>> try to combine:
>>> (set (reg:V2DI 77 [ x ])
>>>      (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
>>>          (const_int 0 [0])))
>>>
>>> instead of the more complex:
>>> (set (reg:V2DI 77 [ x ])
>>>      (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64]))
>>>          (const_vector:V2DI [
>>>                  (const_int 0 [0])
>>>                  (const_int 0 [0])
>>>              ])
>>>          (const_int 1 [0x1])))
>>>
>>>
>>> For the simplified form above we already have an aarch64 pattern:
>>> *aarch64_combinez<mode> which
>>> is missing a DI/DFmode version due to an oversight, so this patch
>>> extends that pattern as well to
>>> use the VDC mode iterator that includes DI and DFmode (as well as V2HF
>>> which VD_BHSI was missing).
>>> The aarch64 hunk is needed to see the benefit of the simplify-rtx.c
>>> hunk, so I didn't split them
>>> into separate patches.
>>>
>>> Before this for the testcase we'd generate:
>>> construct_lanedi:
>>>          movi    v0.4s, 0
>>>          ldr     x0, [x0]
>>>          ins     v0.d[0], x0
>>>          ret
>>>
>>> construct_lanedf:
>>>          movi    v0.2d, 0
>>>          ldr     d1, [x0]
>>>          ins     v0.d[0], v1.d[0]
>>>          ret
>>>
>>> but now we can generate:
>>> construct_lanedi:
>>>          ldr     d0, [x0]
>>>          ret
>>>
>>> construct_lanedf:
>>>          ldr     d0, [x0]
>>>          ret
>>>
>>> Bootstrapped and tested on aarch64-none-linux-gnu.
>>>
>>> Ok for trunk?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
>>>      Simplify vec_merge of vec_duplicate and const_vector.
>>>      * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
>>>      New predicate.
>>>      * config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Use VDC
>>>      mode iterator.  Update predicate on operand 1 to
>>>      handle non-const_vec constants.  Delete constraints.
>>>      (*aarch64_combinez_be<mode>): Likewise for operand 2.
>>>
>>> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      * gcc.target/aarch64/construct_lane_zero_1.c: New test.
>> OK for the simplify-rtx parts.
>
> Thanks Jeff.
> Pinging the aarch64 parts at:
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00272.html

Ping.

Thanks,
Kyrill

> I've re-bootstrapped and re-tested the patches on top of current trunk.
>
> Kyrill
>
>> jeff
>>
>


* Re: [PATCH] Simplify vec_merge of vec_duplicate with const_vector
  2017-06-06  8:25 [PATCH] Simplify vec_merge of vec_duplicate with const_vector Kyrill Tkachov
  2017-06-27 22:29 ` Jeff Law
@ 2017-07-24 10:10 ` James Greenhalgh
  2018-03-27 14:37 ` H.J. Lu
  2 siblings, 0 replies; 6+ messages in thread
From: James Greenhalgh @ 2017-07-24 10:10 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches, nd

On Tue, Jun 06, 2017 at 09:25:51AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> I'm trying to improve some of the RTL-level handling of vector lane
> operations on aarch64, and that involves dealing with a lot of vec_merge
> operations. One simplification that I noticed is missing from simplify-rtx
> is the combination of vec_merge with vec_duplicate.
> In particular, an expression of the form:
> (vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))
> 
> which can be replaced with
> 
> (vec_concat (X) (B)) if N == 1 (0b01) or
> (vec_concat (A) (X)) if N == 2 (0b10).
> 
> For the aarch64 testcase in this patch, this simplification allows us to try to combine:
> (set (reg:V2DI 77 [ x ])
>     (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
>         (const_int 0 [0])))
> 
> instead of the more complex:
> (set (reg:V2DI 77 [ x ])
>     (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64]))
>         (const_vector:V2DI [
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>             ])
>         (const_int 1 [0x1])))
> 
> 
> For the simplified form above we already have an aarch64 pattern: *aarch64_combinez<mode> which
> is missing a DI/DFmode version due to an oversight, so this patch extends that pattern as well to
> use the VDC mode iterator that includes DI and DFmode (as well as V2HF which VD_BHSI was missing).
> The aarch64 hunk is needed to see the benefit of the simplify-rtx.c hunk, so I didn't split them
> into separate patches.
> 
> Before this for the testcase we'd generate:
> construct_lanedi:
>         movi    v0.4s, 0
>         ldr     x0, [x0]
>         ins     v0.d[0], x0
>         ret
> 
> construct_lanedf:
>         movi    v0.2d, 0
>         ldr     d1, [x0]
>         ins     v0.d[0], v1.d[0]
>         ret
> 
> but now we can generate:
> construct_lanedi:
>         ldr     d0, [x0]
>         ret
> 
> construct_lanedf:
>         ldr     d0, [x0]
>         ret
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?

OK.

Thanks,
James

> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
>     Simplify vec_merge of vec_duplicate and const_vector.
>     * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
>     New predicate.
>     * config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Use VDC
>     mode iterator.  Update predicate on operand 1 to
>     handle non-const_vec constants.  Delete constraints.
>     (*aarch64_combinez_be<mode>): Likewise for operand 2.
> 
> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * gcc.target/aarch64/construct_lane_zero_1.c: New test.



* Re: [PATCH] Simplify vec_merge of vec_duplicate with const_vector
  2017-06-06  8:25 [PATCH] Simplify vec_merge of vec_duplicate with const_vector Kyrill Tkachov
  2017-06-27 22:29 ` Jeff Law
  2017-07-24 10:10 ` James Greenhalgh
@ 2018-03-27 14:37 ` H.J. Lu
  2 siblings, 0 replies; 6+ messages in thread
From: H.J. Lu @ 2018-03-27 14:37 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: GCC Patches

On Tue, Jun 6, 2017 at 1:25 AM, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi all,
>
> I'm trying to improve some of the RTL-level handling of vector lane
> operations on aarch64, and that involves dealing with a lot of vec_merge
> operations. One simplification that I noticed is missing from simplify-rtx
> is the combination of vec_merge with vec_duplicate.
> In particular, an expression of the form:
> (vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))
>
> which can be replaced with
>
> (vec_concat (X) (B)) if N == 1 (0b01) or
> (vec_concat (A) (X)) if N == 2 (0b10).
>
> For the aarch64 testcase in this patch, this simplification allows us to try
> to combine:
> (set (reg:V2DI 77 [ x ])
>     (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
>         (const_int 0 [0])))
>
> instead of the more complex:
> (set (reg:V2DI 77 [ x ])
>     (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64]))
>         (const_vector:V2DI [
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>             ])
>         (const_int 1 [0x1])))
>
>
> For the simplified form above we already have an aarch64 pattern:
> *aarch64_combinez<mode> which
> is missing a DI/DFmode version due to an oversight, so this patch extends
> that pattern as well to
> use the VDC mode iterator that includes DI and DFmode (as well as V2HF which
> VD_BHSI was missing).
> The aarch64 hunk is needed to see the benefit of the simplify-rtx.c hunk, so
> I didn't split them
> into separate patches.
>
> Before this for the testcase we'd generate:
> construct_lanedi:
>         movi    v0.4s, 0
>         ldr     x0, [x0]
>         ins     v0.d[0], x0
>         ret
>
> construct_lanedf:
>         movi    v0.2d, 0
>         ldr     d1, [x0]
>         ins     v0.d[0], v1.d[0]
>         ret
>
> but now we can generate:
> construct_lanedi:
>         ldr     d0, [x0]
>         ret
>
> construct_lanedf:
>         ldr     d0, [x0]
>         ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
>     Simplify vec_merge of vec_duplicate and const_vector.
>     * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
>     New predicate.
>     * config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Use VDC
>     mode iterator.  Update predicate on operand 1 to
>     handle non-const_vec constants.  Delete constraints.
>     (*aarch64_combinez_be<mode>): Likewise for operand 2.
>
> 2017-06-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/construct_lane_zero_1.c: New test.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85090

-- 
H.J.


Thread overview: 6+ messages
2017-06-06  8:25 [PATCH] Simplify vec_merge of vec_duplicate with const_vector Kyrill Tkachov
2017-06-27 22:29 ` Jeff Law
2017-07-05 15:15   ` Kyrill Tkachov
2017-07-18  8:54     ` Kyrill Tkachov
2017-07-24 10:10 ` James Greenhalgh
2018-03-27 14:37 ` H.J. Lu
