public inbox for gcc-cvs@sourceware.org
* [gcc(refs/vendors/ARM/heads/morello)] aarch64: Rework LDP/STP handling
@ 2022-05-05 12:06 Matthew Malcomson
From: Matthew Malcomson @ 2022-05-05 12:06 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:9f2ddb3dbcae4bfca426e208a5f05566764e75e3

commit 9f2ddb3dbcae4bfca426e208a5f05566764e75e3
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Thu Apr 7 17:37:26 2022 +0100

    aarch64: Rework LDP/STP handling
    
    There are two main aims here:
    
    - Reduce the abstraction penalty of using CADI for -mfake-capability
    - Allow CADI LDP/STP pairs to be formed automatically (by a later patch)
    
    Currently we have the following sets of LDP/STP define_insns:
    
    (1) SX (SI and SF)
    (2) DXC (DI, DF and CADI)
    (3) DREG (64-bit vector modes + DF, so partially overlapping (2))
    (4) TF
    (5) VQ (128-bit vector modes)
    
    (1) isn't a problem: SI and SF are the only two 32-bit modes.
    
    Except for the DF+DF overlap, (2) and (3) partition the 64-bit
    modes into two.  This is artificially restrictive though, since
    all 64-bit modes can be held in both GPRs and FPRs.  It would be
    better to have a single pattern for all 64-bit modes.
    
    Similarly, (4) and (5) artificially partition the 128-bit modes.
    They also miss out on TI.
    
    DXC is an interesting special case in that CADI is only really
    a “D” mode for -mfake-capability.  I agree bundling them together
    was the right call though, since for -mfake-capability we want CADI
    to support all the pair combinations that DImode does.  But for real
    capabilities, we could end up trying to pair a 128-bit CADI
    with a 64-bit mode, so the insn condition needs to check that
    the two modes have the same size.
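
The size rule described above can be illustrated with a small standalone sketch. It is not GCC code: the enum, the function names and the fake_capability flag are hypothetical stand-ins, and the only facts taken from the commit message are that CADI is a 64-bit mode for -mfake-capability and a 128-bit mode for real capabilities.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few machine modes; this is not GCC's
   real machine_mode enum.  */
enum sketch_mode { SK_DI, SK_DF, SK_V2SI, SK_CADI };

/* Size of MODE in bytes.  CADI is modelled as 8 bytes under
   -mfake-capability and 16 bytes for real capabilities; the other
   modes listed here are always 8 bytes.  */
static size_t
sketch_mode_size (enum sketch_mode mode, bool fake_capability)
{
  if (mode == SK_CADI)
    return fake_capability ? 8 : 16;
  return 8;
}

/* Return true if M1 and M2 could share one LDP/STP under the rule
   that both modes must have the same size.  */
static bool
sketch_modes_pairable (enum sketch_mode m1, enum sketch_mode m2,
		       bool fake_capability)
{
  return (sketch_mode_size (m1, fake_capability)
	  == sketch_mode_size (m2, fake_capability));
}
```

With these assumptions, sketch_modes_pairable (SK_CADI, SK_DI, true) is true, while the same call with fake_capability set to false is false, because a 128-bit CADI would be paired with a 64-bit mode.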
    
    Finally, the peephole2s only looked for certain combinations
    of modes.  It isn't really necessary to give explicit modes
    in peephole2s, since they're matching existing instructions
    that are already known to be valid.  We can reduce the number
    of patterns (and increase the generality) by doing the checks
    in the C++ code instead.
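
As a rough illustration of moving the mode checks out of the peephole2 patterns, the following standalone sketch models a subset of the tests that the patch adds to aarch64_operands_ok_for_ldpstp. The struct, enum and function names are hypothetical, the no_ldp_stp_qregs parameter stands in for the AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning flag, and the real function performs further checks (volatile memory, matching base registers, address validity) that are omitted here.

```c
#include <stdbool.h>

/* Hypothetical register classes; GCC's real enum reg_class is larger.  */
enum sketch_rclass { SK_GENERAL_REGS, SK_FP_REGS };

/* Simplified description of one load or store off a shared base.  */
struct sketch_access
{
  long offset;			/* Byte offset from the base register.  */
  int size;			/* Access size in bytes.  */
  enum sketch_rclass rclass;	/* Class of the data register.  */
};

/* Return true if accesses A and B pass the modelled subset of checks.  */
static bool
sketch_ok_for_ldpstp (const struct sketch_access *a,
		      const struct sketch_access *b,
		      bool no_ldp_stp_qregs)
{
  /* The two accesses must be the same size.  */
  if (a->size != b->size)
    return false;

  /* Only 4-, 8- and 16-byte registers can be paired.  */
  int msize = a->size;
  if (!(msize == 4 || msize == 8 || msize == 16))
    return false;

  /* Both data registers must be in the same class.  */
  if (a->rclass != b->rclass)
    return false;

  if (msize == 16)
    {
      /* 128-bit pairs must use floating-point registers.  */
      if (a->rclass != SK_FP_REGS)
	return false;

      /* Respect the tuning preference against Q-register pairs.  */
      if (no_ldp_stp_qregs)
	return false;
    }

  /* The offsets must be consecutive, in either order.  */
  if (a->offset != b->offset + msize && b->offset != a->offset + msize)
    return false;

  return true;
}
```

In this model, two 8-byte general-register accesses at offsets 0 and 8 are accepted, whereas two 16-byte accesses in general registers are rejected regardless of the tuning flag.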

Diff:
---
 gcc/config/aarch64/aarch64-ldpstp.md               | 126 ++-------------------
 gcc/config/aarch64/aarch64-protos.h                |   2 +-
 gcc/config/aarch64/aarch64-simd.md                 |  56 ---------
 gcc/config/aarch64/aarch64.c                       |  80 +++++++------
 gcc/config/aarch64/aarch64.md                      |  82 ++++++++------
 gcc/config/aarch64/iterators.md                    |  41 +++----
 .../gcc.target/aarch64/ldp_stp_combos_1.c          |  61 ++++++++++
 .../gcc.target/aarch64/ldp_stp_combos_2.c          |  76 +++++++++++++
 .../gcc.target/aarch64/ldp_stp_combos_3.c          |  51 +++++++++
 .../gcc.target/aarch64/ldp_stp_combos_4.c          |  55 +++++++++
 .../aarch64/morello/normal-base-pair-2.c           |   2 +-
 11 files changed, 363 insertions(+), 269 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md
index 02d7a5bd171..e8b39ced3fd 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -19,35 +19,11 @@
 ;; <http://www.gnu.org/licenses/>.
 
 (define_peephole2
-  [(set (match_operand:GPI 0 "register_operand" "")
-	(match_operand:GPI 1 "memory_operand" ""))
-   (set (match_operand:GPI 2 "register_operand" "")
-	(match_operand:GPI 3 "memory_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, <MODE>mode)"
-  [(parallel [(set (match_dup 0) (match_dup 1))
-	      (set (match_dup 2) (match_dup 3))])]
-{
-  aarch64_swap_ldrstr_operands (operands, true);
-})
-
-(define_peephole2
-  [(set (match_operand:GPI 0 "memory_operand" "")
-	(match_operand:GPI 1 "aarch64_reg_or_zero" ""))
-   (set (match_operand:GPI 2 "memory_operand" "")
-	(match_operand:GPI 3 "aarch64_reg_or_zero" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, false, <MODE>mode)"
-  [(parallel [(set (match_dup 0) (match_dup 1))
-	      (set (match_dup 2) (match_dup 3))])]
-{
-  aarch64_swap_ldrstr_operands (operands, false);
-})
-
-(define_peephole2
-  [(set (match_operand:GPF 0 "register_operand" "")
-	(match_operand:GPF 1 "memory_operand" ""))
-   (set (match_operand:GPF 2 "register_operand" "")
-	(match_operand:GPF 3 "memory_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, <MODE>mode)"
+  [(set (match_operand 0 "register_operand" "")
+	(match_operand 1 "memory_operand" ""))
+   (set (match_operand 2 "register_operand" "")
+	(match_operand 3 "memory_operand" ""))]
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(parallel [(set (match_dup 0) (match_dup 1))
 	      (set (match_dup 2) (match_dup 3))])]
 {
@@ -55,73 +31,17 @@
 })
 
 (define_peephole2
-  [(set (match_operand:GPF 0 "memory_operand" "")
-	(match_operand:GPF 1 "aarch64_reg_or_fp_zero" ""))
-   (set (match_operand:GPF 2 "memory_operand" "")
-	(match_operand:GPF 3 "aarch64_reg_or_fp_zero" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, false, <MODE>mode)"
+  [(set (match_operand 0 "memory_operand" "")
+	(match_operand 1 "aarch64_simd_reg_or_zero" ""))
+   (set (match_operand 2 "memory_operand" "")
+	(match_operand 3 "aarch64_simd_reg_or_zero" ""))]
+  "aarch64_operands_ok_for_ldpstp (operands, false)"
   [(parallel [(set (match_dup 0) (match_dup 1))
 	      (set (match_dup 2) (match_dup 3))])]
 {
   aarch64_swap_ldrstr_operands (operands, false);
 })
 
-(define_peephole2
-  [(set (match_operand:DREG 0 "register_operand" "")
-	(match_operand:DREG 1 "memory_operand" ""))
-   (set (match_operand:DREG2 2 "register_operand" "")
-	(match_operand:DREG2 3 "memory_operand" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, <DREG:MODE>mode)"
-  [(parallel [(set (match_dup 0) (match_dup 1))
-	      (set (match_dup 2) (match_dup 3))])]
-{
-  aarch64_swap_ldrstr_operands (operands, true);
-})
-
-(define_peephole2
-  [(set (match_operand:DREG 0 "memory_operand" "")
-	(match_operand:DREG 1 "register_operand" ""))
-   (set (match_operand:DREG2 2 "memory_operand" "")
-	(match_operand:DREG2 3 "register_operand" ""))]
-  "TARGET_SIMD
-   && aarch64_operands_ok_for_ldpstp (operands, false, <DREG:MODE>mode)"
-  [(parallel [(set (match_dup 0) (match_dup 1))
-	      (set (match_dup 2) (match_dup 3))])]
-{
-  aarch64_swap_ldrstr_operands (operands, false);
-})
-
-(define_peephole2
-  [(set (match_operand:VQ 0 "register_operand" "")
-	(match_operand:VQ 1 "memory_operand" ""))
-   (set (match_operand:VQ2 2 "register_operand" "")
-	(match_operand:VQ2 3 "memory_operand" ""))]
-  "TARGET_SIMD
-   && aarch64_operands_ok_for_ldpstp (operands, true, <VQ:MODE>mode)
-   && (aarch64_tune_params.extra_tuning_flags
-	& AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
-  [(parallel [(set (match_dup 0) (match_dup 1))
-	      (set (match_dup 2) (match_dup 3))])]
-{
-  aarch64_swap_ldrstr_operands (operands, true);
-})
-
-(define_peephole2
-  [(set (match_operand:VQ 0 "memory_operand" "")
-	(match_operand:VQ 1 "register_operand" ""))
-   (set (match_operand:VQ2 2 "memory_operand" "")
-	(match_operand:VQ2 3 "register_operand" ""))]
-  "TARGET_SIMD
-   && aarch64_operands_ok_for_ldpstp (operands, false, <VQ:MODE>mode)
-   && (aarch64_tune_params.extra_tuning_flags
-	& AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
-  [(parallel [(set (match_dup 0) (match_dup 1))
-	      (set (match_dup 2) (match_dup 3))])]
-{
-  aarch64_swap_ldrstr_operands (operands, false);
-})
-
-
 ;; Handle sign/zero extended consecutive load/store.
 
 (define_peephole2
@@ -129,7 +49,7 @@
 	(sign_extend:DI (match_operand:SI 1 "memory_operand" "")))
    (set (match_operand:DI 2 "register_operand" "")
 	(sign_extend:DI (match_operand:SI 3 "memory_operand" "")))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, SImode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(parallel [(set (match_dup 0) (sign_extend:DI (match_dup 1)))
 	      (set (match_dup 2) (sign_extend:DI (match_dup 3)))])]
 {
@@ -141,35 +61,13 @@
 	(zero_extend:DI (match_operand:SI 1 "memory_operand" "")))
    (set (match_operand:DI 2 "register_operand" "")
 	(zero_extend:DI (match_operand:SI 3 "memory_operand" "")))]
-  "aarch64_operands_ok_for_ldpstp (operands, true, SImode)"
+  "aarch64_operands_ok_for_ldpstp (operands, true)"
   [(parallel [(set (match_dup 0) (zero_extend:DI (match_dup 1)))
 	      (set (match_dup 2) (zero_extend:DI (match_dup 3)))])]
 {
   aarch64_swap_ldrstr_operands (operands, true);
 })
 
-;; Handle storing of a floating point zero with integer data.
-;; This handles cases like:
-;;   struct pair { int a; float b; }
-;;
-;;   p->a = 1;
-;;   p->b = 0.0;
-;;
-;; We can match modes that won't work for a stp instruction
-;; as aarch64_operands_ok_for_ldpstp checks that the modes are
-;; compatible.
-(define_peephole2
-  [(set (match_operand:DSX 0 "memory_operand" "")
-	(match_operand:DSX 1 "aarch64_reg_zero_or_fp_zero" ""))
-   (set (match_operand:<FCVT_TARGET> 2 "memory_operand" "")
-	(match_operand:<FCVT_TARGET> 3 "aarch64_reg_zero_or_fp_zero" ""))]
-  "aarch64_operands_ok_for_ldpstp (operands, false, <V_INT_EQUIV>mode)"
-  [(parallel [(set (match_dup 0) (match_dup 1))
-	      (set (match_dup 2) (match_dup 3))])]
-{
-  aarch64_swap_ldrstr_operands (operands, false);
-})
-
 ;; Handle consecutive load/store whose offset is out of the range
 ;; supported by ldp/ldpsw/stp.  We firstly adjust offset in a scratch
 ;; register, then merge them into ldp/ldpsw/stp by using the adjusted
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 3b4a3f2c165..01dca749d07 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -737,7 +737,7 @@ void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
 int aarch64_ccmp_mode_to_code (machine_mode mode);
 
 bool extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset);
-bool aarch64_operands_ok_for_ldpstp (rtx *, bool, machine_mode);
+bool aarch64_operands_ok_for_ldpstp (rtx *, bool);
 bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, machine_mode);
 void aarch64_swap_ldrstr_operands (rtx *, bool);
 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index cb1077f9259..37adee676fd 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -179,62 +179,6 @@
   [(set_attr "type" "neon_store1_1reg<q>")]
 )
 
-(define_insn "load_pair<DREG:mode><DREG2:mode>"
-  [(set (match_operand:DREG 0 "register_operand" "=w")
-	(match_operand:DREG 1 "aarch64_mem_pair_operand" "Ump"))
-   (set (match_operand:DREG2 2 "register_operand" "=w")
-	(match_operand:DREG2 3 "memory_operand" "m"))]
-  "TARGET_SIMD
-   && rtx_equal_p (XEXP (operands[3], 0),
-		   plus_constant (mem_address_mode (operands[1]),
-				  XEXP (operands[1], 0),
-				  GET_MODE_SIZE (<DREG:MODE>mode)))"
-  "ldp\\t%d0, %d2, %z1"
-  [(set_attr "type" "neon_ldp")]
-)
-
-(define_insn "vec_store_pair<DREG:mode><DREG2:mode>"
-  [(set (match_operand:DREG 0 "aarch64_mem_pair_operand" "=Ump")
-	(match_operand:DREG 1 "register_operand" "w"))
-   (set (match_operand:DREG2 2 "memory_operand" "=m")
-	(match_operand:DREG2 3 "register_operand" "w"))]
-  "TARGET_SIMD
-   && rtx_equal_p (XEXP (operands[2], 0),
-		   plus_constant (mem_address_mode (operands[0]),
-				  XEXP (operands[0], 0),
-				  GET_MODE_SIZE (<DREG:MODE>mode)))"
-  "stp\\t%d1, %d3, %z0"
-  [(set_attr "type" "neon_stp")]
-)
-
-(define_insn "load_pair<VQ:mode><VQ2:mode>"
-  [(set (match_operand:VQ 0 "register_operand" "=w")
-	(match_operand:VQ 1 "aarch64_mem_pair_operand" "Ump"))
-   (set (match_operand:VQ2 2 "register_operand" "=w")
-	(match_operand:VQ2 3 "memory_operand" "m"))]
-  "TARGET_SIMD
-    && rtx_equal_p (XEXP (operands[3], 0),
-		    plus_constant (mem_address_mode (operands[1]),
-				   XEXP (operands[1], 0),
-				   GET_MODE_SIZE (<VQ:MODE>mode)))"
-  "ldp\\t%q0, %q2, %z1"
-  [(set_attr "type" "neon_ldp_q")]
-)
-
-(define_insn "vec_store_pair<VQ:mode><VQ2:mode>"
-  [(set (match_operand:VQ 0 "aarch64_mem_pair_operand" "=Ump")
-	(match_operand:VQ 1 "register_operand" "w"))
-   (set (match_operand:VQ2 2 "memory_operand" "=m")
-	(match_operand:VQ2 3 "register_operand" "w"))]
-  "TARGET_SIMD && rtx_equal_p (XEXP (operands[2], 0),
-		plus_constant (mem_address_mode (operands[0]),
-			       XEXP (operands[0], 0),
-			       GET_MODE_SIZE (<VQ:MODE>mode)))"
-  "stp\\t%q1, %q3, %z0"
-  [(set_attr "type" "neon_stp_q")]
-)
-
-
 (define_split
   [(set (match_operand:VQMOV 0 "register_operand" "")
       (match_operand:VQMOV 1 "register_operand" ""))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 99a6e4169e1..f3ec3a04930 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7588,12 +7588,7 @@ static rtx
 aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2,
 			rtx reg2)
 {
-  if (mode == E_V4SImode)
-    return gen_vec_store_pairv4siv4si (mem1, reg1, mem2, reg2);
-  else if (mode == E_TFmode)
-    return gen_store_pair_dw_tftf (mem1, reg1, mem2, reg2);
-  else
-    return gen_store_pair_dw (mode, mode, mem1, reg1, mem2, reg2);
+  return gen_store_pair (mode, mode, mem1, reg1, mem2, reg2);
 }
 
 /* Generate and regurn a load pair isntruction of mode MODE to load register
@@ -7603,12 +7598,7 @@ static rtx
 aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2,
 		       rtx mem2)
 {
-  if (mode == E_V4SImode)
-    return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2);
-  else if (mode == E_TFmode)
-    return gen_load_pair_dw_tftf (reg1, mem1, reg2, mem2);
-  else
-    return gen_load_pair_dw (mode, mode, reg1, mem1, reg2, mem2);
+  return gen_load_pair (mode, mode, reg1, mem1, reg2, mem2);
 }
 
 /* Return TRUE if return address signing should be enabled for the current
@@ -23452,12 +23442,10 @@ aarch64_sched_adjust_priority (rtx_insn *insn, int priority)
 }
 
 /* Given OPERANDS of consecutive load/store, check if we can merge
-   them into ldp/stp.  LOAD is true if they are load instructions.
-   MODE is the mode of memory operands.  */
+   them into ldp/stp.  LOAD is true if they are load instructions.  */
 
 bool
-aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
-				machine_mode mode)
+aarch64_operands_ok_for_ldpstp (rtx *operands, bool load)
 {
   HOST_WIDE_INT offval_1, offval_2, msize;
   enum reg_class rclass_1, rclass_2;
@@ -23481,6 +23469,45 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
       reg_2 = operands[3];
     }
 
+  /* The two accesses must be the same size.  */
+  machine_mode mode = GET_MODE (mem_1);
+  if (maybe_ne (GET_MODE_SIZE (mode), GET_MODE_SIZE (GET_MODE (mem_2))))
+    return false;
+
+  /* Check for valid LDP/STP register sizes.  */
+  if (!GET_MODE_SIZE (mode).is_constant (&msize)
+      || !(msize == 4 || msize == 8 || msize == 16))
+    return false;
+
+  /* Check if the registers are of same class.  */
+  if (REG_P (reg_1) && FP_REGNUM_P (REGNO (reg_1)))
+    rclass_1 = FP_REGS;
+  else
+    rclass_1 = GENERAL_REGS;
+
+  if (REG_P (reg_2) && FP_REGNUM_P (REGNO (reg_2)))
+    rclass_2 = FP_REGS;
+  else
+    rclass_2 = GENERAL_REGS;
+
+  if (rclass_1 != rclass_2)
+    return false;
+
+  if (msize == 16)
+    {
+      /* Vector LDPs and STPs must use floating-point registers.  */
+      if (rclass_1 != FP_REGS)
+	return false;
+
+      /* Respect the -mtune preference about whether to form vector
+	 LDPs and STPs.
+
+	 ??? Traditionally we've done this even when optimizing for size.  */
+      if (aarch64_tune_params.extra_tuning_flags
+	  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
+	return false;
+    }
+
   /* The mems cannot be volatile.  */
   if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
     return false;
@@ -23506,15 +23533,8 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
   if (!rtx_equal_p (base_1, base_2))
     return false;
 
-  /* The operands must be of the same size.  */
-  gcc_assert (known_eq (GET_MODE_SIZE (GET_MODE (mem_1)),
-			 GET_MODE_SIZE (GET_MODE (mem_2))));
-
   offval_1 = INTVAL (offset_1);
   offval_2 = INTVAL (offset_2);
-  /* We should only be trying this for fixed-sized modes.  There is no
-     SVE LDP/STP instruction.  */
-  msize = GET_MODE_SIZE (mode).to_constant ();
   /* Check if the offsets are consecutive.  */
   if (offval_1 != (offval_2 + msize) && offval_2 != (offval_1 + msize))
     return false;
@@ -23537,20 +23557,6 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
        && !aarch64_mem_pair_operand (mem_2, GET_MODE (mem_2)))
     return false;
 
-  if (REG_P (reg_1) && FP_REGNUM_P (REGNO (reg_1)))
-    rclass_1 = FP_REGS;
-  else
-    rclass_1 = GENERAL_REGS;
-
-  if (REG_P (reg_2) && FP_REGNUM_P (REGNO (reg_2)))
-    rclass_2 = FP_REGS;
-  else
-    rclass_2 = GENERAL_REGS;
-
-  /* Check if the registers are of same class.  */
-  if (rclass_1 != rclass_2)
-    return false;
-
   return true;
 }
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bf16e7c7256..9f17b48487e 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1596,7 +1596,7 @@
 
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
-(define_insn "load_pair_sw_<SX:mode><SX2:mode>"
+(define_insn "@load_pair_<SX:mode><SX2:mode>"
   [(set (match_operand:SX 0 "register_operand" "=r,w")
 	(match_operand:SX 1 "aarch64_mem_pair_operand" "Ump,Ump"))
    (set (match_operand:SX2 2 "register_operand" "=r,w")
@@ -1612,16 +1612,22 @@
    (set_attr "arch" "*,fp")]
 )
 
-;; Storing different modes that can still be merged
-(define_insn "@load_pair_dw_<DXC:mode><DXC2:mode>"
-  [(set (match_operand:DXC 0 "register_operand" "=r,w")
-	(match_operand:DXC 1 "aarch64_mem_pair_operand" "Ump,Ump"))
-   (set (match_operand:DXC2 2 "register_operand" "=r,w")
-	(match_operand:DXC2 3 "memory_operand" "m,m"))]
-   "rtx_equal_p (XEXP (operands[3], 0),
-		 plus_constant (mem_address_mode (operands[1]),
-				XEXP (operands[1], 0),
-				GET_MODE_SIZE (<DXC:MODE>mode)))"
+;; This pattern handles all 64-bit modes + CADI, which is a 64-bit mode
+;; for -mfake-capability and a 128-bit mode for "real" capabilities.
+;; A 64-bit CADI can be paired with other 64-bit modes whereas a 128-bit
+;; CADI can only be paired with another CADI.  We therefore need to check
+;; that the sizes of the modes are equal.
+(define_insn "@load_pair_<ANY_DC:mode><ANY_DC2:mode>"
+  [(set (match_operand:ANY_DC 0 "register_operand" "=r,w")
+	(match_operand:ANY_DC 1 "aarch64_mem_pair_operand" "Ump,Ump"))
+   (set (match_operand:ANY_DC2 2 "register_operand" "=r,w")
+	(match_operand:ANY_DC2 3 "memory_operand" "m,m"))]
+  "known_eq (GET_MODE_SIZE (<ANY_DC:MODE>mode),
+	     GET_MODE_SIZE (<ANY_DC2:MODE>mode))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (mem_address_mode (operands[1]),
+				  XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<ANY_DC:MODE>mode)))"
   "@
    ldp\\t%0, %2, %z1
    ldp\\t%d0, %d2, %z1"
@@ -1629,16 +1635,16 @@
    (set_attr "arch" "*,fp")]
 )
 
-(define_insn "load_pair_dw_tftf"
-  [(set (match_operand:TF 0 "register_operand" "=w")
-	(match_operand:TF 1 "aarch64_mem_pair_operand" "Ump"))
-   (set (match_operand:TF 2 "register_operand" "=w")
-	(match_operand:TF 3 "memory_operand" "m"))]
+(define_insn "@load_pair_<ANY_Q:mode><ANY_Q2:mode>"
+  [(set (match_operand:ANY_Q 0 "register_operand" "=w")
+	(match_operand:ANY_Q 1 "aarch64_mem_pair_operand" "Ump"))
+   (set (match_operand:ANY_Q2 2 "register_operand" "=w")
+	(match_operand:ANY_Q2 3 "memory_operand" "m"))]
    "TARGET_SIMD
     && rtx_equal_p (XEXP (operands[3], 0),
 		    plus_constant (mem_address_mode (operands[1]),
 				   XEXP (operands[1], 0),
-				   GET_MODE_SIZE (TFmode)))"
+				   GET_MODE_SIZE (<ANY_Q:MODE>mode)))"
   "ldp\\t%q0, %q2, %z1"
   [(set_attr "type" "neon_ldp_q")
    (set_attr "fp" "yes")]
@@ -1646,7 +1652,7 @@
 
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
-(define_insn "store_pair_sw_<SX:mode><SX2:mode>"
+(define_insn "@store_pair_<SX:mode><SX2:mode>"
   [(set (match_operand:SX 0 "aarch64_mem_pair_operand" "=Ump,Ump")
 	(match_operand:SX 1 "aarch64_reg_zero_or_fp_zero" "rYZ,w"))
    (set (match_operand:SX2 2 "memory_operand" "=m,m")
@@ -1662,33 +1668,39 @@
    (set_attr "arch" "*,fp")]
 )
 
-;; Storing different modes that can still be merged
-(define_insn "@store_pair_dw_<DXC:mode><DXC2:mode>"
-  [(set (match_operand:DXC 0 "aarch64_mem_pair_operand" "=Ump,Ump")
-	(match_operand:DXC 1 "aarch64_reg_zero_or_fp_zero" "rYZ,w"))
-   (set (match_operand:DXC2 2 "memory_operand" "=m,m")
-	(match_operand:DXC2 3 "aarch64_reg_zero_or_fp_zero" "rYZ,w"))]
-   "rtx_equal_p (XEXP (operands[2], 0),
-		 plus_constant (mem_address_mode (operands[0]),
-				XEXP (operands[0], 0),
-				GET_MODE_SIZE (<DXC:MODE>mode)))"
+;; This pattern handles all 64-bit modes + CADI, which is a 64-bit mode
+;; for -mfake-capability and a 128-bit mode for "real" capabilities.
+;; A 64-bit CADI can be paired with other 64-bit modes whereas a 128-bit
+;; CADI can only be paired with another CADI.  We therefore need to check
+;; that the sizes of the modes are equal.
+(define_insn "@store_pair_<ANY_DC:mode><ANY_DC2:mode>"
+  [(set (match_operand:ANY_DC 0 "aarch64_mem_pair_operand" "=Ump,Ump")
+	(match_operand:ANY_DC 1 "aarch64_simd_reg_or_zero" "rYZ,w"))
+   (set (match_operand:ANY_DC2 2 "memory_operand" "=m,m")
+	(match_operand:ANY_DC2 3 "aarch64_simd_reg_or_zero" "rYZ,w"))]
+  "known_eq (GET_MODE_SIZE (<ANY_DC:MODE>mode),
+	     GET_MODE_SIZE (<ANY_DC2:MODE>mode))
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (mem_address_mode (operands[0]),
+				  XEXP (operands[0], 0),
+				  GET_MODE_SIZE (<ANY_DC:MODE>mode)))"
   "@
-   stp\\t%<DXC:dxc_gpr>1, %<DXC2:dxc_gpr>3, %z0
+   stp\\t%<ANY_DC:dxc_gpr>1, %<ANY_DC2:dxc_gpr>3, %z0
    stp\\t%d1, %d3, %z0"
   [(set_attr "type" "store_16,neon_store1_2reg")
    (set_attr "arch" "*,fp")]
 )
 
-(define_insn "store_pair_dw_tftf"
-  [(set (match_operand:TF 0 "aarch64_mem_pair_operand" "=Ump")
-	(match_operand:TF 1 "register_operand" "w"))
-   (set (match_operand:TF 2 "memory_operand" "=m")
-	(match_operand:TF 3 "register_operand" "w"))]
+(define_insn "@store_pair_<ANY_Q:mode><ANY_Q2:mode>"
+  [(set (match_operand:ANY_Q 0 "aarch64_mem_pair_operand" "=Ump")
+	(match_operand:ANY_Q 1 "register_operand" "w"))
+   (set (match_operand:ANY_Q2 2 "memory_operand" "=m")
+	(match_operand:ANY_Q2 3 "register_operand" "w"))]
    "TARGET_SIMD &&
     rtx_equal_p (XEXP (operands[2], 0),
 		 plus_constant (mem_address_mode (operands[0]),
 				XEXP (operands[0], 0),
-				GET_MODE_SIZE (TFmode)))"
+				GET_MODE_SIZE (<ANY_Q2:MODE>mode)))"
   "stp\\t%q1, %q3, %z0"
   [(set_attr "type" "neon_stp_q")
    (set_attr "fp" "yes")]
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 4d53603c05d..5e02464f5ca 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -101,12 +101,6 @@
 ;; Double vector modes suitable for moving.  Includes BFmode.
 (define_mode_iterator VDMOV [V8QI V4HI V4HF V4BF V2SI V2SF])
 
-;; All modes stored in registers d0-d31.
-(define_mode_iterator DREG [V8QI V4HI V4HF V2SI V2SF DF])
-
-;; Copy of the above.
-(define_mode_iterator DREG2 [V8QI V4HI V4HF V2SI V2SF DF])
-
 ;; All modes suitable to store/load pair (2 elements) using STP/LDP.
 (define_mode_iterator VP_2E [V2SI V2SF V2DI V2DF])
 
@@ -119,9 +113,6 @@
 ;; Quad vector modes.
 (define_mode_iterator VQ [V16QI V8HI V4SI V2DI V8HF V4SF V2DF V8BF])
 
-;; Copy of the above.
-(define_mode_iterator VQ2 [V16QI V8HI V4SI V2DI V8HF V8BF V4SF V2DF])
-
 ;; Quad vector modes suitable for moving.  Includes BFmode.
 (define_mode_iterator VQMOV [V16QI V8HI V4SI V2DI V8HF V8BF V4SF V2DF])
 
@@ -314,21 +305,23 @@
 ;; Duplicate of the above
 (define_mode_iterator DX2 [DI DF])
 
-;; Double scalar modes + CADImode
-(define_mode_iterator DXC [DI DF CADI])
-
-;; Duplicate of the above
-(define_mode_iterator DXC2 [DI DF CADI])
-
 ;; Single scalar modes
 (define_mode_iterator SX [SI SF])
 
 ;; Duplicate of the above
 (define_mode_iterator SX2 [SI SF])
 
-;; Single and double integer and float modes
-(define_mode_iterator DSX [DF DI SF SI])
+;; 64-bit modes + CADImode (which has 64 bits for -mfake-capability).
+(define_mode_iterator ANY_DC [V8QI V4HI V4HF V4BF V2SI V2SF DI DF V1DF CADI])
+
+;; Duplicate of the above.
+(define_mode_iterator ANY_DC2 [V8QI V4HI V4HF V4BF V2SI V2SF DI DF V1DF CADI])
+
+;; 128-bit modes (excluding CADImode).
+(define_mode_iterator ANY_Q [V16QI V8HI V8HF V8BF V4SI V4SF V2DI V2DF TI TF])
 
+;; Duplicate of the above.
+(define_mode_iterator ANY_Q2 [V16QI V8HI V8HF V8BF V4SI V4SF V2DI V2DF TI TF])
 
 ;; Modes available for Advanced SIMD <f>mul lane operations.
 (define_mode_iterator VMUL [V4HI V8HI V2SI V4SI
@@ -921,14 +914,12 @@
 (define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")
 		     (CADI "B")])
 
-; Similar to the 'w' attribute, but maps DF -> x.  The domain of this
-; attribute is the DXC[2] iterator.  It is intended to be used with the
-; store_pair_dw_<DXC:mode><DXC2:mode> pattern which implements an
-; optimization whereby (const_double:DF 0.0) is stored to two
-; consecutive doubles using:
-; stp xzr, xzr [addr].  Hence, the pattern accepts (const_double 0.0) in
-; the GPR alternative.
-(define_mode_attr dxc_gpr [(DI "x") (DF "x") (CADI "")])
+;; Used with ANY_DC[2] to print a GPR.  Only CADI is a special case.
+(define_mode_attr dxc_gpr [(V8QI "x")
+			   (V4HI "x") (V4HF "x") (V4BF "x")
+			   (V2SI "x") (V2SF "x")
+			   (DI "x") (DF "x") (V1DF "x")
+			   (CADI "B")])
 
 ;; The size of access, in bytes.
 ;; Morello TODO: this is right for fake capabilities but wrong for PureCap.
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_1.c
new file mode 100644
index 00000000000..324337ae790
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_1.c
@@ -0,0 +1,61 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mstrict-align -fdisable-rtl-postreload -save-temps" } */
+
+#include <stdint.h>
+
+#define TEST_PAIR(T1, T2)			\
+  void						\
+  load_gpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1 = *(T1 *) ptr;			\
+    T2 x2 = *(T2 *) (ptr + sizeof (T1));	\
+    asm volatile ("" :: "r" (x1), "r" (x2));	\
+  }						\
+						\
+  void						\
+  load_fpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1 = *(T1 *) ptr;			\
+    T2 x2 = *(T2 *) (ptr + sizeof (T1));	\
+    asm volatile ("" :: "w" (x1), "w" (x2));	\
+  }						\
+						\
+  void						\
+  store_gpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1;					\
+    T2 x2;					\
+    asm volatile ("" : "=r" (x1), "=r" (x2));	\
+    *(T1 *) ptr = x1;				\
+    *(T2 *) (ptr + sizeof (T1)) = x2;		\
+  }						\
+						\
+  void						\
+  store_fpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1;					\
+    T2 x2;					\
+    asm volatile ("" : "=w" (x1), "=w" (x2));	\
+    *(T1 *) ptr = x1;				\
+    *(T2 *) (ptr + sizeof (T1)) = x2;		\
+  }						\
+						\
+  void						\
+  store_zero_##T1##_##T2 (char *ptr)		\
+  {						\
+    *(T1 *) ptr = (T1) { 0 };			\
+    *(T2 *) (ptr + sizeof (T1)) = (T2) { 0 };	\
+  }
+
+#define TEST1(T1)				\
+  TEST_PAIR (T1, int32_t)			\
+  TEST_PAIR (T1, float)
+
+TEST1 (int32_t)
+TEST1 (float)
+
+/* { dg-final { scan-assembler-times {\tldp\tw[0-9]+,} 4 } } */
+/* { dg-final { scan-assembler-times {\tldp\ts[0-9]+,} 4 } } */
+/* { dg-final { scan-assembler-times {\tstp\tw[0-9]+,} 4 } } */
+/* { dg-final { scan-assembler-times {\tstp\ts[0-9]+,} 4 } } */
+/* { dg-final { scan-assembler-times {\tstp\twzr,} 4 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_2.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_2.c
new file mode 100644
index 00000000000..b10be14bebd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_2.c
@@ -0,0 +1,76 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mstrict-align -save-temps" } */
+
+#include <stdint.h>
+#include <arm_neon.h>
+
+#define TEST_PAIR(T1, T2)			\
+  void						\
+  load_gpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1 = *(T1 *) ptr;			\
+    T2 x2 = *(T2 *) (ptr + sizeof (T1));	\
+    asm volatile ("" :: "r" (x1), "r" (x2));	\
+  }						\
+						\
+  void						\
+  load_fpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1 = *(T1 *) ptr;			\
+    T2 x2 = *(T2 *) (ptr + sizeof (T1));	\
+    asm volatile ("" :: "w" (x1), "w" (x2));	\
+  }						\
+						\
+  void						\
+  store_gpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1;					\
+    T2 x2;					\
+    asm volatile ("" : "=r" (x1), "=r" (x2));	\
+    *(T1 *) ptr = x1;				\
+    *(T2 *) (ptr + sizeof (T1)) = x2;		\
+  }						\
+						\
+  void						\
+  store_fpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1;					\
+    T2 x2;					\
+    asm volatile ("" : "=w" (x1), "=w" (x2));	\
+    *(T1 *) ptr = x1;				\
+    *(T2 *) (ptr + sizeof (T1)) = x2;		\
+  }						\
+						\
+  void						\
+  store_zero_##T1##_##T2 (char *ptr)		\
+  {						\
+    *(T1 *) ptr = (T1) { 0 };			\
+    *(T2 *) (ptr + sizeof (T1)) = (T2) { 0 };	\
+  }
+
+#define TEST1(T1)				\
+  TEST_PAIR (T1, int64_t)			\
+  TEST_PAIR (T1, double)			\
+  TEST_PAIR (T1, int8x8_t)			\
+  TEST_PAIR (T1, int16x4_t)			\
+  TEST_PAIR (T1, int32x2_t)			\
+  TEST_PAIR (T1, int64x1_t)			\
+  TEST_PAIR (T1, float16x4_t)			\
+  TEST_PAIR (T1, float32x2_t)			\
+  TEST_PAIR (T1, float64x1_t)
+
+TEST1 (int64_t)
+TEST1 (double)
+TEST1 (int8x8_t)
+TEST1 (int16x4_t)
+TEST1 (int32x2_t)
+TEST1 (int64x1_t)
+TEST1 (float16x4_t)
+TEST1 (float32x2_t)
+TEST1 (float64x1_t)
+
+/* { dg-final { scan-assembler-times {\tldp\tx[0-9]+,} 81 } } */
+/* { dg-final { scan-assembler-times {\tldp\td[0-9]+,} 81 } } */
+/* { dg-final { scan-assembler-times {\tstp\tx[0-9]+,} 81 } } */
+/* { dg-final { scan-assembler-times {\tstp\td[0-9]+,} 81 } } */
+/* { dg-final { scan-assembler-times {\tstp\txzr,} 81 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_3.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_3.c
new file mode 100644
index 00000000000..d93da9ad6f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_3.c
@@ -0,0 +1,51 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mstrict-align -save-temps" } */
+
+#include <stdint.h>
+#include <arm_neon.h>
+
+typedef __int128_t ti;
+typedef long double tf;
+
+#define TEST_PAIR(T1, T2)			\
+  void						\
+  load_fpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1 = *(T1 *) ptr;			\
+    T2 x2 = *(T2 *) (ptr + sizeof (T1));	\
+    asm volatile ("" :: "w" (x1), "w" (x2));	\
+  }						\
+						\
+  void						\
+  store_fpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1;					\
+    T2 x2;					\
+    asm volatile ("" : "=w" (x1), "=w" (x2));	\
+    *(T1 *) ptr = x1;				\
+    *(T2 *) (ptr + sizeof (T1)) = x2;		\
+  }
+
+#define TEST1(T1)				\
+  TEST_PAIR (T1, ti)				\
+  TEST_PAIR (T1, tf)				\
+  TEST_PAIR (T1, int8x16_t)			\
+  TEST_PAIR (T1, int16x8_t)			\
+  TEST_PAIR (T1, int32x4_t)			\
+  TEST_PAIR (T1, int64x2_t)			\
+  TEST_PAIR (T1, float16x8_t)			\
+  TEST_PAIR (T1, float32x4_t)			\
+  TEST_PAIR (T1, float64x2_t)
+
+TEST1 (ti)
+TEST1 (tf)
+TEST1 (int8x16_t)
+TEST1 (int16x8_t)
+TEST1 (int32x4_t)
+TEST1 (int64x2_t)
+TEST1 (float16x8_t)
+TEST1 (float32x4_t)
+TEST1 (float64x2_t)
+
+/* { dg-final { scan-assembler-times {\tldp\tq[0-9]+,} 81 } } */
+/* { dg-final { scan-assembler-times {\tstp\tq[0-9]+,} 81 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_4.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_4.c
new file mode 100644
index 00000000000..766985430a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_combos_4.c
@@ -0,0 +1,55 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -mstrict-align" } */
+
+#include <stdint.h>
+#include <arm_neon.h>
+
+typedef __int128_t ti;
+typedef long double tf;
+
+#define TEST_PAIR(T1, T2)			\
+  void						\
+  load_gpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1 = *(T1 *) ptr;			\
+    T2 x2 = *(T2 *) (ptr + sizeof (T1));	\
+    asm volatile ("" :: "r" (x1), "r" (x2));	\
+  }						\
+						\
+  void						\
+  store_gpr_##T1##_##T2 (char *ptr)		\
+  {						\
+    T1 x1;					\
+    T2 x2;					\
+    asm volatile ("" : "=r" (x1), "=r" (x2));	\
+    *(T1 *) ptr = x1;				\
+    *(T2 *) (ptr + sizeof (T1)) = x2;		\
+  }						\
+						\
+  void						\
+  store_zero_##T1##_##T2 (char *ptr)		\
+  {						\
+    *(T1 *) ptr = (T1) { 0 };			\
+    *(T2 *) (ptr + sizeof (T1)) = (T2) { 0 };	\
+  }
+
+#define TEST1(T1)				\
+  TEST_PAIR (T1, ti)				\
+  TEST_PAIR (T1, tf)				\
+  TEST_PAIR (T1, int8x16_t)			\
+  TEST_PAIR (T1, int16x8_t)			\
+  TEST_PAIR (T1, int32x4_t)			\
+  TEST_PAIR (T1, int64x2_t)			\
+  TEST_PAIR (T1, float16x8_t)			\
+  TEST_PAIR (T1, float32x4_t)			\
+  TEST_PAIR (T1, float64x2_t)
+
+TEST1 (ti)
+TEST1 (tf)
+TEST1 (int8x16_t)
+TEST1 (int16x8_t)
+TEST1 (int32x4_t)
+TEST1 (int64x2_t)
+TEST1 (float16x8_t)
+TEST1 (float32x4_t)
+TEST1 (float64x2_t)
diff --git a/gcc/testsuite/gcc.target/aarch64/morello/normal-base-pair-2.c b/gcc/testsuite/gcc.target/aarch64/morello/normal-base-pair-2.c
index 1bf9e852bc9..817fa5354f3 100644
--- a/gcc/testsuite/gcc.target/aarch64/morello/normal-base-pair-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/morello/normal-base-pair-2.c
@@ -26,4 +26,4 @@ TEST_TYPE (uint32x4_t)
 /* { dg-final { scan-assembler-times {\tstp\ts[0-9]+,} 1 } } */
 /* { dg-final { scan-assembler-times {\tstp\td[0-9]+,} 2 } } */
 /* { dg-final { scan-assembler-times {\tstp\tq[0-9]+,} 2 } } */
-/* { dg-final { scan-assembler-times {\tstp\t[wx]zr,} 4 } } */
+/* { dg-final { scan-assembler-times {\tstp\t[wx]zr,} 5 } } */

