public inbox for gcc-patches@gcc.gnu.org
* [PATCH] stor-layout, aarch64: Express SRA intrinsics with RTL codes
@ 2023-05-18 15:18 Kyrylo Tkachov
  2023-05-25 10:12 ` [ping] " Kyrylo Tkachov
  2023-05-25 16:02 ` Richard Sandiford
  0 siblings, 2 replies; 3+ messages in thread
From: Kyrylo Tkachov @ 2023-05-18 15:18 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4690 bytes --]

Hi all,

This patch expresses the intrinsics for the SRA and RSRA instructions with
standard RTL codes rather than relying on UNSPECs.
These instructions perform a vector shift right plus accumulate with an
optional rounding constant addition for the RSRA variant.
There are a number of interesting points:

* The scalar-in-SIMD-registers variant for DImode SRA e.g. ssra d0, d1, #N
is left using the UNSPECs. Expressing it as a DImode plus+shift led to all
kinds of trouble as it started matching the existing define_insns for
"add x0, x0, asr #N" instructions and adding the SRA form as an extra
alternative required a significant amount of deduplication of iterators and
things still didn't work out well. I decided not to tackle that case in
this patch. It can be attempted later.

* For the RSRA variants that add a rounding constant (1 << (shift - 1)) the
addition is notionally performed in a wider mode than the input types so that
overflow is handled properly. In RTL this can be represented with an appropriate
extend operation followed by a truncate back to the original mode.
However, for 128-bit input modes such as V4SI we don't have appropriate modes
defined for this widening, i.e. we'd need a V4DI mode to represent the
intermediate widened result.  This patch defines such modes for
V16HI, V8SI, V4DI, V2TI. These will come in handy in the future too, as we have
more Advanced SIMD instructions that have similar intermediate widening
semantics.
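
As an illustration (not taken from the patch), the widened RSRA semantics can
be sketched in plain C, with the rounding addition done in a 2x-wider type so
that it cannot overflow:

```c
#include <stdint.h>

/* Illustrative sketch of SRSRA lane semantics for 32-bit elements:
   acc + ((x + (1 << (shift - 1))) >> shift), with the rounding
   addition performed in a 64-bit intermediate so that it cannot
   overflow the 32-bit input type.  */
static inline int32_t
srsra_lane (int32_t acc, int32_t x, int shift)
{
  int64_t rnd = (int64_t) 1 << (shift - 1);  /* rounding constant */
  int64_t wide = (int64_t) x + rnd;          /* widened addition */
  return acc + (int32_t) (wide >> shift);    /* shift, truncate, accumulate */
}
```

For example, with x = INT32_MAX and shift = 1 the intermediate sum 0x80000000
does not fit in int32_t, which is why the RTL representation needs the
2x-wide intermediate modes described above.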

* The above new modes led to a problem with stor-layout.cc. The new modes only
exist for the sake of the RTL optimisers understanding the semantics of the
instruction but are not intended to be moved to and from registers or memory,
assigned to types, used as TYPE_MODE, or to participate in auto-vectorisation.
This is expressed in aarch64 by aarch64_classify_vector_mode returning zero
for these new modes. However, the code in stor-layout.cc:<mode_for_vector>
explicitly doesn't check this when picking a TYPE_MODE due to modes being made
potentially available later through target switching (PR38240).
This led to these modes being picked as TYPE_MODE for declarations such as
typedef int16_t vnx8hi __attribute__((vector_size (32))); when 256-bit
fixed-length SVE modes are available, with vector_type_mode later struggling
to rectify this.
This issue is addressed with the new target hook
TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P that is intended to check if a
vector mode can be used in any legal target attribute configuration of the
port, as opposed to the existing TARGET_VECTOR_MODE_SUPPORTED_P that checks
only the initial target configuration. This allows a simple adjustment in
stor-layout.cc that still disqualifies these limited modes early on while
allowing consideration of modes that can be turned on in the future with
target attributes.
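
A minimal sketch of the declaration that triggered the problem (illustrative
only; on targets without the aarch64-specific modes it simply gets a 32-byte
generic vector type):

```c
#include <stdint.h>

/* The declaration from the report: a 32-byte generic vector of int16_t.
   On aarch64 with 256-bit fixed-length SVE enabled, mode_for_vector
   could previously hand this type one of the RTL-internal modes such as
   V16HI; the new TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P hook lets
   stor-layout.cc rule such modes out up front.  */
typedef int16_t vnx8hi __attribute__ ((vector_size (32)));
```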

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for the non-aarch64 parts?

Thanks,
Kyrill

gcc/ChangeLog:

	* config/aarch64/aarch64-modes.def (V16HI, V8SI, V4DI, V2TI): New modes.
	* config/aarch64/aarch64-protos.h (aarch64_const_vec_rnd_cst_p):
	Declare prototype.
	(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
	* config/aarch64/aarch64-simd.md (*aarch64_simd_sra<mode>): Rename to...
	(aarch64_<sra_op>sra_n<mode>_insn): ... This.
	(aarch64_<sra_op>rsra_n<mode>_insn): New define_insn.
	(aarch64_<sra_op>sra_n<mode>): New define_expand.
	(aarch64_<sra_op>rsra_n<mode>): Likewise.
	(aarch64_<sur>sra_n<mode>): Rename to...
	(aarch64_<sur>sra_ndi): ... This.
	* config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Add
	any_target_p argument.
	(aarch64_extract_vec_duplicate_wide_int): Define.
	(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
	(aarch64_const_vec_rnd_cst_p): Likewise.
	(aarch64_vector_mode_supported_any_target_p): Likewise.
	(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
	* config/aarch64/iterators.md (UNSPEC_SRSRA, UNSPEC_URSRA): Delete.
	(VSRA): Adjust for the above.
	(sur): Likewise.
	(V2XWIDE): New mode_attr.
	(vec_or_offset): Likewise.
	(SHIFTEXTEND): Likewise.
	* config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec): New
	predicate.
	* doc/tm.texi (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
	clarify that it applies to current target options.
	(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Document.
	* doc/tm.texi.in: Regenerate.
	* stor-layout.cc (mode_for_vector): Check
	vector_mode_supported_any_target_p when iterating through vector modes.
	* target.def (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
	clarify that it applies to current target options.
	(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Define.

[-- Attachment #2: sra.patch --]
[-- Type: application/octet-stream, Size: 19822 bytes --]

diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index a6c79e154d076b6dad653dc2f0a124a55c796dac..6b4f4e17dd59c9b5424ec19f15061b0cc13c8633 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -75,6 +75,14 @@ VECTOR_MODE (INT, DI, 1);     /*                 V1DI.  */
 VECTOR_MODE (FLOAT, DF, 1);   /*                 V1DF.  */
 VECTOR_MODE (FLOAT, HF, 2);   /*                 V2HF.  */
 
+
+/* Integer vector modes used to represent intermediate widened values in some
+   instructions.  Not intended to be moved to and from registers or memory.  */
+VECTOR_MODE (INT, HI, 16);   /*                V16HI.  */
+VECTOR_MODE (INT, SI, 8);   /*                 V8SI.  */
+VECTOR_MODE (INT, DI, 4);   /*                 V4DI.  */
+VECTOR_MODE (INT, TI, 2);   /*                 V2TI.  */
+
 /* Oct Int: 256-bit integer mode needed for 32-byte vector arguments.  */
 INT_MODE (OI, 32);
 
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 2f055a26f927434e53dca3f03d257f257d406a3b..a0642df26dbc83737c5c765d123e39d08357234f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -758,6 +758,8 @@ bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
 bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
 bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
 					    HOST_WIDE_INT);
+bool aarch64_const_vec_rnd_cst_p (rtx, rtx);
+bool aarch64_const_vec_rsra_rnd_imm_p (rtx);
 bool aarch64_constant_address_p (rtx);
 bool aarch64_emit_approx_div (rtx, rtx, rtx);
 bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c86b872cd4223d2cf7fec52b73c4d0bd450a4c5c..4aa2f324b795f66c6c56477fceba7147d51bf832 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1259,18 +1259,76 @@ (define_insn "aarch64_simd_ashr<mode>"
   [(set_attr "type" "neon_compare<q>,neon_shift_imm<q>")]
 )
 
-(define_insn "*aarch64_simd_sra<mode>"
+(define_insn "aarch64_<sra_op>sra_n<mode>_insn"
  [(set (match_operand:VDQ_I 0 "register_operand" "=w")
 	(plus:VDQ_I
 	   (SHIFTRT:VDQ_I
-		(match_operand:VDQ_I 1 "register_operand" "w")
-		(match_operand:VDQ_I 2 "aarch64_simd_rshift_imm" "Dr"))
-	   (match_operand:VDQ_I 3 "register_operand" "0")))]
+		(match_operand:VDQ_I 2 "register_operand" "w")
+		(match_operand:VDQ_I 3 "aarch64_simd_rshift_imm"))
+	   (match_operand:VDQ_I 1 "register_operand" "0")))]
   "TARGET_SIMD"
-  "<sra_op>sra\t%0.<Vtype>, %1.<Vtype>, %2"
+  "<sra_op>sra\t%<v>0<Vmtype>, %<v>2<Vmtype>, %3"
   [(set_attr "type" "neon_shift_acc<q>")]
 )
 
+(define_insn "aarch64_<sra_op>rsra_n<mode>_insn"
+ [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w")
+	(plus:VSDQ_I_DI
+	  (truncate:VSDQ_I_DI
+	    (SHIFTRT:<V2XWIDE>
+	      (plus:<V2XWIDE>
+		(<SHIFTEXTEND>:<V2XWIDE>
+		  (match_operand:VSDQ_I_DI 2 "register_operand" "w"))
+		(match_operand:<V2XWIDE> 4 "aarch64_simd_rsra_rnd_imm_vec"))
+	      (match_operand:VSDQ_I_DI 3 "aarch64_simd_shift_imm_<vec_or_offset>_<Vel>")))
+	  (match_operand:VSDQ_I_DI 1 "register_operand" "0")))]
+  "TARGET_SIMD
+   && aarch64_const_vec_rnd_cst_p (operands[4], operands[3])"
+  "<sra_op>rsra\\t%<v>0<Vmtype>, %<v>2<Vmtype>, %3"
+  [(set_attr "type" "neon_shift_acc<q>")]
+)
+
+(define_expand "aarch64_<sra_op>sra_n<mode>"
+ [(set (match_operand:VDQ_I 0 "register_operand")
+	(plus:VDQ_I
+	   (SHIFTRT:VDQ_I
+		(match_operand:VDQ_I 2 "register_operand")
+		(match_operand:SI 3 "aarch64_simd_shift_imm_offset_<ve_mode>"))
+	   (match_operand:VDQ_I 1 "register_operand")))]
+  "TARGET_SIMD"
+  {
+    operands[3]
+      = aarch64_simd_gen_const_vector_dup (<MODE>mode, UINTVAL (operands[3]));
+  }
+)
+
+(define_expand "aarch64_<sra_op>rsra_n<mode>"
+  [(match_operand:VSDQ_I_DI 0 "register_operand")
+   (match_operand:VSDQ_I_DI 1 "register_operand")
+   (SHIFTRT:VSDQ_I_DI
+     (match_operand:VSDQ_I_DI 2 "register_operand")
+     (match_operand:SI 3 "aarch64_simd_shift_imm_offset_<ve_mode>"))]
+  "TARGET_SIMD"
+  {
+    /* Use this expander to create the rounding constant vector, which is
+       1 << (shift - 1).  Use wide_int here to ensure that the right TImode
+       RTL is generated when handling the DImode expanders.  */
+    int prec = GET_MODE_UNIT_PRECISION (<V2XWIDE>mode);
+    wide_int rnd_wi = wi::set_bit_in_zero (INTVAL (operands[3]) - 1, prec);
+    rtx shft = gen_int_mode (INTVAL (operands[3]), DImode);
+    rtx rnd = immed_wide_int_const (rnd_wi, GET_MODE_INNER (<V2XWIDE>mode));
+    if (VECTOR_MODE_P (<MODE>mode))
+      {
+	shft = gen_const_vec_duplicate (<MODE>mode, shft);
+	rnd = gen_const_vec_duplicate (<V2XWIDE>mode, rnd);
+      }
+
+    emit_insn (gen_aarch64_<sra_op>rsra_n<mode>_insn (operands[0], operands[1],
+						      operands[2], shft, rnd));
+    DONE;
+  }
+)
+
 (define_insn "aarch64_simd_imm_shl<mode>"
  [(set (match_operand:VDQ_I 0 "register_operand" "=w")
        (ashift:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w")
@@ -6440,16 +6498,16 @@ (define_insn "aarch64_<sur>shr_n<mode>"
 
 ;; v(r)sra_n
 
-(define_insn "aarch64_<sur>sra_n<mode>"
-  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w")
-	(unspec:VSDQ_I_DI [(match_operand:VSDQ_I_DI 1 "register_operand" "0")
-		       (match_operand:VSDQ_I_DI 2 "register_operand" "w")
+(define_insn "aarch64_<sur>sra_ndi"
+  [(set (match_operand:DI 0 "register_operand" "=w")
+       (unspec:DI [(match_operand:DI 1 "register_operand" "0")
+                      (match_operand:DI 2 "register_operand" "w")
                        (match_operand:SI 3
-			 "aarch64_simd_shift_imm_offset_<ve_mode>" "i")]
+                        "aarch64_simd_shift_imm_offset_di" "i")]
                       VSRA))]
   "TARGET_SIMD"
-  "<sur>sra\\t%<v>0<Vmtype>, %<v>2<Vmtype>, %3"
-  [(set_attr "type" "neon_shift_acc<q>")]
+  "<sur>sra\\t%d0, %d2, %3"
+  [(set_attr "type" "neon_shift_acc")]
 )
 
 ;; vs<lr>i_n
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 29dbacfa9171dbfaf2a8d6395239934d75233ce4..6affd02db68686245dc1ee0d829af94549db67ed 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3399,9 +3399,12 @@ const unsigned int VEC_ANY_SVE  = VEC_SVE_DATA | VEC_SVE_PRED;
 const unsigned int VEC_ANY_DATA = VEC_ADVSIMD | VEC_SVE_DATA;
 
 /* Return a set of flags describing the vector properties of mode MODE.
-   Ignore modes that are not supported by the current target.  */
+   If ANY_TARGET_P is false (the default), ignore modes that are not supported
+   by the current target.  Otherwise categorize the modes that can be used
+   with the set of all targets supported by the port.  */
+
 static unsigned int
-aarch64_classify_vector_mode (machine_mode mode)
+aarch64_classify_vector_mode (machine_mode mode, bool any_target_p = false)
 {
   if (aarch64_sve_pred_mode_p (mode))
     return VEC_SVE_PRED;
@@ -3428,7 +3431,7 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_VNx4BFmode:
     /* Partial SVE SF vector.  */
     case E_VNx2SFmode:
-      return TARGET_SVE ? VEC_SVE_DATA | VEC_PARTIAL : 0;
+      return (TARGET_SVE || any_target_p) ? VEC_SVE_DATA | VEC_PARTIAL : 0;
 
     case E_VNx16QImode:
     case E_VNx8HImode:
@@ -3438,7 +3441,7 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_VNx8HFmode:
     case E_VNx4SFmode:
     case E_VNx2DFmode:
-      return TARGET_SVE ? VEC_SVE_DATA : 0;
+      return (TARGET_SVE || any_target_p) ? VEC_SVE_DATA : 0;
 
     /* x2 SVE vectors.  */
     case E_VNx32QImode:
@@ -3467,12 +3470,12 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_VNx32HFmode:
     case E_VNx16SFmode:
     case E_VNx8DFmode:
-      return TARGET_SVE ? VEC_SVE_DATA | VEC_STRUCT : 0;
+      return (TARGET_SVE || any_target_p) ? VEC_SVE_DATA | VEC_STRUCT : 0;
 
     case E_OImode:
     case E_CImode:
     case E_XImode:
-      return TARGET_FLOAT ? VEC_ADVSIMD | VEC_STRUCT : 0;
+      return (TARGET_FLOAT || any_target_p) ? VEC_ADVSIMD | VEC_STRUCT : 0;
 
     /* Structures of 64-bit Advanced SIMD vectors.  */
     case E_V2x8QImode:
@@ -3499,7 +3502,8 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_V4x4HFmode:
     case E_V4x2SFmode:
     case E_V4x1DFmode:
-      return TARGET_FLOAT ? VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL : 0;
+      return (TARGET_FLOAT || any_target_p)
+	      ? VEC_ADVSIMD | VEC_STRUCT | VEC_PARTIAL : 0;
 
     /* Structures of 128-bit Advanced SIMD vectors.  */
     case E_V2x16QImode:
@@ -3526,7 +3530,7 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_V4x8HFmode:
     case E_V4x4SFmode:
     case E_V4x2DFmode:
-      return TARGET_FLOAT ? VEC_ADVSIMD | VEC_STRUCT : 0;
+      return (TARGET_FLOAT || any_target_p) ? VEC_ADVSIMD | VEC_STRUCT : 0;
 
     /* 64-bit Advanced SIMD vectors.  */
     case E_V8QImode:
@@ -3546,7 +3550,7 @@ aarch64_classify_vector_mode (machine_mode mode)
     case E_V8BFmode:
     case E_V4SFmode:
     case E_V2DFmode:
-      return TARGET_FLOAT ? VEC_ADVSIMD : 0;
+      return (TARGET_FLOAT || any_target_p) ? VEC_ADVSIMD : 0;
 
     default:
       return 0;
@@ -11730,6 +11734,56 @@ aarch64_get_condition_code_1 (machine_mode mode, enum rtx_code comp_code)
   return -1;
 }
 
+/* Return true if X is a CONST_INT, CONST_WIDE_INT or a constant vector
+   duplicate of such constants.  If so, store in RET_WI the wide_int
+   representation of the constant paired with the inner mode of the vector mode
+   or TImode for scalar X constants.  */
+
+static bool
+aarch64_extract_vec_duplicate_wide_int (rtx x, wide_int *ret_wi)
+{
+  rtx elt = unwrap_const_vec_duplicate (x);
+  if (!CONST_SCALAR_INT_P (elt))
+    return false;
+  scalar_mode smode
+    = CONST_SCALAR_INT_P (x) ? TImode : GET_MODE_INNER (GET_MODE (x));
+  *ret_wi = rtx_mode_t (elt, smode);
+  return true;
+}
+
+/* Return true if X is a TImode constant or a constant vector of integer
+   immediates that represent the rounding constant used in the RSRA
+   instructions.
+   The accepted form of the constant is (1 << (C - 1)) where C is within
+   [1, MODE_WIDTH/2].  */
+
+bool
+aarch64_const_vec_rsra_rnd_imm_p (rtx x)
+{
+  wide_int rnd_cst;
+  if (!aarch64_extract_vec_duplicate_wide_int (x, &rnd_cst))
+    return false;
+  int log2 = wi::exact_log2 (rnd_cst);
+  if (log2 < 0)
+    return false;
+  return IN_RANGE (log2, 0, rnd_cst.get_precision () / 2 - 1);
+}
+
+/* Return true if RND is a constant vector of integer rounding constants
+   corresponding to a constant vector of shifts, SHIFT.
+   The relationship should be RND == (1 << (SHIFT - 1)).  */
+
+bool
+aarch64_const_vec_rnd_cst_p (rtx rnd, rtx shift)
+{
+  wide_int rnd_cst, shft_cst;
+  if (!aarch64_extract_vec_duplicate_wide_int (rnd, &rnd_cst)
+      || !aarch64_extract_vec_duplicate_wide_int (shift, &shft_cst))
+    return false;
+
+  return rnd_cst == (wi::shwi (1, rnd_cst.get_precision ()) << (shft_cst - 1));
+}
+
 bool
 aarch64_const_vec_all_same_in_range_p (rtx x,
 				       HOST_WIDE_INT minval,
@@ -20663,6 +20717,14 @@ aarch64_vector_mode_supported_p (machine_mode mode)
   return vec_flags != 0 && (vec_flags & VEC_STRUCT) == 0;
 }
 
+/* Implements target hook vector_mode_supported_any_target_p.  */
+static bool
+aarch64_vector_mode_supported_any_target_p (machine_mode mode)
+{
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode, true);
+  return vec_flags != 0 && (vec_flags & VEC_STRUCT) == 0;
+}
+
 /* Return the full-width SVE vector mode for element mode MODE, if one
    exists.  */
 opt_machine_mode
@@ -28036,6 +28098,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P aarch64_vector_mode_supported_p
 
+#undef TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P
+#define TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P aarch64_vector_mode_supported_any_target_p
+
 #undef TARGET_COMPATIBLE_VECTOR_TYPES_P
 #define TARGET_COMPATIBLE_VECTOR_TYPES_P aarch64_compatible_vector_types_p
 
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 4f1fd648e7f6f20694f1f05b39d0817c921988fc..4637c7aea755146069d0618a9c3f0d43b4ae604a 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -643,8 +643,6 @@ (define_c_enum "unspec"
     UNSPEC_SQXTUN	; Used in aarch64-simd.md.
     UNSPEC_SSRA		; Used in aarch64-simd.md.
     UNSPEC_USRA		; Used in aarch64-simd.md.
-    UNSPEC_SRSRA	; Used in aarch64-simd.md.
-    UNSPEC_URSRA	; Used in aarch64-simd.md.
     UNSPEC_SRSHR	; Used in aarch64-simd.md.
     UNSPEC_URSHR	; Used in aarch64-simd.md.
     UNSPEC_SQSHLU	; Used in aarch64-simd.md.
@@ -1530,6 +1528,12 @@ (define_mode_attr VWIDE [(V8QI  "V8HI")  (V4HI  "V4SI")
 			 (VNx16BI "VNx8BI") (VNx8BI "VNx4BI")
 			 (VNx4BI  "VNx2BI")])
 
+;; Modes with the same number of elements but strictly 2x the width.
+(define_mode_attr V2XWIDE [(V8QI "V8HI") (V4HI "V4SI")
+			   (V16QI "V16HI") (V8HI "V8SI")
+			   (V2SI "V2DI") (V4SI "V4DI")
+			   (V2DI "V2TI") (DI "TI")])
+
 ;; Predicate mode associated with VWIDE.
 (define_mode_attr VWIDE_PRED [(VNx8HF "VNx4BI") (VNx4SF "VNx2BI")])
 
@@ -2136,6 +2140,10 @@ (define_mode_attr sve_lane_con [(VNx8HI "y") (VNx4SI "y") (VNx2DI "x")
 ;; The constraint to use for an SVE FCMLA lane index.
 (define_mode_attr sve_lane_pair_con [(VNx8HF "y") (VNx4SF "x")])
 
+(define_mode_attr vec_or_offset [(V8QI "vec") (V16QI "vec") (V4HI "vec")
+				 (V8HI "vec") (V2SI "vec") (V4SI "vec")
+				 (V2DI "vec") (DI "offset")])
+
 ;; -------------------------------------------------------------------
 ;; Code Iterators
 ;; -------------------------------------------------------------------
@@ -2326,6 +2334,8 @@ (define_code_attr addsub [(ss_plus "add")
 			  (ss_minus "sub")
 			  (us_minus "sub")])
 
+(define_code_attr SHIFTEXTEND [(ashiftrt "sign_extend") (lshiftrt "zero_extend")])
+
 ;; For comparison operators we use the FCM* and CM* instructions.
 ;; As there are no CMLE or CMLT instructions which act on 3 vector
 ;; operands, we must use CMGE or CMGT and swap the order of the
@@ -2629,8 +2639,7 @@ (define_int_iterator VSHLL [UNSPEC_SSHLL UNSPEC_USHLL])
 (define_int_iterator VQSHL [UNSPEC_SQSHL UNSPEC_UQSHL
                             UNSPEC_SQRSHL UNSPEC_UQRSHL])
 
-(define_int_iterator VSRA [UNSPEC_SSRA UNSPEC_USRA
-			     UNSPEC_SRSRA UNSPEC_URSRA])
+(define_int_iterator VSRA [UNSPEC_SSRA UNSPEC_USRA])
 
 (define_int_iterator VSLRI [UNSPEC_SSLI UNSPEC_USLI
 			      UNSPEC_SSRI UNSPEC_USRI])
@@ -3353,7 +3362,6 @@ (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
 		      (UNSPEC_SSLI  "s") (UNSPEC_USLI  "u")
 		      (UNSPEC_SSRI  "s") (UNSPEC_USRI  "u")
 		      (UNSPEC_USRA  "u") (UNSPEC_SSRA  "s")
-		      (UNSPEC_URSRA  "ur") (UNSPEC_SRSRA  "sr")
 		      (UNSPEC_URSHR  "ur") (UNSPEC_SRSHR  "sr")
 		      (UNSPEC_SQSHLU "s") (UNSPEC_SQSHL   "s")
 		      (UNSPEC_UQSHL  "u")
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 3b8817d79e41c8d7efb3841eb7cec408bb02f5f8..6abfd0132c8c84bc28153e9239974f0340811d2a 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -587,6 +587,10 @@ (define_predicate "aarch64_simd_shift_imm_vec_di"
   (and (match_code "const_vector")
        (match_test "aarch64_const_vec_all_same_in_range_p (op, 1, 64)")))
 
+(define_predicate "aarch64_simd_rsra_rnd_imm_vec"
+  (and (match_code "const_vector,const_int,const_wide_int")
+       (match_test "aarch64_const_vec_rsra_rnd_imm_p (op)")))
+
 (define_predicate "aarch64_simd_rshrn_imm_vec"
   (and (match_code "const_vector")
        (match_test "aarch64_const_vec_all_same_in_range_p (op, 1,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 33e7ffc8af5a9d48430145ef9b8e7924613b7dd7..95ba56e05ae4a0f11639cc4a21d6736c53ad5ef1 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4456,11 +4456,23 @@ code in @file{optabs.cc}.
 @end deftypefn
 
 @deftypefn {Target Hook} bool TARGET_VECTOR_MODE_SUPPORTED_P (machine_mode @var{mode})
-Define this to return nonzero if the port is prepared to handle
+Define this to return nonzero if the current target is prepared to handle
 insns involving vector mode @var{mode}.  At the very least, it
 must have move patterns for this mode.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P (machine_mode @var{mode})
+Define this to return nonzero if the port is prepared to handle
+insns involving vector mode @var{mode} in any target configuration.
+Returning @var{true} means that the mode can be used as the @samp{TYPE_MODE}
+for vector types.
+
+The default version of this hook returns true.  The final mode assigned to
+@samp{TYPE_MODE} will also be checked against
+@code{TARGET_VECTOR_MODE_SUPPORTED_P} to take target configuration into
+account.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_COMPATIBLE_VECTOR_TYPES_P (const_tree @var{type1}, const_tree @var{type2})
 Return true if there is no target-specific reason for treating
 vector types @var{type1} and @var{type2} as distinct types.  The caller
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index c98b2447e28aa17996b1cbf8af7ed02d70db54f2..4ac96dc357d35e0e57bb43a41d1b1a4f66d05946 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3374,6 +3374,8 @@ stack.
 
 @hook TARGET_VECTOR_MODE_SUPPORTED_P
 
+@hook TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P
+
 @hook TARGET_COMPATIBLE_VECTOR_TYPES_P
 
 @hook TARGET_ARRAY_MODE
diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
index 023de8c37db8593b7f694294f9d0b71415420857..a6deed4424b98d33fecca6ed6fd205cae0c9b438 100644
--- a/gcc/stor-layout.cc
+++ b/gcc/stor-layout.cc
@@ -499,11 +499,13 @@ mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
   else
     mode = MIN_MODE_VECTOR_INT;
 
-  /* Do not check vector_mode_supported_p here.  We'll do that
-     later in vector_type_mode.  */
+  /* Only check the broader vector_mode_supported_any_target_p here.
+     We'll filter through target-specific availability and
+     vector_mode_supported_p later in vector_type_mode.  */
   FOR_EACH_MODE_FROM (mode, mode)
     if (known_eq (GET_MODE_NUNITS (mode), nunits)
-	&& GET_MODE_INNER (mode) == innermode)
+	&& GET_MODE_INNER (mode) == innermode
+	&& targetm.vector_mode_supported_any_target_p (mode))
       return mode;
 
   /* For integers, try mapping it to a same-sized scalar mode.  */
diff --git a/gcc/target.def b/gcc/target.def
index cda6c51e5167f85625168c7c26b777d6c8ccad82..7d684296c17897b4ceecb31c5de1ae8665a8228e 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3491,12 +3491,26 @@ code in @file{optabs.cc}.",
    for further details.  */
 DEFHOOK
 (vector_mode_supported_p,
- "Define this to return nonzero if the port is prepared to handle\n\
+ "Define this to return nonzero if the current target is prepared to handle\n\
 insns involving vector mode @var{mode}.  At the very least, it\n\
 must have move patterns for this mode.",
  bool, (machine_mode mode),
  hook_bool_mode_false)
 
+DEFHOOK
+(vector_mode_supported_any_target_p,
+ "Define this to return nonzero if the port is prepared to handle\n\
+insns involving vector mode @var{mode} in any target configuration.\n\
+Returning @var{true} means that the mode can be used as the @samp{TYPE_MODE}\n\
+for vector types.\n\
+\n\
+The default version of this hook returns true.  The final mode assigned to\n\
+@samp{TYPE_MODE} will also be checked against\n\
+@code{TARGET_VECTOR_MODE_SUPPORTED_P} to take target configuration into\n\
+account.",
+ bool, (machine_mode mode),
+ hook_bool_mode_true)
+
 DEFHOOK
 (compatible_vector_types_p,
  "Return true if there is no target-specific reason for treating\n\

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [ping] RE: [PATCH] stor-layout, aarch64: Express SRA intrinsics with RTL codes
  2023-05-18 15:18 [PATCH] stor-layout, aarch64: Express SRA intrinsics with RTL codes Kyrylo Tkachov
@ 2023-05-25 10:12 ` Kyrylo Tkachov
  2023-05-25 16:02 ` Richard Sandiford
  1 sibling, 0 replies; 3+ messages in thread
From: Kyrylo Tkachov @ 2023-05-25 10:12 UTC (permalink / raw)
  To: gcc-patches

Ping.
Thanks,
Kyrill

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+kyrylo.tkachov=arm.com@gcc.gnu.org> On Behalf Of Kyrylo
> Tkachov via Gcc-patches
> Sent: Thursday, May 18, 2023 4:19 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] stor-layout, aarch64: Express SRA intrinsics with RTL codes
> 
> [...]

* Re: [PATCH] stor-layout, aarch64: Express SRA intrinsics with RTL codes
  2023-05-18 15:18 [PATCH] stor-layout, aarch64: Express SRA intrinsics with RTL codes Kyrylo Tkachov
  2023-05-25 10:12 ` [ping] " Kyrylo Tkachov
@ 2023-05-25 16:02 ` Richard Sandiford
  1 sibling, 0 replies; 3+ messages in thread
From: Richard Sandiford @ 2023-05-25 16:02 UTC (permalink / raw)
  To: Kyrylo Tkachov via Gcc-patches; +Cc: Kyrylo Tkachov

Kyrylo Tkachov via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> [...]
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for the non-aarch64 parts?

Yes, thanks.

Since we'd discussed this approach off-list, I wanted to leave a gap in
case others objected to it.  But I guess they would have spoken up by
now if so.

Richard


