public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3] Refactor memory block operations
From: Stefan Schulze Frielinghaus @ 2023-05-15  7:17 UTC (permalink / raw)
  To: krebbel, gcc-patches; +Cc: Stefan Schulze Frielinghaus

Bootstrapped and regtested.  Ok for mainline?

Stefan Schulze Frielinghaus (3):
  s390: Refactor block operation cpymem
  s390: Add block operation movmem
  s390: Refactor block operation setmem

 gcc/config/s390/s390-protos.h            |   5 +-
 gcc/config/s390/s390.cc                  | 301 ++++++++++++++++++++---
 gcc/config/s390/s390.md                  |  61 ++++-
 gcc/testsuite/gcc.target/s390/memset-1.c |   7 +-
 4 files changed, 331 insertions(+), 43 deletions(-)

-- 
2.39.2



* [PATCH 1/3] s390: Refactor block operation cpymem
From: Stefan Schulze Frielinghaus @ 2023-05-15  7:17 UTC (permalink / raw)
  To: krebbel, gcc-patches; +Cc: Stefan Schulze Frielinghaus

Do not emit a libc call to memcpy if the size is not a compile-time
constant but is bounded and the upper bound is less than or equal to
256 bytes.  Expand such copies inline instead.

gcc/ChangeLog:

	* config/s390/s390-protos.h (s390_expand_cpymem): Change
	function signature.
	* config/s390/s390.cc (s390_expand_cpymem): For memcpy with a
	size of at most 256 bytes do not perform a libc call.
	(s390_expand_insv): Adapt to new function signature of
	s390_expand_cpymem.
	* config/s390/s390.md: Change expander into a version which
	takes 8 operands.
---
 gcc/config/s390/s390-protos.h |  2 +-
 gcc/config/s390/s390.cc       | 84 +++++++++++++++++++++++++++--------
 gcc/config/s390/s390.md       | 10 +++--
 3 files changed, 74 insertions(+), 22 deletions(-)
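
For illustration only (not part of the patch): a minimal C sketch of the
kind of call the commit message above describes, where the size is not a
compile-time constant but is bounded by 256.  The function name
copy_bounded is hypothetical.  Whether the bound actually reaches the
expander as the "maximal size" operand depends on the optimization level
and on value-range information, which is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch.  N is an unsigned char,
   so the byte count N + 1 is a run-time value in the range 1..256.  A
   compiler that tracks this range may expand the memcpy inline instead
   of calling libc.  */
void
copy_bounded (char *dst, const char *src, unsigned char n)
{
  size_t len = (size_t) n + 1;

  /* The bound that makes the inline expansion applicable.  */
  assert (len >= 1 && len <= 256);

  memcpy (dst, src, len);
}
```

Since the minimal size is 1 in this sketch, the zero-length check that
the patch emits only for min_len == 0 would not be needed.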

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 67fe09e732d..2c7495ca247 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -107,7 +107,7 @@ extern void s390_reload_symref_address (rtx , rtx , rtx , bool);
 extern void s390_expand_plus_operand (rtx, rtx, rtx);
 extern void emit_symbolic_move (rtx *);
 extern void s390_load_address (rtx, rtx);
-extern bool s390_expand_cpymem (rtx, rtx, rtx);
+extern bool s390_expand_cpymem (rtx, rtx, rtx, rtx, rtx);
 extern void s390_expand_setmem (rtx, rtx, rtx);
 extern bool s390_expand_cmpmem (rtx, rtx, rtx, rtx);
 extern void s390_expand_vec_strlen (rtx, rtx, rtx);
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 505de995da8..95ea5e8d009 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -5650,27 +5650,27 @@ legitimize_reload_address (rtx ad, machine_mode mode ATTRIBUTE_UNUSED,
   return NULL_RTX;
 }
 
-/* Emit code to move LEN bytes from DST to SRC.  */
+/* Emit code to move LEN bytes from SRC to DST.  */
 
 bool
-s390_expand_cpymem (rtx dst, rtx src, rtx len)
+s390_expand_cpymem (rtx dst, rtx src, rtx len, rtx min_len_rtx, rtx max_len_rtx)
 {
-  /* When tuning for z10 or higher we rely on the Glibc functions to
-     do the right thing. Only for constant lengths below 64k we will
-     generate inline code.  */
-  if (s390_tune >= PROCESSOR_2097_Z10
-      && (GET_CODE (len) != CONST_INT || INTVAL (len) > (1<<16)))
-    return false;
+  /* Exit early in case nothing has to be done.  */
+  if (CONST_INT_P (len) && UINTVAL (len) == 0)
+    return true;
+
+  unsigned HOST_WIDE_INT min_len = UINTVAL (min_len_rtx);
+  unsigned HOST_WIDE_INT max_len
+    = max_len_rtx ? UINTVAL (max_len_rtx) : HOST_WIDE_INT_M1U;
 
   /* Expand memcpy for constant length operands without a loop if it
      is shorter that way.
 
      With a constant length argument a
      memcpy loop (without pfd) is 36 bytes -> 6 * mvc  */
-  if (GET_CODE (len) == CONST_INT
-      && INTVAL (len) >= 0
-      && INTVAL (len) <= 256 * 6
-      && (!TARGET_MVCLE || INTVAL (len) <= 256))
+  if (CONST_INT_P (len)
+      && UINTVAL (len) <= 6 * 256
+      && (!TARGET_MVCLE || UINTVAL (len) <= 256))
     {
       HOST_WIDE_INT o, l;
 
@@ -5681,14 +5681,57 @@ s390_expand_cpymem (rtx dst, rtx src, rtx len)
 	  emit_insn (gen_cpymem_short (newdst, newsrc,
 				       GEN_INT (l > 256 ? 255 : l - 1)));
 	}
+
+      return true;
     }
 
-  else if (TARGET_MVCLE)
+  else if (TARGET_MVCLE
+	   && (s390_tune < PROCESSOR_2097_Z10
+	       || (CONST_INT_P (len) && UINTVAL (len) <= (1 << 16))))
     {
       emit_insn (gen_cpymem_long (dst, src, convert_to_mode (Pmode, len, 1)));
+      return true;
     }
 
-  else
+  /* Non-constant length and no loop required.  */
+  else if (!CONST_INT_P (len) && max_len <= 256)
+    {
+      rtx_code_label *end_label;
+
+      if (min_len == 0)
+	{
+	  end_label = gen_label_rtx ();
+	  emit_cmp_and_jump_insns (len, const0_rtx, EQ, NULL_RTX,
+				   GET_MODE (len), 1, end_label,
+				   profile_probability::very_unlikely ());
+	}
+
+      rtx lenm1 = expand_binop (GET_MODE (len), add_optab, len, constm1_rtx,
+				NULL_RTX, 1, OPTAB_DIRECT);
+
+      /* Prefer a vectorized implementation over one which makes use of an
+	 execute instruction since it is faster (although it increases register
+	 pressure).  */
+      if (max_len <= 16 && TARGET_VX)
+	{
+	  rtx tmp = gen_reg_rtx (V16QImode);
+	  lenm1 = convert_to_mode (SImode, lenm1, 1);
+	  emit_insn (gen_vllv16qi (tmp, lenm1, src));
+	  emit_insn (gen_vstlv16qi (tmp, lenm1, dst));
+	}
+      else if (TARGET_Z15)
+	emit_insn (gen_mvcrl (dst, src, convert_to_mode (SImode, lenm1, 1)));
+      else
+	emit_insn (
+	  gen_cpymem_short (dst, src, convert_to_mode (Pmode, lenm1, 1)));
+
+      if (min_len == 0)
+	emit_label (end_label);
+
+      return true;
+    }
+
+  else if (s390_tune < PROCESSOR_2097_Z10 || (CONST_INT_P (len) && UINTVAL (len) <= (1 << 16)))
     {
       rtx dst_addr, src_addr, count, blocks, temp;
       rtx_code_label *loop_start_label = gen_label_rtx ();
@@ -5706,8 +5749,9 @@ s390_expand_cpymem (rtx dst, rtx src, rtx len)
       blocks = gen_reg_rtx (mode);
 
       convert_move (count, len, 1);
-      emit_cmp_and_jump_insns (count, const0_rtx,
-			       EQ, NULL_RTX, mode, 1, end_label);
+      if (min_len == 0)
+	emit_cmp_and_jump_insns (count, const0_rtx, EQ, NULL_RTX, mode, 1,
+				 end_label);
 
       emit_move_insn (dst_addr, force_operand (XEXP (dst, 0), NULL_RTX));
       emit_move_insn (src_addr, force_operand (XEXP (src, 0), NULL_RTX));
@@ -5767,8 +5811,11 @@ s390_expand_cpymem (rtx dst, rtx src, rtx len)
       emit_insn (gen_cpymem_short (dst, src,
 				   convert_to_mode (Pmode, count, 1)));
       emit_label (end_label);
+
+      return true;
     }
-  return true;
+
+  return false;
 }
 
 /* Emit code to set LEN bytes at DST to VAL.
@@ -6599,7 +6646,8 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
 
 	  dest = adjust_address (dest, BLKmode, 0);
 	  set_mem_size (dest, size);
-	  s390_expand_cpymem (dest, src_mem, GEN_INT (size));
+	  rtx size_rtx = GEN_INT (size);
+	  s390_expand_cpymem (dest, src_mem, size_rtx, size_rtx, size_rtx);
 	  return true;
 	}
 
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 00d39608e1d..d9ce287ab85 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3341,11 +3341,15 @@
 (define_expand "cpymem<mode>"
   [(set (match_operand:BLK 0 "memory_operand" "")   ; destination
         (match_operand:BLK 1 "memory_operand" ""))  ; source
-   (use (match_operand:GPR 2 "general_operand" "")) ; count
-   (match_operand 3 "" "")]
+   (use (match_operand:GPR 2 "general_operand" "")) ; size
+   (match_operand 3 "")  ; align
+   (match_operand 4 "")  ; expected align
+   (match_operand 5 "")  ; expected size
+   (match_operand 6 "")  ; minimal size
+   (match_operand 7 "")] ; maximal size
   ""
 {
-  if (s390_expand_cpymem (operands[0], operands[1], operands[2]))
+  if (s390_expand_cpymem (operands[0], operands[1], operands[2], operands[6], operands[7]))
     DONE;
   else
     FAIL;
-- 
2.39.2



* [PATCH 2/3] s390: Add block operation movmem
From: Stefan Schulze Frielinghaus @ 2023-05-15  7:17 UTC (permalink / raw)
  To: krebbel, gcc-patches; +Cc: Stefan Schulze Frielinghaus

gcc/ChangeLog:

	* config/s390/s390-protos.h (s390_expand_movmem): New.
	* config/s390/s390.cc (s390_expand_movmem): New.
	* config/s390/s390.md (movmem<mode>): New.
	(*mvcrl): New.
	(mvcrl): New.
---
 gcc/config/s390/s390-protos.h |  1 +
 gcc/config/s390/s390.cc       | 88 +++++++++++++++++++++++++++++++++++
 gcc/config/s390/s390.md       | 35 ++++++++++++++
 3 files changed, 124 insertions(+)
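
For illustration only (not part of the patch): the 256-byte path of
s390_expand_movmem compares the source and destination addresses and
then copies either left to right (MVC) or right to left (MVCRL).  The
portable C model below sketches that direction choice.  The name
move_bounded and the byte loops are assumptions standing in for the two
instructions; the address comparison via uintptr_t is likewise only a
model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an overlap-safe move of at most 256 bytes.  */
void
move_bounded (unsigned char *dst, const unsigned char *src, size_t len)
{
  assert (len <= 256);

  /* Nothing to do; mirrors the jump to end_label for a zero length.  */
  if (len == 0)
    return;

  if ((uintptr_t) src < (uintptr_t) dst)
    {
      /* Source lies below destination: copy right to left so that
	 overlapping source bytes are read before they are overwritten
	 (stands in for MVCRL).  */
      for (size_t i = len; i-- > 0;)
	dst[i] = src[i];
    }
  else
    {
      /* Otherwise copy left to right (stands in for MVC).  */
      for (size_t i = 0; i < len; ++i)
	dst[i] = src[i];
    }
}
```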

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 2c7495ca247..65e4f97b41e 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -108,6 +108,7 @@ extern void s390_expand_plus_operand (rtx, rtx, rtx);
 extern void emit_symbolic_move (rtx *);
 extern void s390_load_address (rtx, rtx);
 extern bool s390_expand_cpymem (rtx, rtx, rtx, rtx, rtx);
+extern bool s390_expand_movmem (rtx, rtx, rtx, rtx, rtx);
 extern void s390_expand_setmem (rtx, rtx, rtx);
 extern bool s390_expand_cmpmem (rtx, rtx, rtx, rtx);
 extern void s390_expand_vec_strlen (rtx, rtx, rtx);
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 95ea5e8d009..553273f23ff 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -5818,6 +5818,94 @@ s390_expand_cpymem (rtx dst, rtx src, rtx len, rtx min_len_rtx, rtx max_len_rtx)
   return false;
 }
 
+bool
+s390_expand_movmem (rtx dst, rtx src, rtx len, rtx min_len_rtx, rtx max_len_rtx)
+{
+  /* Exit early in case nothing has to be done.  */
+  if (CONST_INT_P (len) && UINTVAL (len) == 0)
+    return true;
+  /* Exit early in case length is not upper bounded.  */
+  else if (max_len_rtx == NULL)
+    return false;
+
+  unsigned HOST_WIDE_INT min_len = UINTVAL (min_len_rtx);
+  unsigned HOST_WIDE_INT max_len = UINTVAL (max_len_rtx);
+
+  /* At most 16 bytes.  */
+  if (max_len <= 16 && TARGET_VX)
+    {
+      rtx_code_label *end_label;
+
+      if (min_len == 0)
+	{
+	  end_label = gen_label_rtx ();
+	  emit_cmp_and_jump_insns (len, const0_rtx, EQ, NULL_RTX,
+				   GET_MODE (len), 1, end_label,
+				   profile_probability::very_unlikely ());
+	}
+
+      rtx lenm1;
+      if (CONST_INT_P (len))
+	{
+	  lenm1 = gen_reg_rtx (SImode);
+	  emit_move_insn (lenm1, GEN_INT (UINTVAL (len) - 1));
+	}
+      else
+	lenm1
+	  = expand_binop (SImode, add_optab, convert_to_mode (SImode, len, 1),
+			  constm1_rtx, NULL_RTX, 1, OPTAB_DIRECT);
+
+      rtx tmp = gen_reg_rtx (V16QImode);
+      emit_insn (gen_vllv16qi (tmp, lenm1, src));
+      emit_insn (gen_vstlv16qi (tmp, lenm1, dst));
+
+      if (min_len == 0)
+	emit_label (end_label);
+
+      return true;
+    }
+
+  /* At most 256 bytes.  */
+  else if (max_len <= 256 && TARGET_Z15)
+    {
+      rtx_code_label *end_label = gen_label_rtx ();
+
+      if (min_len == 0)
+	emit_cmp_and_jump_insns (len, const0_rtx, EQ, NULL_RTX, GET_MODE (len),
+				 1, end_label,
+				 profile_probability::very_unlikely ());
+
+      rtx dst_addr = gen_reg_rtx (Pmode);
+      rtx src_addr = gen_reg_rtx (Pmode);
+      emit_move_insn (dst_addr, force_operand (XEXP (dst, 0), NULL_RTX));
+      emit_move_insn (src_addr, force_operand (XEXP (src, 0), NULL_RTX));
+
+      rtx lenm1 = CONST_INT_P (len)
+		    ? GEN_INT (UINTVAL (len) - 1)
+		    : expand_binop (GET_MODE (len), add_optab, len, constm1_rtx,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+
+      rtx_code_label *right_to_left_label = gen_label_rtx ();
+      emit_cmp_and_jump_insns (src_addr, dst_addr, LT, NULL_RTX, GET_MODE (len),
+			       1, right_to_left_label);
+
+      // MVC
+      emit_insn (
+	gen_cpymem_short (dst, src, convert_to_mode (Pmode, lenm1, 1)));
+      emit_jump (end_label);
+
+      // MVCRL
+      emit_label (right_to_left_label);
+      emit_insn (gen_mvcrl (dst, src, convert_to_mode (SImode, lenm1, 1)));
+
+      emit_label (end_label);
+
+      return true;
+    }
+
+  return false;
+}
+
 /* Emit code to set LEN bytes at DST to VAL.
    Make use of clrmem if VAL is zero.  */
 
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index d9ce287ab85..abe3bbc5cd9 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -61,6 +61,7 @@
    UNSPEC_ROUND
    UNSPEC_ICM
    UNSPEC_TIE
+   UNSPEC_MVCRL
 
    ; Convert CC into a str comparison result and copy it into an
    ; integer register
@@ -3496,6 +3497,40 @@
   [(set_attr "length" "8")
    (set_attr "type" "vs")])
 
+(define_expand "movmem<mode>"
+  [(set (match_operand:BLK 0 "memory_operand")   ; destination
+        (match_operand:BLK 1 "memory_operand"))  ; source
+   (use (match_operand:GPR 2 "general_operand")) ; size
+   (match_operand 3 "")  ; align
+   (match_operand 4 "")  ; expected align
+   (match_operand 5 "")  ; expected size
+   (match_operand 6 "")  ; minimal size
+   (match_operand 7 "")] ; maximal size
+  ""
+{
+  if (s390_expand_movmem (operands[0], operands[1], operands[2], operands[6], operands[7]))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_insn "*mvcrl"
+  [(set (match_operand:BLK 0 "memory_operand" "=Q")
+	(unspec:BLK [(match_operand:BLK 1 "memory_operand" "Q")
+		     (reg:SI GPR0_REGNUM)]
+		    UNSPEC_MVCRL))]
+  "TARGET_Z15"
+  "mvcrl\t%0,%1"
+  [(set_attr "op_type" "SSE")])
+
+(define_expand "mvcrl"
+  [(set (reg:SI GPR0_REGNUM) (match_operand:SI 2 "general_operand"))
+   (set (match_operand:BLK 0 "memory_operand" "=Q")
+	(unspec:BLK [(match_operand:BLK 1 "memory_operand" "Q")
+		     (reg:SI GPR0_REGNUM)]
+		    UNSPEC_MVCRL))]
+  "TARGET_Z15"
+  "")
 
 ;
 ; Test data class.
-- 
2.39.2



* [PATCH 3/3] s390: Refactor block operation setmem
From: Stefan Schulze Frielinghaus @ 2023-05-15  7:17 UTC (permalink / raw)
  To: krebbel, gcc-patches; +Cc: Stefan Schulze Frielinghaus

Vectorize memset with a constant length of less than or equal to 64
bytes and a value other than constant zero if the vector facility is
available (TARGET_VX).

Do not emit a libc call to memset if the size is not a compile-time
constant but is bounded and the upper bound is less than or equal to
256 bytes.  Expand such memsets inline instead.

gcc/ChangeLog:

	* config/s390/s390-protos.h (s390_expand_setmem): Change
	function signature.
	* config/s390/s390.cc (s390_expand_setmem): For memset with a
	size of at most 256 bytes do not perform a libc call.
	* config/s390/s390.md: Change expander into a version which
	takes 8 operands.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/memset-1.c: Test case memset1 now makes use
	of vst.
---
 gcc/config/s390/s390-protos.h            |   2 +-
 gcc/config/s390/s390.cc                  | 129 +++++++++++++++++++++--
 gcc/config/s390/s390.md                  |  14 ++-
 gcc/testsuite/gcc.target/s390/memset-1.c |   7 +-
 4 files changed, 132 insertions(+), 20 deletions(-)
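
For illustration only (not part of the patch): for a constant length
with 16 <= LEN <= 64 the new code emits LEN / 16 full vector stores and,
if LEN % 16 != 0, one more store that ends exactly at LEN and therefore
overlaps the previous one.  The C model below sketches the offsets.  The
names set_const_len and store_chunk are hypothetical, and each
store_chunk call merely stands in for one 16-byte vst.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 16 };

/* Stands in for a single 16-byte vector store.  */
static void
store_chunk (unsigned char *dst, const unsigned char chunk[CHUNK])
{
  memcpy (dst, chunk, CHUNK);
}

/* Hypothetical model of the vst based expansion.  Returns the number of
   16-byte stores that were issued.  */
unsigned
set_const_len (unsigned char *dst, unsigned char val, size_t len)
{
  assert (len >= 16 && len <= 64);

  /* Models the VEC_DUPLICATE of VAL into a vector register.  */
  unsigned char chunk[CHUNK];
  memset (chunk, val, CHUNK);

  unsigned stores = 0;
  size_t full = len / CHUNK;
  size_t rest = len % CHUNK;
  size_t off = 0;

  for (size_t i = 0; i < full; ++i)
    {
      store_chunk (dst + off, chunk);
      off += CHUNK;
      ++stores;
    }

  /* The tail store starts at LEN - 16, so it rewrites 16 - REST bytes
     of the previous store instead of needing a length register.  */
  if (rest != 0)
    {
      store_chunk (dst + (off - CHUNK) + rest, chunk);
      ++stores;
    }

  return stores;
}
```

In this model a length of 40 yields stores at offsets 0, 16 and 24, and
no byte at or beyond offset 40 is written.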

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 65e4f97b41e..4a5263fccec 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -109,7 +109,7 @@ extern void emit_symbolic_move (rtx *);
 extern void s390_load_address (rtx, rtx);
 extern bool s390_expand_cpymem (rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_movmem (rtx, rtx, rtx, rtx, rtx);
-extern void s390_expand_setmem (rtx, rtx, rtx);
+extern void s390_expand_setmem (rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_cmpmem (rtx, rtx, rtx, rtx);
 extern void s390_expand_vec_strlen (rtx, rtx, rtx);
 extern void s390_expand_vec_movstr (rtx, rtx, rtx);
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 553273f23ff..b1cb54612b8 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -5910,20 +5910,62 @@ s390_expand_movmem (rtx dst, rtx src, rtx len, rtx min_len_rtx, rtx max_len_rtx)
    Make use of clrmem if VAL is zero.  */
 
 void
-s390_expand_setmem (rtx dst, rtx len, rtx val)
+s390_expand_setmem (rtx dst, rtx len, rtx val, rtx min_len_rtx, rtx max_len_rtx)
 {
-  if (GET_CODE (len) == CONST_INT && INTVAL (len) <= 0)
+  /* Exit early in case nothing has to be done.  */
+  if (CONST_INT_P (len) && UINTVAL (len) == 0)
     return;
 
   gcc_assert (GET_CODE (val) == CONST_INT || GET_MODE (val) == QImode);
 
+  unsigned HOST_WIDE_INT min_len = UINTVAL (min_len_rtx);
+  unsigned HOST_WIDE_INT max_len
+    = max_len_rtx ? UINTVAL (max_len_rtx) : HOST_WIDE_INT_M1U;
+
+  /* Vectorize memset with a constant length
+   - if  0 <  LEN <  16, then emit a vstl based solution;
+   - if 16 <= LEN <= 64, then emit a vst based solution
+     where the last two vector stores may overlap in case LEN%16!=0.  Paying
+     the price for an overlap is negligible compared to an extra GPR which is
+     required for vstl.  */
+  if (CONST_INT_P (len) && UINTVAL (len) <= 64 && val != const0_rtx
+      && TARGET_VX)
+    {
+      rtx val_vec = gen_reg_rtx (V16QImode);
+      emit_move_insn (val_vec, gen_rtx_VEC_DUPLICATE (V16QImode, val));
+
+      if (UINTVAL (len) < 16)
+	{
+	  rtx len_reg = gen_reg_rtx (SImode);
+	  emit_move_insn (len_reg, GEN_INT (UINTVAL (len) - 1));
+	  emit_insn (gen_vstlv16qi (val_vec, len_reg, dst));
+	}
+      else
+	{
+	  unsigned HOST_WIDE_INT l = UINTVAL (len) / 16;
+	  unsigned HOST_WIDE_INT r = UINTVAL (len) % 16;
+	  unsigned HOST_WIDE_INT o = 0;
+	  for (unsigned HOST_WIDE_INT i = 0; i < l; ++i)
+	    {
+	      rtx newdst = adjust_address (dst, V16QImode, o);
+	      emit_move_insn (newdst, val_vec);
+	      o += 16;
+	    }
+	  if (r != 0)
+	    {
+	      rtx newdst = adjust_address (dst, V16QImode, (o - 16) + r);
+	      emit_move_insn (newdst, val_vec);
+	    }
+	}
+    }
+
   /* Expand setmem/clrmem for a constant length operand without a
      loop if it will be shorter that way.
      clrmem loop (with PFD)    is 30 bytes -> 5 * xc
      clrmem loop (without PFD) is 24 bytes -> 4 * xc
      setmem loop (with PFD)    is 38 bytes -> ~4 * (mvi/stc + mvc)
      setmem loop (without PFD) is 32 bytes -> ~4 * (mvi/stc + mvc) */
-  if (GET_CODE (len) == CONST_INT
+  else if (GET_CODE (len) == CONST_INT
       && ((val == const0_rtx
 	   && (INTVAL (len) <= 256 * 4
 	       || (INTVAL (len) <= 256 * 5 && TARGET_SETMEM_PFD(val,len))))
@@ -5968,6 +6010,70 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
 				       val));
     }
 
+  /* Non-constant length and no loop required.  */
+  else if (!CONST_INT_P (len) && max_len <= 256)
+    {
+      rtx_code_label *end_label;
+
+      if (min_len == 0)
+	{
+	  end_label = gen_label_rtx ();
+	  emit_cmp_and_jump_insns (len, const0_rtx, EQ, NULL_RTX,
+				   GET_MODE (len), 1, end_label,
+				   profile_probability::very_unlikely ());
+	}
+
+      rtx lenm1 = expand_binop (GET_MODE (len), add_optab, len, constm1_rtx,
+				NULL_RTX, 1, OPTAB_DIRECT);
+
+      /* Prefer a vectorized implementation over one which makes use of an
+	 execute instruction since it is faster (although it increases register
+	 pressure).  */
+      if (max_len <= 16 && TARGET_VX)
+	{
+	  rtx val_vec = gen_reg_rtx (V16QImode);
+	  if (val == const0_rtx)
+	    emit_move_insn (val_vec, CONST0_RTX (V16QImode));
+	  else
+	    emit_move_insn (val_vec, gen_rtx_VEC_DUPLICATE (V16QImode, val));
+
+	  lenm1 = convert_to_mode (SImode, lenm1, 1);
+	  emit_insn (gen_vstlv16qi (val_vec, lenm1, dst));
+	}
+      else
+	{
+	  if (val == const0_rtx)
+	    emit_insn (
+	      gen_clrmem_short (dst, convert_to_mode (Pmode, lenm1, 1)));
+	  else
+	    {
+	      emit_move_insn (adjust_address (dst, QImode, 0), val);
+
+	      rtx_code_label *onebyte_end_label;
+	      if (min_len <= 1)
+		{
+		  onebyte_end_label = gen_label_rtx ();
+		  emit_cmp_and_jump_insns (
+		    len, const1_rtx, EQ, NULL_RTX, GET_MODE (len), 1,
+		    onebyte_end_label, profile_probability::very_unlikely ());
+		}
+
+	      rtx dstp1 = adjust_address (dst, VOIDmode, 1);
+	      rtx lenm2
+		= expand_binop (GET_MODE (len), add_optab, len, GEN_INT (-2),
+				NULL_RTX, 1, OPTAB_DIRECT);
+	      lenm2 = convert_to_mode (Pmode, lenm2, 1);
+	      emit_insn (gen_cpymem_short (dstp1, dst, lenm2));
+
+	      if (min_len <= 1)
+		emit_label (onebyte_end_label);
+	    }
+	}
+
+      if (min_len == 0)
+	emit_label (end_label);
+    }
+
   else
     {
       rtx dst_addr, count, blocks, temp, dstp1 = NULL_RTX;
@@ -5986,9 +6092,10 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
       blocks = gen_reg_rtx (mode);
 
       convert_move (count, len, 1);
-      emit_cmp_and_jump_insns (count, const0_rtx,
-			       EQ, NULL_RTX, mode, 1, zerobyte_end_label,
-			       profile_probability::very_unlikely ());
+      if (min_len == 0)
+	emit_cmp_and_jump_insns (count, const0_rtx, EQ, NULL_RTX, mode, 1,
+				 zerobyte_end_label,
+				 profile_probability::very_unlikely ());
 
       /* We need to make a copy of the target address since memset is
 	 supposed to return it unmodified.  We have to make it here
@@ -6003,10 +6110,10 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
 	     the mvc reading this value).  */
 	  set_mem_size (dst, 1);
 	  dstp1 = adjust_address (dst, VOIDmode, 1);
-	  emit_cmp_and_jump_insns (count,
-				   const1_rtx, EQ, NULL_RTX, mode, 1,
-				   onebyte_end_label,
-				   profile_probability::very_unlikely ());
+	  if (min_len <= 1)
+	    emit_cmp_and_jump_insns (count, const1_rtx, EQ, NULL_RTX, mode, 1,
+				     onebyte_end_label,
+				     profile_probability::very_unlikely ());
 	}
 
       /* There is one unconditional (mvi+mvc)/xc after the loop
@@ -6029,7 +6136,7 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
 
       emit_jump (loop_start_label);
 
-      if (val != const0_rtx)
+      if (val != const0_rtx && min_len <= 1)
 	{
 	  /* The 1 byte != 0 special case.  Not handled efficiently
 	     since we require two jumps for that.  However, this
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index abe3bbc5cd9..9631b2a8c60 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3595,12 +3595,16 @@
 ;
 
 (define_expand "setmem<mode>"
-  [(set (match_operand:BLK 0 "memory_operand" "")
-        (match_operand:QI 2 "general_operand" ""))
-   (use (match_operand:GPR 1 "general_operand" ""))
-   (match_operand 3 "" "")]
+  [(set (match_operand:BLK 0 "memory_operand" "")   ; destination
+        (match_operand:QI 2 "general_operand" ""))  ; value
+   (use (match_operand:GPR 1 "general_operand" "")) ; size
+   (match_operand 3 "")  ; align
+   (match_operand 4 "")  ; expected align
+   (match_operand 5 "")  ; expected size
+   (match_operand 6 "")  ; minimal size
+   (match_operand 7 "")] ; maximal size
   ""
-  "s390_expand_setmem (operands[0], operands[1], operands[2]); DONE;")
+  "s390_expand_setmem (operands[0], operands[1], operands[2], operands[6], operands[7]); DONE;")
 
 ; Clear a block that is up to 256 bytes in length.
 ; The block length is taken as (operands[1] % 256) + 1.
diff --git a/gcc/testsuite/gcc.target/s390/memset-1.c b/gcc/testsuite/gcc.target/s390/memset-1.c
index 9463a77208b..5eb96112f13 100644
--- a/gcc/testsuite/gcc.target/s390/memset-1.c
+++ b/gcc/testsuite/gcc.target/s390/memset-1.c
@@ -11,7 +11,7 @@ void
   return __builtin_memset (s, c, 1);
 }
 
-/* 1 stc 1 mvc */
+/* 3 vst */
 void
 *memset1(void *s, int c)
 {
@@ -170,8 +170,9 @@ void
 }
 
 /* { dg-final { scan-assembler-times "mvi\\s" 1 } } */
-/* { dg-final { scan-assembler-times "mvc\\s" 20 } } */
+/* { dg-final { scan-assembler-times "mvc\\s" 19 } } */
 /* { dg-final { scan-assembler-times "xc\\s" 28 } } */
-/* { dg-final { scan-assembler-times "stc\\s" 22 } } */
+/* { dg-final { scan-assembler-times "stc\\s" 21 } } */
 /* { dg-final { scan-assembler-times "stcy\\s" 0 } } */
 /* { dg-final { scan-assembler-times "pfd\\s" 2 } } */
+/* { dg-final { scan-assembler-times "vst\\s" 3 } } */
-- 
2.39.2



* Re: [PATCH 0/3] Refactor memory block operations
From: Andreas Krebbel @ 2023-05-15 20:18 UTC (permalink / raw)
  To: Stefan Schulze Frielinghaus, gcc-patches

On 5/15/23 09:17, Stefan Schulze Frielinghaus wrote:
> Bootstrapped and regtested.  Ok for mainline?
> 
> Stefan Schulze Frielinghaus (3):
>   s390: Refactor block operation cpymem
>   s390: Add block operation movmem
>   s390: Refactor block operation setmem
> 
>  gcc/config/s390/s390-protos.h            |   5 +-
>  gcc/config/s390/s390.cc                  | 301 ++++++++++++++++++++---
>  gcc/config/s390/s390.md                  |  61 ++++-
>  gcc/testsuite/gcc.target/s390/memset-1.c |   7 +-
>  4 files changed, 331 insertions(+), 43 deletions(-)
> 

Ok. Thanks!

Andreas

