[PATCH 0/8] Follow-on force

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH 0/8] Follow-on force_subreg patches
@ 2024-06-17  9:53 Richard Sandiford
  2024-06-17  9:53 ` [PATCH 1/8] Make force_subreg emit nothing on failure Richard Sandiford
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Richard Sandiford @ 2024-06-17  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

This series expands on the fix for PR115464 by using force_subreg
in more places.  It also adds some convenience wrappers for lowpart
and highpart subregs.

A part of this will need to be backported after a grace period,
but I'll post the cherry-picked parts separately.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard Sandiford (8):
  Make force_subreg emit nothing on failure
  aarch64: Use force_subreg in more places
  Make more use of force_subreg
  Add force_lowpart_subreg
  aarch64: Add some uses of force_lowpart_subreg
  Make more use of force_lowpart_subreg
  Add force_highpart_subreg
  aarch64: Add some uses of force_highpart_subreg

 gcc/builtins.cc                               | 22 +++-------
 gcc/config/aarch64/aarch64-builtins.cc        | 15 +++----
 gcc/config/aarch64/aarch64-simd.md            |  4 +-
 .../aarch64/aarch64-sve-builtins-base.cc      | 10 ++---
 .../aarch64/aarch64-sve-builtins-functions.h  |  6 +--
 .../aarch64/aarch64-sve-builtins-sme.cc       |  2 +-
 gcc/config/aarch64/aarch64.cc                 | 31 ++++---------
 gcc/explow.cc                                 | 34 +++++++++++++-
 gcc/explow.h                                  |  2 +
 gcc/expmed.cc                                 | 26 ++++-------
 gcc/expr.cc                                   | 44 +++++++++----------
 gcc/optabs.cc                                 | 26 ++---------
 .../aarch64/sve/acle/general/pr115464_2.c     | 11 +++++
 13 files changed, 111 insertions(+), 122 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/8] Make force_subreg emit nothing on failure
  2024-06-17  9:53 [PATCH 0/8] Follow-on force_subreg patches Richard Sandiford
@ 2024-06-17  9:53 ` Richard Sandiford
  2024-06-17  9:53 ` [PATCH 2/8] aarch64: Use force_subreg in more places Richard Sandiford
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2024-06-17  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

While adding more uses of force_subreg, I realised that it should
be more careful to emit no instructions on failure.  This kind of
failure should be very rare, so I don't think it's a case worth
optimising for.

gcc/
	* explow.cc (force_subreg): Emit no instructions on failure.
---
 gcc/explow.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/explow.cc b/gcc/explow.cc
index f6843398c4b..bd93c878064 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -756,8 +756,12 @@ force_subreg (machine_mode outermode, rtx op,
   if (x)
     return x;
 
+  auto *start = get_last_insn ();
   op = copy_to_mode_reg (innermode, op);
-  return simplify_gen_subreg (outermode, op, innermode, byte);
+  rtx res = simplify_gen_subreg (outermode, op, innermode, byte);
+  if (!res)
+    delete_insns_since (start);
+  return res;
 }
 
 /* If X is a memory ref, copy its contents to a new temp reg and return
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/8] aarch64: Use force_subreg in more places
  2024-06-17  9:53 [PATCH 0/8] Follow-on force_subreg patches Richard Sandiford
  2024-06-17  9:53 ` [PATCH 1/8] Make force_subreg emit nothing on failure Richard Sandiford
@ 2024-06-17  9:53 ` Richard Sandiford
  2024-06-17  9:53 ` [PATCH 3/8] Make more use of force_subreg Richard Sandiford
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2024-06-17  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

This patch makes the aarch64 code use force_subreg instead of
simplify_gen_subreg in more places.  The criteria were:

(1) The code is obviously specific to expand (where new pseudos
    can be created).

(2) The value is obviously an rvalue rather than an lvalue.

(3) The offset wasn't a simple lowpart or highpart calculation;
    a later patch will deal with those.

gcc/
	* config/aarch64/aarch64-builtins.cc (aarch64_expand_fcmla_builtin):
	Use force_subreg instead of simplify_gen_subreg.
	* config/aarch64/aarch64-simd.md (ctz<mode>2): Likewise.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svget_impl::expand): Likewise.
	(svget_neonq_impl::expand): Likewise.
	* config/aarch64/aarch64-sve-builtins-functions.h
	(multireg_permute::expand): Likewise.
---
 gcc/config/aarch64/aarch64-builtins.cc              | 4 ++--
 gcc/config/aarch64/aarch64-simd.md                  | 4 ++--
 gcc/config/aarch64/aarch64-sve-builtins-base.cc     | 8 +++-----
 gcc/config/aarch64/aarch64-sve-builtins-functions.h | 6 +++---
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index d589e59defc..7d827cbc2ac 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2592,12 +2592,12 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
       rtx temp2 = gen_reg_rtx (DImode);
       temp1 = simplify_gen_subreg (d->mode, op2, quadmode,
 				   subreg_lowpart_offset (d->mode, quadmode));
-      temp1 = simplify_gen_subreg (V2DImode, temp1, d->mode, 0);
+      temp1 = force_subreg (V2DImode, temp1, d->mode, 0);
       if (BYTES_BIG_ENDIAN)
 	emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const0_rtx));
       else
 	emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const1_rtx));
-      op2 = simplify_gen_subreg (d->mode, temp2, GET_MODE (temp2), 0);
+      op2 = force_subreg (d->mode, temp2, GET_MODE (temp2), 0);
 
       /* And recalculate the index.  */
       lane -= nunits / 4;
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 0bb39091a38..01b084d8ccb 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -389,8 +389,8 @@ (define_expand "ctz<mode>2"
   "TARGET_SIMD"
   {
      emit_insn (gen_bswap<mode>2 (operands[0], operands[1]));
-     rtx op0_castsi2qi = simplify_gen_subreg(<VS:VSI2QI>mode, operands[0],
-					     <MODE>mode, 0);
+     rtx op0_castsi2qi = force_subreg (<VS:VSI2QI>mode, operands[0],
+				       <MODE>mode, 0);
      emit_insn (gen_aarch64_rbit<VS:vsi2qi> (op0_castsi2qi, op0_castsi2qi));
      emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
      DONE;
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 823d60040f9..99932037124 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1121,9 +1121,8 @@ public:
   expand (function_expander &e) const override
   {
     /* Fold the access into a subreg rvalue.  */
-    return simplify_gen_subreg (e.vector_mode (0), e.args[0],
-				GET_MODE (e.args[0]),
-				INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR);
+    return force_subreg (e.vector_mode (0), e.args[0], GET_MODE (e.args[0]),
+			 INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR);
   }
 };
 
@@ -1157,8 +1156,7 @@ public:
 	e.add_fixed_operand (indices);
 	return e.generate_insn (icode);
       }
-    return simplify_gen_subreg (e.result_mode (), e.args[0],
-				GET_MODE (e.args[0]), 0);
+    return force_subreg (e.result_mode (), e.args[0], GET_MODE (e.args[0]), 0);
   }
 };
 
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
index 3b8e575e98e..7d06a57ff83 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
@@ -639,9 +639,9 @@ public:
       {
 	machine_mode elt_mode = e.vector_mode (0);
 	rtx arg = e.args[0];
-	e.args[0] = simplify_gen_subreg (elt_mode, arg, GET_MODE (arg), 0);
-	e.args.safe_push (simplify_gen_subreg (elt_mode, arg, GET_MODE (arg),
-					       GET_MODE_SIZE (elt_mode)));
+	e.args[0] = force_subreg (elt_mode, arg, GET_MODE (arg), 0);
+	e.args.safe_push (force_subreg (elt_mode, arg, GET_MODE (arg),
+					GET_MODE_SIZE (elt_mode)));
       }
     return e.use_exact_insn (icode);
   }
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/8] Make more use of force_subreg
  2024-06-17  9:53 [PATCH 0/8] Follow-on force_subreg patches Richard Sandiford
  2024-06-17  9:53 ` [PATCH 1/8] Make force_subreg emit nothing on failure Richard Sandiford
  2024-06-17  9:53 ` [PATCH 2/8] aarch64: Use force_subreg in more places Richard Sandiford
@ 2024-06-17  9:53 ` Richard Sandiford
  2024-06-21 20:10   ` Jeff Law
  2024-06-17  9:53 ` [PATCH 4/8] Add force_lowpart_subreg Richard Sandiford
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: Richard Sandiford @ 2024-06-17  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

This patch makes target-independent code use force_subreg instead
of simplify_gen_subreg in some places.  The criteria were:

(1) The code is obviously specific to expand (where new pseudos
    can be created), or at least would be invalid to call when
    !can_create_pseudo_p () and temporaries are needed.

(2) The value is obviously an rvalue rather than an lvalue.

(3) The offset wasn't a simple lowpart or highpart calculation;
    a later patch will deal with those.

Doing this should reduce the likelihood of bugs like PR115464
occuring in other situations.

gcc/
	* expmed.cc (store_bit_field_using_insv): Use force_subreg
	instead of simplify_gen_subreg.
	(store_bit_field_1): Likewise.
	(extract_bit_field_as_subreg): Likewise.
	(extract_integral_bit_field): Likewise.
	(emit_store_flag_1): Likewise.
	* expr.cc (convert_move): Likewise.
	(convert_modes): Likewise.
	(emit_group_load_1): Likewise.
	(emit_group_store): Likewise.
	(expand_assignment): Likewise.
---
 gcc/expmed.cc | 22 ++++++++--------------
 gcc/expr.cc   | 27 ++++++++++++---------------
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 9ba01695f53..1f68e7be721 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -695,13 +695,7 @@ store_bit_field_using_insv (const extraction_insn *insv, rtx op0,
 	     if we must narrow it, be sure we do it correctly.  */
 
 	  if (GET_MODE_SIZE (value_mode) < GET_MODE_SIZE (op_mode))
-	    {
-	      tmp = simplify_subreg (op_mode, value1, value_mode, 0);
-	      if (! tmp)
-		tmp = simplify_gen_subreg (op_mode,
-					   force_reg (value_mode, value1),
-					   value_mode, 0);
-	    }
+	    tmp = force_subreg (op_mode, value1, value_mode, 0);
 	  else
 	    {
 	      if (targetm.mode_rep_extended (op_mode, value_mode) != UNKNOWN)
@@ -806,7 +800,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
       if (known_eq (bitnum, 0U)
 	  && known_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (op0))))
 	{
-	  sub = simplify_gen_subreg (GET_MODE (op0), value, fieldmode, 0);
+	  sub = force_subreg (GET_MODE (op0), value, fieldmode, 0);
 	  if (sub)
 	    {
 	      if (reverse)
@@ -1633,7 +1627,7 @@ extract_bit_field_as_subreg (machine_mode mode, rtx op0,
       && known_eq (bitsize, GET_MODE_BITSIZE (mode))
       && lowpart_bit_field_p (bitnum, bitsize, op0_mode)
       && TRULY_NOOP_TRUNCATION_MODES_P (mode, op0_mode))
-    return simplify_gen_subreg (mode, op0, op0_mode, bytenum);
+    return force_subreg (mode, op0, op0_mode, bytenum);
   return NULL_RTX;
 }
 
@@ -2000,11 +1994,11 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 	  return convert_extracted_bit_field (target, mode, tmode, unsignedp);
 	}
       /* If OP0 is a hard register, copy it to a pseudo before calling
-	 simplify_gen_subreg.  */
+	 force_subreg.  */
       if (REG_P (op0) && HARD_REGISTER_P (op0))
 	op0 = copy_to_reg (op0);
-      op0 = simplify_gen_subreg (word_mode, op0, op0_mode.require (),
-				 bitnum / BITS_PER_WORD * UNITS_PER_WORD);
+      op0 = force_subreg (word_mode, op0, op0_mode.require (),
+			  bitnum / BITS_PER_WORD * UNITS_PER_WORD);
       op0_mode = word_mode;
       bitnum %= BITS_PER_WORD;
     }
@@ -5774,8 +5768,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx op0, rtx op1,
 
 	  /* Do a logical OR or AND of the two words and compare the
 	     result.  */
-	  op00 = simplify_gen_subreg (word_mode, op0, int_mode, 0);
-	  op01 = simplify_gen_subreg (word_mode, op0, int_mode, UNITS_PER_WORD);
+	  op00 = force_subreg (word_mode, op0, int_mode, 0);
+	  op01 = force_subreg (word_mode, op0, int_mode, UNITS_PER_WORD);
 	  tem = expand_binop (word_mode,
 			      op1 == const0_rtx ? ior_optab : and_optab,
 			      op00, op01, NULL_RTX, unsignedp,
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9cecc1758f5..31a7346e33f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -301,7 +301,7 @@ convert_move (rtx to, rtx from, int unsignedp)
 			    GET_MODE_BITSIZE (to_mode)));
 
       if (VECTOR_MODE_P (to_mode))
-	from = simplify_gen_subreg (to_mode, from, GET_MODE (from), 0);
+	from = force_subreg (to_mode, from, GET_MODE (from), 0);
       else
 	to = simplify_gen_subreg (from_mode, to, GET_MODE (to), 0);
 
@@ -935,7 +935,7 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
     {
       gcc_assert (known_eq (GET_MODE_BITSIZE (mode),
 			    GET_MODE_BITSIZE (oldmode)));
-      return simplify_gen_subreg (mode, x, oldmode, 0);
+      return force_subreg (mode, x, oldmode, 0);
     }
 
   temp = gen_reg_rtx (mode);
@@ -3072,8 +3072,8 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
 	    }
 	}
       else if (CONSTANT_P (src) && GET_MODE (dst) != BLKmode
-               && XVECLEN (dst, 0) > 1)
-        tmps[i] = simplify_gen_subreg (mode, src, GET_MODE (dst), bytepos);
+	       && XVECLEN (dst, 0) > 1)
+	tmps[i] = force_subreg (mode, src, GET_MODE (dst), bytepos);
       else if (CONSTANT_P (src))
 	{
 	  if (known_eq (bytelen, ssize))
@@ -3297,7 +3297,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
 	  if (known_eq (rtx_to_poly_int64 (XEXP (XVECEXP (src, 0, start), 1)),
 			bytepos))
 	    {
-	      temp = simplify_gen_subreg (outer, tmps[start], inner, 0);
+	      temp = force_subreg (outer, tmps[start], inner, 0);
 	      if (temp)
 		{
 		  emit_move_insn (dst, temp);
@@ -3317,7 +3317,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
 							  finish - 1), 1)),
 			bytepos))
 	    {
-	      temp = simplify_gen_subreg (outer, tmps[finish - 1], inner, 0);
+	      temp = force_subreg (outer, tmps[finish - 1], inner, 0);
 	      if (temp)
 		{
 		  emit_move_insn (dst, temp);
@@ -6191,11 +6191,9 @@ expand_assignment (tree to, tree from, bool nontemporal)
 		  to_mode = GET_MODE_INNER (to_mode);
 		  machine_mode from_mode = GET_MODE_INNER (GET_MODE (result));
 		  rtx from_real
-		    = simplify_gen_subreg (to_mode, XEXP (result, 0),
-					   from_mode, 0);
+		    = force_subreg (to_mode, XEXP (result, 0), from_mode, 0);
 		  rtx from_imag
-		    = simplify_gen_subreg (to_mode, XEXP (result, 1),
-					   from_mode, 0);
+		    = force_subreg (to_mode, XEXP (result, 1), from_mode, 0);
 		  if (!from_real || !from_imag)
 		    goto concat_store_slow;
 		  emit_move_insn (XEXP (to_rtx, 0), from_real);
@@ -6211,8 +6209,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
 		  if (MEM_P (result))
 		    from_rtx = change_address (result, to_mode, NULL_RTX);
 		  else
-		    from_rtx
-		      = simplify_gen_subreg (to_mode, result, from_mode, 0);
+		    from_rtx = force_subreg (to_mode, result, from_mode, 0);
 		  if (from_rtx)
 		    {
 		      emit_move_insn (XEXP (to_rtx, 0),
@@ -6224,10 +6221,10 @@ expand_assignment (tree to, tree from, bool nontemporal)
 		    {
 		      to_mode = GET_MODE_INNER (to_mode);
 		      rtx from_real
-			= simplify_gen_subreg (to_mode, result, from_mode, 0);
+			= force_subreg (to_mode, result, from_mode, 0);
 		      rtx from_imag
-			= simplify_gen_subreg (to_mode, result, from_mode,
-					       GET_MODE_SIZE (to_mode));
+			= force_subreg (to_mode, result, from_mode,
+					GET_MODE_SIZE (to_mode));
 		      if (!from_real || !from_imag)
 			goto concat_store_slow;
 		      emit_move_insn (XEXP (to_rtx, 0), from_real);
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 4/8] Add force_lowpart_subreg
  2024-06-17  9:53 [PATCH 0/8] Follow-on force_subreg patches Richard Sandiford
                   ` (2 preceding siblings ...)
  2024-06-17  9:53 ` [PATCH 3/8] Make more use of force_subreg Richard Sandiford
@ 2024-06-17  9:53 ` Richard Sandiford
  2024-06-17  9:53 ` [PATCH 5/8] aarch64: Add some uses of force_lowpart_subreg Richard Sandiford
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2024-06-17  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

optabs had a local function called lowpart_subreg_maybe_copy
that is very similar to the lowpart version of force_subreg.
This patch adds a force_lowpart_subreg wrapper around
force_subreg and uses it in optabs.cc.

The only difference between the old and new functions is that
the old one asserted success while the new one doesn't.
It's common not to assert elsewhere when taking subregs;
normally a null result is enough.

Later patches will make more use of the new function.

gcc/
	* explow.h (force_lowpart_subreg): Declare.
	* explow.cc (force_lowpart_subreg): New function.
	* optabs.cc (lowpart_subreg_maybe_copy): Delete.
	(expand_absneg_bit): Use force_lowpart_subreg instead of
	lowpart_subreg_maybe_copy.
	(expand_copysign_bit): Likewise.
---
 gcc/explow.cc | 14 ++++++++++++++
 gcc/explow.h  |  1 +
 gcc/optabs.cc | 24 ++----------------------
 3 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/gcc/explow.cc b/gcc/explow.cc
index bd93c878064..2a91cf76ea6 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -764,6 +764,20 @@ force_subreg (machine_mode outermode, rtx op,
   return res;
 }
 
+/* Try to return an rvalue expression for the OUTERMODE lowpart of OP,
+   which has mode INNERMODE.  Allow OP to be forced into a new register
+   if necessary.
+
+   Return null on failure.  */
+
+rtx
+force_lowpart_subreg (machine_mode outermode, rtx op,
+		      machine_mode innermode)
+{
+  auto byte = subreg_lowpart_offset (outermode, innermode);
+  return force_subreg (outermode, op, innermode, byte);
+}
+
 /* If X is a memory ref, copy its contents to a new temp reg and return
    that reg.  Otherwise, return X.  */
 
diff --git a/gcc/explow.h b/gcc/explow.h
index cbd1fcb7eb3..dd654649b06 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -43,6 +43,7 @@ extern rtx copy_to_suggested_reg (rtx, rtx, machine_mode);
 extern rtx force_reg (machine_mode, rtx);
 
 extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64);
+extern rtx force_lowpart_subreg (machine_mode, rtx, machine_mode);
 
 /* Return given rtx, copied into a new temp reg if it was in memory.  */
 extern rtx force_not_mem (rtx);
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index c54d275b8b7..d569742beea 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -3096,26 +3096,6 @@ expand_ffs (scalar_int_mode mode, rtx op0, rtx target)
   return 0;
 }
 
-/* Extract the OMODE lowpart from VAL, which has IMODE.  Under certain
-   conditions, VAL may already be a SUBREG against which we cannot generate
-   a further SUBREG.  In this case, we expect forcing the value into a
-   register will work around the situation.  */
-
-static rtx
-lowpart_subreg_maybe_copy (machine_mode omode, rtx val,
-			   machine_mode imode)
-{
-  rtx ret;
-  ret = lowpart_subreg (omode, val, imode);
-  if (ret == NULL)
-    {
-      val = force_reg (imode, val);
-      ret = lowpart_subreg (omode, val, imode);
-      gcc_assert (ret != NULL);
-    }
-  return ret;
-}
-
 /* Expand a floating point absolute value or negation operation via a
    logical operation on the sign bit.  */
 
@@ -3204,7 +3184,7 @@ expand_absneg_bit (enum rtx_code code, scalar_float_mode mode,
 			   gen_lowpart (imode, op0),
 			   immed_wide_int_const (mask, imode),
 		           gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN);
-      target = lowpart_subreg_maybe_copy (mode, temp, imode);
+      target = force_lowpart_subreg (mode, temp, imode);
 
       set_dst_reg_note (get_last_insn (), REG_EQUAL,
 			gen_rtx_fmt_e (code, mode, copy_rtx (op0)),
@@ -4043,7 +4023,7 @@ expand_copysign_bit (scalar_float_mode mode, rtx op0, rtx op1, rtx target,
 
       temp = expand_binop (imode, ior_optab, op0, op1,
 			   gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN);
-      target = lowpart_subreg_maybe_copy (mode, temp, imode);
+      target = force_lowpart_subreg (mode, temp, imode);
     }
 
   return target;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 5/8] aarch64: Add some uses of force_lowpart_subreg
  2024-06-17  9:53 [PATCH 0/8] Follow-on force_subreg patches Richard Sandiford
                   ` (3 preceding siblings ...)
  2024-06-17  9:53 ` [PATCH 4/8] Add force_lowpart_subreg Richard Sandiford
@ 2024-06-17  9:53 ` Richard Sandiford
  2024-06-17  9:53 ` [PATCH 6/8] Make more use " Richard Sandiford
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2024-06-17  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

This patch makes more use of force_lowpart_subreg, similarly
to the recent patch for force_subreg.  The criteria were:

(1) The code is obviously specific to expand (where new pseudos
    can be created).

(2) The value is obviously an rvalue rather than an lvalue.

gcc/
	PR target/115464
	* config/aarch64/aarch64-builtins.cc (aarch64_expand_fcmla_builtin)
	(aarch64_expand_rwsr_builtin): Use force_lowpart_subreg instead of
	simplify_gen_subreg and lowpart_subreg.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svset_neonq_impl::expand): Likewise.
	* config/aarch64/aarch64-sve-builtins-sme.cc
	(add_load_store_slice_operand): Likewise.
	* config/aarch64/aarch64.cc (aarch64_sve_reinterpret): Likewise.
	(aarch64_addti_scratch_regs, aarch64_subvti_scratch_regs): Likewise.

gcc/testsuite/
	PR target/115464
	* gcc.target/aarch64/sve/acle/general/pr115464_2.c: New test.
---
 gcc/config/aarch64/aarch64-builtins.cc             | 11 +++++------
 gcc/config/aarch64/aarch64-sve-builtins-base.cc    |  2 +-
 gcc/config/aarch64/aarch64-sve-builtins-sme.cc     |  2 +-
 gcc/config/aarch64/aarch64.cc                      | 14 +++++---------
 .../aarch64/sve/acle/general/pr115464_2.c          | 11 +++++++++++
 5 files changed, 23 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 7d827cbc2ac..30669f8aa18 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2579,8 +2579,7 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
   int lane = INTVAL (lane_idx);
 
   if (lane < nunits / 4)
-    op2 = simplify_gen_subreg (d->mode, op2, quadmode,
-			       subreg_lowpart_offset (d->mode, quadmode));
+    op2 = force_lowpart_subreg (d->mode, op2, quadmode);
   else
     {
       /* Select the upper 64 bits, either a V2SF or V4HF, this however
@@ -2590,8 +2589,7 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
 	 gen_highpart_mode generates code that isn't optimal.  */
       rtx temp1 = gen_reg_rtx (d->mode);
       rtx temp2 = gen_reg_rtx (DImode);
-      temp1 = simplify_gen_subreg (d->mode, op2, quadmode,
-				   subreg_lowpart_offset (d->mode, quadmode));
+      temp1 = force_lowpart_subreg (d->mode, op2, quadmode);
       temp1 = force_subreg (V2DImode, temp1, d->mode, 0);
       if (BYTES_BIG_ENDIAN)
 	emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const0_rtx));
@@ -2836,7 +2834,7 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int fcode)
 	case AARCH64_WSR64:
 	case AARCH64_WSRF64:
 	case AARCH64_WSR128:
-	  subreg = lowpart_subreg (sysreg_mode, input_val, mode);
+	  subreg = force_lowpart_subreg (sysreg_mode, input_val, mode);
 	  break;
 	case AARCH64_WSRF:
 	  subreg = gen_lowpart_SUBREG (SImode, input_val);
@@ -2871,7 +2869,8 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int fcode)
     case AARCH64_RSR64:
     case AARCH64_RSRF64:
     case AARCH64_RSR128:
-      return lowpart_subreg (TYPE_MODE (TREE_TYPE (exp)), target, sysreg_mode);
+      return force_lowpart_subreg (TYPE_MODE (TREE_TYPE (exp)),
+				   target, sysreg_mode);
     case AARCH64_RSRF:
       subreg = gen_lowpart_SUBREG (SImode, target);
       return gen_lowpart_SUBREG (SFmode, subreg);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 99932037124..aa26370d397 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1183,7 +1183,7 @@ public:
     if (BYTES_BIG_ENDIAN)
       return e.use_exact_insn (code_for_aarch64_sve_set_neonq (mode));
     insn_code icode = code_for_vcond_mask (mode, mode);
-    e.args[1] = lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1]));
+    e.args[1] = force_lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1]));
     e.add_output_operand (icode);
     e.add_input_operand (icode, e.args[1]);
     e.add_input_operand (icode, e.args[0]);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
index f4c91bcbb95..b66b35ae60b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
@@ -112,7 +112,7 @@ add_load_store_slice_operand (function_expander &e, insn_code icode,
   rtx base = e.args[argno];
   if (e.mode_suffix_id == MODE_vnum)
     {
-      rtx vnum = lowpart_subreg (SImode, e.args[vnum_argno], DImode);
+      rtx vnum = force_lowpart_subreg (SImode, e.args[vnum_argno], DImode);
       base = simplify_gen_binary (PLUS, SImode, base, vnum);
     }
   e.add_input_operand (icode, base);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 149e5b2f69a..c952a7cdefe 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3288,7 +3288,7 @@ aarch64_sve_reinterpret (machine_mode mode, rtx x)
   /* can_change_mode_class must only return true if subregs and svreinterprets
      have the same semantics.  */
   if (targetm.can_change_mode_class (GET_MODE (x), mode, FP_REGS))
-    return lowpart_subreg (mode, x, GET_MODE (x));
+    return force_lowpart_subreg (mode, x, GET_MODE (x));
 
   rtx res = gen_reg_rtx (mode);
   x = force_reg (GET_MODE (x), x);
@@ -26870,9 +26870,8 @@ aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
 			    rtx *high_in2)
 {
   *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = gen_lowpart (DImode, op1);
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				  subreg_lowpart_offset (DImode, TImode));
+  *low_in1 = force_lowpart_subreg (DImode, op1, TImode);
+  *low_in2 = force_lowpart_subreg (DImode, op2, TImode);
   *high_dest = gen_reg_rtx (DImode);
   *high_in1 = gen_highpart (DImode, op1);
   *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
@@ -26904,11 +26903,8 @@ aarch64_subvti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
 			     rtx *high_in2)
 {
   *low_dest = gen_reg_rtx (DImode);
-  *low_in1 = simplify_gen_subreg (DImode, op1, TImode,
-				  subreg_lowpart_offset (DImode, TImode));
-
-  *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				  subreg_lowpart_offset (DImode, TImode));
+  *low_in1 = force_lowpart_subreg (DImode, op1, TImode);
+  *low_in2 = force_lowpart_subreg (DImode, op2, TImode);
   *high_dest = gen_reg_rtx (DImode);
 
   *high_in1 = simplify_gen_subreg (DImode, op1, TImode,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c
new file mode 100644
index 00000000000..f561c34f732
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c
@@ -0,0 +1,11 @@
+/* { dg-options "-O2" } */
+
+#include <arm_neon.h>
+#include <arm_sve.h>
+#include <arm_neon_sve_bridge.h>
+
+svuint16_t
+convolve4_4_x (uint16x8x2_t permute_tbl, svuint16_t a)
+{
+    return svset_neonq_u16 (a, permute_tbl.val[1]);
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 6/8] Make more use of force_lowpart_subreg
  2024-06-17  9:53 [PATCH 0/8] Follow-on force_subreg patches Richard Sandiford
                   ` (4 preceding siblings ...)
  2024-06-17  9:53 ` [PATCH 5/8] aarch64: Add some uses of force_lowpart_subreg Richard Sandiford
@ 2024-06-17  9:53 ` Richard Sandiford
  2024-06-17  9:53 ` [PATCH 7/8] Add force_highpart_subreg Richard Sandiford
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2024-06-17  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

This patch makes target-independent code use force_lowpart_subreg
instead of simplify_gen_subreg and lowpart_subreg in some places.
The criteria were:

(1) The code is obviously specific to expand (where new pseudos
    can be created), or at least would be invalid to call when
    !can_create_pseudo_p () and temporaries are needed.

(2) The value is obviously an rvalue rather than an lvalue.

Doing this should reduce the likelihood of bugs like PR115464
occuring in other situations.

gcc/
	* builtins.cc (expand_builtin_issignaling): Use force_lowpart_subreg
	instead of simplify_gen_subreg and lowpart_subreg.
	* expr.cc (convert_mode_scalar, expand_expr_real_2): Likewise.
	* optabs.cc (expand_doubleword_mod): Likewise.
---
 gcc/builtins.cc |  7 ++-----
 gcc/expr.cc     | 17 +++++++++--------
 gcc/optabs.cc   |  2 +-
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 5b5307c67b8..bde517b639e 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2940,8 +2940,7 @@ expand_builtin_issignaling (tree exp, rtx target)
 	  {
 	    hi = simplify_gen_subreg (imode, temp, fmode,
 				      subreg_highpart_offset (imode, fmode));
-	    lo = simplify_gen_subreg (imode, temp, fmode,
-				      subreg_lowpart_offset (imode, fmode));
+	    lo = force_lowpart_subreg (imode, temp, fmode);
 	    if (!hi || !lo)
 	      {
 		scalar_int_mode imode2;
@@ -2951,9 +2950,7 @@ expand_builtin_issignaling (tree exp, rtx target)
 		    hi = simplify_gen_subreg (imode, temp2, imode2,
 					      subreg_highpart_offset (imode,
 								      imode2));
-		    lo = simplify_gen_subreg (imode, temp2, imode2,
-					      subreg_lowpart_offset (imode,
-								     imode2));
+		    lo = force_lowpart_subreg (imode, temp2, imode2);
 		  }
 	      }
 	    if (!hi || !lo)
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 31a7346e33f..ffbac513692 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -423,7 +423,8 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
 					0).exists (&toi_mode))
 		{
 		  start_sequence ();
-		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+		  rtx fromi = force_lowpart_subreg (fromi_mode, from,
+						    from_mode);
 		  rtx tof = NULL_RTX;
 		  if (fromi)
 		    {
@@ -443,7 +444,7 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
 					      NULL_RTX, 1);
 		      if (toi)
 			{
-			  tof = lowpart_subreg (to_mode, toi, toi_mode);
+			  tof = force_lowpart_subreg (to_mode, toi, toi_mode);
 			  if (tof)
 			    emit_move_insn (to, tof);
 			}
@@ -475,7 +476,7 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
 				    0).exists (&toi_mode))
 	    {
 	      start_sequence ();
-	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+	      rtx fromi = force_lowpart_subreg (fromi_mode, from, from_mode);
 	      rtx tof = NULL_RTX;
 	      do
 		{
@@ -510,11 +511,11 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
 						  temp4, shift, NULL_RTX, 1);
 		  if (!temp5)
 		    break;
-		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
+		  rtx temp6 = force_lowpart_subreg (toi_mode, temp5,
+						    fromi_mode);
 		  if (!temp6)
 		    break;
-		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
-					toi_mode);
+		  tof = force_lowpart_subreg (to_mode, temp6, toi_mode);
 		  if (tof)
 		    emit_move_insn (to, tof);
 		}
@@ -9784,9 +9785,9 @@ expand_expr_real_2 (const_sepops ops, rtx target, machine_mode tmode,
 	    inner_mode = TYPE_MODE (inner_type);
 
 	  if (modifier == EXPAND_INITIALIZER)
-	    op0 = lowpart_subreg (mode, op0, inner_mode);
+	    op0 = force_lowpart_subreg (mode, op0, inner_mode);
 	  else
-	    op0=  convert_modes (mode, inner_mode, op0,
+	    op0 = convert_modes (mode, inner_mode, op0,
 				 TYPE_UNSIGNED (inner_type));
 	}
 
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index d569742beea..185c5b1a705 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1085,7 +1085,7 @@ expand_doubleword_mod (machine_mode mode, rtx op0, rtx op1, bool unsignedp)
 					 NULL_RTX, 1, OPTAB_DIRECT);
 	      if (v == NULL_RTX)
 		return NULL_RTX;
-	      v = lowpart_subreg (word_mode, v, mode);
+	      v = force_lowpart_subreg (word_mode, v, mode);
 	      if (v == NULL_RTX)
 		return NULL_RTX;
 	      if (i != count - 1)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 7/8] Add force_highpart_subreg
  2024-06-17  9:53 [PATCH 0/8] Follow-on force_subreg patches Richard Sandiford
                   ` (5 preceding siblings ...)
  2024-06-17  9:53 ` [PATCH 6/8] Make more use " Richard Sandiford
@ 2024-06-17  9:53 ` Richard Sandiford
  2024-06-17  9:53 ` [PATCH 8/8] aarch64: Add some uses of force_highpart_subreg Richard Sandiford
  2024-06-18 11:10 ` [PATCH 0/8] Follow-on force_subreg patches Richard Biener
  8 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2024-06-17  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

This patch adds a force_highpart_subreg to go along with the
recently added force_lowpart_subreg.

gcc/
	* explow.h (force_highpart_subreg): Declare.
	* explow.cc (force_highpart_subreg): New function.
	* builtins.cc (expand_builtin_issignaling): Use it.
	* expmed.cc (emit_store_flag_1): Likewise.
---
 gcc/builtins.cc | 15 ++++-----------
 gcc/explow.cc   | 14 ++++++++++++++
 gcc/explow.h    |  1 +
 gcc/expmed.cc   |  4 +---
 4 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index bde517b639e..d467d1697b4 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2835,9 +2835,7 @@ expand_builtin_issignaling (tree exp, rtx target)
 	     it is, working on the DImode high part is usually better.  */
 	  if (!MEM_P (temp))
 	    {
-	      if (rtx t = simplify_gen_subreg (imode, temp, fmode,
-					       subreg_highpart_offset (imode,
-								       fmode)))
+	      if (rtx t = force_highpart_subreg (imode, temp, fmode))
 		hi = t;
 	      else
 		{
@@ -2845,9 +2843,7 @@ expand_builtin_issignaling (tree exp, rtx target)
 		  if (int_mode_for_mode (fmode).exists (&imode2))
 		    {
 		      rtx temp2 = gen_lowpart (imode2, temp);
-		      poly_uint64 off = subreg_highpart_offset (imode, imode2);
-		      if (rtx t = simplify_gen_subreg (imode, temp2,
-						       imode2, off))
+		      if (rtx t = force_highpart_subreg (imode, temp2, imode2))
 			hi = t;
 		    }
 		}
@@ -2938,8 +2934,7 @@ expand_builtin_issignaling (tree exp, rtx target)
 	   it is, working on DImode parts is usually better.  */
 	if (!MEM_P (temp))
 	  {
-	    hi = simplify_gen_subreg (imode, temp, fmode,
-				      subreg_highpart_offset (imode, fmode));
+	    hi = force_highpart_subreg (imode, temp, fmode);
 	    lo = force_lowpart_subreg (imode, temp, fmode);
 	    if (!hi || !lo)
 	      {
@@ -2947,9 +2942,7 @@ expand_builtin_issignaling (tree exp, rtx target)
 		if (int_mode_for_mode (fmode).exists (&imode2))
 		  {
 		    rtx temp2 = gen_lowpart (imode2, temp);
-		    hi = simplify_gen_subreg (imode, temp2, imode2,
-					      subreg_highpart_offset (imode,
-								      imode2));
+		    hi = force_highpart_subreg (imode, temp2, imode2);
 		    lo = force_lowpart_subreg (imode, temp2, imode2);
 		  }
 	      }
diff --git a/gcc/explow.cc b/gcc/explow.cc
index 2a91cf76ea6..b4a0df89bc3 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -778,6 +778,20 @@ force_lowpart_subreg (machine_mode outermode, rtx op,
   return force_subreg (outermode, op, innermode, byte);
 }
 
+/* Try to return an rvalue expression for the OUTERMODE highpart of OP,
+   which has mode INNERMODE.  Allow OP to be forced into a new register
+   if necessary.
+
+   Return null on failure.  */
+
+rtx
+force_highpart_subreg (machine_mode outermode, rtx op,
+		       machine_mode innermode)
+{
+  auto byte = subreg_highpart_offset (outermode, innermode);
+  return force_subreg (outermode, op, innermode, byte);
+}
+
 /* If X is a memory ref, copy its contents to a new temp reg and return
    that reg.  Otherwise, return X.  */
 
diff --git a/gcc/explow.h b/gcc/explow.h
index dd654649b06..de89e9e2933 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -44,6 +44,7 @@ extern rtx force_reg (machine_mode, rtx);
 
 extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64);
 extern rtx force_lowpart_subreg (machine_mode, rtx, machine_mode);
+extern rtx force_highpart_subreg (machine_mode, rtx, machine_mode);
 
 /* Return given rtx, copied into a new temp reg if it was in memory.  */
 extern rtx force_not_mem (rtx);
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 1f68e7be721..3b9475f5aa0 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -5784,9 +5784,7 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx op0, rtx op1,
 	  rtx op0h;
 
 	  /* If testing the sign bit, can just test on high word.  */
-	  op0h = simplify_gen_subreg (word_mode, op0, int_mode,
-				      subreg_highpart_offset (word_mode,
-							      int_mode));
+	  op0h = force_highpart_subreg (word_mode, op0, int_mode);
 	  tem = emit_store_flag (NULL_RTX, code, op0h, op1, word_mode,
 				 unsignedp, normalizep);
 	}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 8/8] aarch64: Add some uses of force_highpart_subreg
  2024-06-17  9:53 [PATCH 0/8] Follow-on force_subreg patches Richard Sandiford
                   ` (6 preceding siblings ...)
  2024-06-17  9:53 ` [PATCH 7/8] Add force_highpart_subreg Richard Sandiford
@ 2024-06-17  9:53 ` Richard Sandiford
  2024-06-18 11:10 ` [PATCH 0/8] Follow-on force_subreg patches Richard Biener
  8 siblings, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2024-06-17  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Sandiford

This patch adds uses of force_highpart_subreg to places that
already use force_lowpart_subreg.

gcc/
	* config/aarch64/aarch64.cc (aarch64_addti_scratch_regs): Use
	force_highpart_subreg instead of gen_highpart and simplify_gen_subreg.
	(aarch64_subvti_scratch_regs): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index c952a7cdefe..026f8627a89 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26873,19 +26873,12 @@ aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
   *low_in1 = force_lowpart_subreg (DImode, op1, TImode);
   *low_in2 = force_lowpart_subreg (DImode, op2, TImode);
   *high_dest = gen_reg_rtx (DImode);
-  *high_in1 = gen_highpart (DImode, op1);
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				   subreg_highpart_offset (DImode, TImode));
+  *high_in1 = force_highpart_subreg (DImode, op1, TImode);
+  *high_in2 = force_highpart_subreg (DImode, op2, TImode);
 }
 
 /* Generate DImode scratch registers for 128-bit (TImode) subtraction.
 
-   This function differs from 'arch64_addti_scratch_regs' in that
-   OP1 can be an immediate constant (zero). We must call
-   subreg_highpart_offset with DImode and TImode arguments, otherwise
-   VOIDmode will be used for the const_int which generates an internal
-   error from subreg_size_highpart_offset which does not expect a size of zero.
-
    OP1 represents the TImode destination operand 1
    OP2 represents the TImode destination operand 2
    LOW_DEST represents the low half (DImode) of TImode operand 0
@@ -26907,10 +26900,8 @@ aarch64_subvti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
   *low_in2 = force_lowpart_subreg (DImode, op2, TImode);
   *high_dest = gen_reg_rtx (DImode);
 
-  *high_in1 = simplify_gen_subreg (DImode, op1, TImode,
-				   subreg_highpart_offset (DImode, TImode));
-  *high_in2 = simplify_gen_subreg (DImode, op2, TImode,
-				   subreg_highpart_offset (DImode, TImode));
+  *high_in1 = force_highpart_subreg (DImode, op1, TImode);
+  *high_in2 = force_highpart_subreg (DImode, op2, TImode);
 }
 
 /* Generate RTL for 128-bit (TImode) subtraction with overflow.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/8] Follow-on force_subreg patches
  2024-06-17  9:53 [PATCH 0/8] Follow-on force_subreg patches Richard Sandiford
                   ` (7 preceding siblings ...)
  2024-06-17  9:53 ` [PATCH 8/8] aarch64: Add some uses of force_highpart_subreg Richard Sandiford
@ 2024-06-18 11:10 ` Richard Biener
  8 siblings, 0 replies; 13+ messages in thread
From: Richard Biener @ 2024-06-18 11:10 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

On Mon, Jun 17, 2024 at 11:55 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This series expands on the fix for PR115464 by using force_subreg
> in more places.  It also adds some convenience wrappers for lowpart
> and highpart subregs.
>
> A part of this will need to be backported after a grace period,
> but I'll post the cherry-picked parts separately.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard Sandiford (8):
>   Make force_subreg emit nothing on failure
>   aarch64: Use force_subreg in more places
>   Make more use of force_subreg
>   Add force_lowpart_subreg
>   aarch64: Add some uses of force_lowpart_subreg
>   Make more use of force_lowpart_subreg
>   Add force_highpart_subreg
>   aarch64: Add some uses of force_highpart_subreg
>
>  gcc/builtins.cc                               | 22 +++-------
>  gcc/config/aarch64/aarch64-builtins.cc        | 15 +++----
>  gcc/config/aarch64/aarch64-simd.md            |  4 +-
>  .../aarch64/aarch64-sve-builtins-base.cc      | 10 ++---
>  .../aarch64/aarch64-sve-builtins-functions.h  |  6 +--
>  .../aarch64/aarch64-sve-builtins-sme.cc       |  2 +-
>  gcc/config/aarch64/aarch64.cc                 | 31 ++++---------
>  gcc/explow.cc                                 | 34 +++++++++++++-
>  gcc/explow.h                                  |  2 +
>  gcc/expmed.cc                                 | 26 ++++-------
>  gcc/expr.cc                                   | 44 +++++++++----------
>  gcc/optabs.cc                                 | 26 ++---------
>  .../aarch64/sve/acle/general/pr115464_2.c     | 11 +++++
>  13 files changed, 111 insertions(+), 122 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/8] Make more use of force_subreg
  2024-06-17  9:53 ` [PATCH 3/8] Make more use of force_subreg Richard Sandiford
@ 2024-06-21 20:10   ` Jeff Law
  2024-06-22  1:30     ` Andrew Pinski
  2024-06-25  8:42     ` Richard Sandiford
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff Law @ 2024-06-21 20:10 UTC (permalink / raw)
  To: Richard Sandiford, gcc-patches



On 6/17/24 3:53 AM, Richard Sandiford wrote:
> This patch makes target-independent code use force_subreg instead
> of simplify_gen_subreg in some places.  The criteria were:
> 
> (1) The code is obviously specific to expand (where new pseudos
>      can be created), or at least would be invalid to call when
>      !can_create_pseudo_p () and temporaries are needed.
> 
> (2) The value is obviously an rvalue rather than an lvalue.
> 
> (3) The offset wasn't a simple lowpart or highpart calculation;
>      a later patch will deal with those.
> 
> Doing this should reduce the likelihood of bugs like PR115464
> occuring in other situations.
> 
> gcc/
> 	* expmed.cc (store_bit_field_using_insv): Use force_subreg
> 	instead of simplify_gen_subreg.
> 	(store_bit_field_1): Likewise.
> 	(extract_bit_field_as_subreg): Likewise.
> 	(extract_integral_bit_field): Likewise.
> 	(emit_store_flag_1): Likewise.
> 	* expr.cc (convert_move): Likewise.
> 	(convert_modes): Likewise.
> 	(emit_group_load_1): Likewise.
> 	(emit_group_store): Likewise.
> 	(expand_assignment): Likewise.
[ ... ]

So this has triggered a failure on ft32-elf with this testcase 
(simplified from the testsuite):

typedef _Bool bool;
const bool false = 0;
const bool true = 1;

struct RenderBox
{
   bool m_positioned : 1;
};

typedef struct RenderBox RenderBox;


void RenderBox_setStyle(RenderBox *thisin)
{
   RenderBox *this = thisin;
   bool ltrue = true;
   this->m_positioned = ltrue;

}



Before this change we generated this:

> (insn 13 12 14 (set (reg:QI 47)
>         (mem/c:QI (plus:SI (reg/f:SI 37 virtual-stack-vars)
>                 (const_int -5 [0xfffffffffffffffb])) [1 ltrue+0 S1 A8])) "j.c":17:22 -1
>      (nil))
> 
> (insn 14 13 15 (parallel [
>             (set (zero_extract:SI (subreg:SI (reg:QI 46) 0)
>                     (const_int 1 [0x1])
>                     (const_int 0 [0]))
>                 (subreg:SI (reg:QI 47) 0))
>             (clobber (scratch:SI))
>         ]) "j.c":17:22 -1
>      (nil))


Afterwards we generate:

> (insn 13 12 14 2 (parallel [
>             (set (zero_extract:SI (subreg:SI (reg:QI 46) 0)
>                     (const_int 1 [0x1])
>                     (const_int 0 [0]))
>                 (subreg:SI (mem/c:QI (plus:SI (reg/f:SI 37 virtual-stack-vars)
>                             (const_int -5 [0xfffffffffffffffb])) [1 ltrue+0 S1 A8]) 0))
>             (clobber (scratch:SI))
>         ]) "j.c":17:22 -1
>      (nil))

Note the (subreg (mem (...)).  Probably not desirable in general, but 
also note the virtual-stack-vars in the memory address.  The code to 
instantiate virtual registers doesn't handle (subreg (mem)), so we never 
convert that to an FP based address and we eventually fault.

Should be visible with ft32-elf cross compiler.  No options needed.

Jeff



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/8] Make more use of force_subreg
  2024-06-21 20:10   ` Jeff Law
@ 2024-06-22  1:30     ` Andrew Pinski
  2024-06-25  8:42     ` Richard Sandiford
  1 sibling, 0 replies; 13+ messages in thread
From: Andrew Pinski @ 2024-06-22  1:30 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Sandiford, gcc-patches

On Fri, Jun 21, 2024 at 1:11 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 6/17/24 3:53 AM, Richard Sandiford wrote:
> > This patch makes target-independent code use force_subreg instead
> > of simplify_gen_subreg in some places.  The criteria were:
> >
> > (1) The code is obviously specific to expand (where new pseudos
> >      can be created), or at least would be invalid to call when
> >      !can_create_pseudo_p () and temporaries are needed.
> >
> > (2) The value is obviously an rvalue rather than an lvalue.
> >
> > (3) The offset wasn't a simple lowpart or highpart calculation;
> >      a later patch will deal with those.
> >
> > Doing this should reduce the likelihood of bugs like PR115464
> > occuring in other situations.
> >
> > gcc/
> >       * expmed.cc (store_bit_field_using_insv): Use force_subreg
> >       instead of simplify_gen_subreg.
> >       (store_bit_field_1): Likewise.
> >       (extract_bit_field_as_subreg): Likewise.
> >       (extract_integral_bit_field): Likewise.
> >       (emit_store_flag_1): Likewise.
> >       * expr.cc (convert_move): Likewise.
> >       (convert_modes): Likewise.
> >       (emit_group_load_1): Likewise.
> >       (emit_group_store): Likewise.
> >       (expand_assignment): Likewise.
> [ ... ]
>
> So this has triggered a failure on ft32-elf with this testcase
> (simplified from the testsuite):
>
> typedef _Bool bool;
> const bool false = 0;
> const bool true = 1;
>
> struct RenderBox
> {
>    bool m_positioned : 1;
> };
>
> typedef struct RenderBox RenderBox;
>
>
> void RenderBox_setStyle(RenderBox *thisin)
> {
>    RenderBox *this = thisin;
>    bool ltrue = true;
>    this->m_positioned = ltrue;
>
> }
>
>
>
> Before this change we generated this:
>
> > (insn 13 12 14 (set (reg:QI 47)
> >         (mem/c:QI (plus:SI (reg/f:SI 37 virtual-stack-vars)
> >                 (const_int -5 [0xfffffffffffffffb])) [1 ltrue+0 S1 A8])) "j.c":17:22 -1
> >      (nil))
> >
> > (insn 14 13 15 (parallel [
> >             (set (zero_extract:SI (subreg:SI (reg:QI 46) 0)
> >                     (const_int 1 [0x1])
> >                     (const_int 0 [0]))
> >                 (subreg:SI (reg:QI 47) 0))
> >             (clobber (scratch:SI))
> >         ]) "j.c":17:22 -1
> >      (nil))
>
>
> Afterwards we generate:
>
> > (insn 13 12 14 2 (parallel [
> >             (set (zero_extract:SI (subreg:SI (reg:QI 46) 0)
> >                     (const_int 1 [0x1])
> >                     (const_int 0 [0]))
> >                 (subreg:SI (mem/c:QI (plus:SI (reg/f:SI 37 virtual-stack-vars)
> >                             (const_int -5 [0xfffffffffffffffb])) [1 ltrue+0 S1 A8]) 0))
> >             (clobber (scratch:SI))
> >         ]) "j.c":17:22 -1
> >      (nil))
>
> Note the (subreg (mem (...)).  Probably not desirable in general, but
> also note the virtual-stack-vars in the memory address.  The code to
> instantiate virtual registers doesn't handle (subreg (mem)), so we never
> convert that to an FP based address and we eventually fault.

We should really get rid of the support of `(subreg (mem))` as a valid
for register_operand (recorg.cc).
Two ideas on how to fix this before removing `(subreg (mem))` support
from register_operand:
1) Maybe for now reject subreg of mem inside validate_subreg that have
virtual-stack-vars addresses.
2) Or we add the code to instantiate virtual registers to handle (subreg (mem)).

Maybe we should just bite the bullet and remove support of `(subreg
(mem))` from register_operand instead of hacking around this
preexisting mess.

Also see
https://inbox.sourceware.org/gcc-patches/485B2857.2070003@naturalbridge.com/

Which is from 2008 about this subreg of mem.

Thanks,
Andrew

>
> Should be visible with ft32-elf cross compiler.  No options needed.
>
> Jeff
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/8] Make more use of force_subreg
  2024-06-21 20:10   ` Jeff Law
  2024-06-22  1:30     ` Andrew Pinski
@ 2024-06-25  8:42     ` Richard Sandiford
  1 sibling, 0 replies; 13+ messages in thread
From: Richard Sandiford @ 2024-06-25  8:42 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <jeffreyalaw@gmail.com> writes:
> On 6/17/24 3:53 AM, Richard Sandiford wrote:
>> This patch makes target-independent code use force_subreg instead
>> of simplify_gen_subreg in some places.  The criteria were:
>> 
>> (1) The code is obviously specific to expand (where new pseudos
>>      can be created), or at least would be invalid to call when
>>      !can_create_pseudo_p () and temporaries are needed.
>> 
>> (2) The value is obviously an rvalue rather than an lvalue.
>> 
>> (3) The offset wasn't a simple lowpart or highpart calculation;
>>      a later patch will deal with those.
>> 
>> Doing this should reduce the likelihood of bugs like PR115464
>> occuring in other situations.
>> 
>> gcc/
>> 	* expmed.cc (store_bit_field_using_insv): Use force_subreg
>> 	instead of simplify_gen_subreg.
>> 	(store_bit_field_1): Likewise.
>> 	(extract_bit_field_as_subreg): Likewise.
>> 	(extract_integral_bit_field): Likewise.
>> 	(emit_store_flag_1): Likewise.
>> 	* expr.cc (convert_move): Likewise.
>> 	(convert_modes): Likewise.
>> 	(emit_group_load_1): Likewise.
>> 	(emit_group_store): Likewise.
>> 	(expand_assignment): Likewise.
> [ ... ]
>
> So this has triggered a failure on ft32-elf with this testcase 
> (simplified from the testsuite):
>
> typedef _Bool bool;
> const bool false = 0;
> const bool true = 1;
>
> struct RenderBox
> {
>    bool m_positioned : 1;
> };
>
> typedef struct RenderBox RenderBox;
>
>
> void RenderBox_setStyle(RenderBox *thisin)
> {
>    RenderBox *this = thisin;
>    bool ltrue = true;
>    this->m_positioned = ltrue;
>
> }
>
>
>
> Before this change we generated this:
>
>> (insn 13 12 14 (set (reg:QI 47)
>>         (mem/c:QI (plus:SI (reg/f:SI 37 virtual-stack-vars)
>>                 (const_int -5 [0xfffffffffffffffb])) [1 ltrue+0 S1 A8])) "j.c":17:22 -1
>>      (nil))
>> 
>> (insn 14 13 15 (parallel [
>>             (set (zero_extract:SI (subreg:SI (reg:QI 46) 0)
>>                     (const_int 1 [0x1])
>>                     (const_int 0 [0]))
>>                 (subreg:SI (reg:QI 47) 0))
>>             (clobber (scratch:SI))
>>         ]) "j.c":17:22 -1
>>      (nil))
>
>
> Afterwards we generate:
>
>> (insn 13 12 14 2 (parallel [
>>             (set (zero_extract:SI (subreg:SI (reg:QI 46) 0)
>>                     (const_int 1 [0x1])
>>                     (const_int 0 [0]))
>>                 (subreg:SI (mem/c:QI (plus:SI (reg/f:SI 37 virtual-stack-vars)
>>                             (const_int -5 [0xfffffffffffffffb])) [1 ltrue+0 S1 A8]) 0))
>>             (clobber (scratch:SI))
>>         ]) "j.c":17:22 -1
>>      (nil))
>
> Note the (subreg (mem (...)).  Probably not desirable in general, but 
> also note the virtual-stack-vars in the memory address.  The code to 
> instantiate virtual registers doesn't handle (subreg (mem)), so we never 
> convert that to an FP based address and we eventually fault.
>
> Should be visible with ft32-elf cross compiler.  No options needed.

Bah.  Thanks for the report.

I agree of course with the follow-on discussion that we should get
rid of (subreg (mem)).  But this was supposed to be a conservative
patch.  I've therefore reverted the offending part of the commit,
as below.  (Tested on aarch64-linux-gnu.)

Richard


One of the changes in g:d4047da6a070175aae7121c739d1cad6b08ff4b2
caused a regression in ft32-elf; see:

    https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655418.html

for details.  This change was different from the others in that the
original call was to simplify_subreg rather than simplify_lowpart_subreg.
The old code would therefore go on to do the force_reg for more cases
than the new code would.

gcc/
	* expmed.cc (store_bit_field_using_insv): Revert earlier change
	to use force_subreg instead of simplify_gen_subreg.
---
 gcc/expmed.cc | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 3b9475f5aa0..8bbbc94a98c 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -695,7 +695,13 @@ store_bit_field_using_insv (const extraction_insn *insv, rtx op0,
 	     if we must narrow it, be sure we do it correctly.  */
 
 	  if (GET_MODE_SIZE (value_mode) < GET_MODE_SIZE (op_mode))
-	    tmp = force_subreg (op_mode, value1, value_mode, 0);
+	    {
+	      tmp = simplify_subreg (op_mode, value1, value_mode, 0);
+	      if (! tmp)
+		tmp = simplify_gen_subreg (op_mode,
+					   force_reg (value_mode, value1),
+					   value_mode, 0);
+	    }
 	  else
 	    {
 	      if (targetm.mode_rep_extended (op_mode, value_mode) != UNKNOWN)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-06-25  8:42 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-17  9:53 [PATCH 0/8] Follow-on force_subreg patches Richard Sandiford
2024-06-17  9:53 ` [PATCH 1/8] Make force_subreg emit nothing on failure Richard Sandiford
2024-06-17  9:53 ` [PATCH 2/8] aarch64: Use force_subreg in more places Richard Sandiford
2024-06-17  9:53 ` [PATCH 3/8] Make more use of force_subreg Richard Sandiford
2024-06-21 20:10   ` Jeff Law
2024-06-22  1:30     ` Andrew Pinski
2024-06-25  8:42     ` Richard Sandiford
2024-06-17  9:53 ` [PATCH 4/8] Add force_lowpart_subreg Richard Sandiford
2024-06-17  9:53 ` [PATCH 5/8] aarch64: Add some uses of force_lowpart_subreg Richard Sandiford
2024-06-17  9:53 ` [PATCH 6/8] Make more use " Richard Sandiford
2024-06-17  9:53 ` [PATCH 7/8] Add force_highpart_subreg Richard Sandiford
2024-06-17  9:53 ` [PATCH 8/8] aarch64: Add some uses of force_highpart_subreg Richard Sandiford
2024-06-18 11:10 ` [PATCH 0/8] Follow-on force_subreg patches Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).