[PATCH] combine: Try harder to form zero_extends [PR106594]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Tamar Christina <Tamar.Christina@arm.com>,
	 Segher Boessenkool <segher@kernel.crashing.org>,
	 Roger Sayle <roger@nextmovesoftware.com>,
	 Jeff Law <jeffreyalaw@gmail.com>
Subject: [PATCH] combine: Try harder to form zero_extends [PR106594]
Date: Mon, 06 Mar 2023 12:47:06 +0000	[thread overview]
Message-ID: <mptilfe9hdh.fsf_-_@arm.com> (raw)
In-Reply-To: <3b1ed616-5d90-7a66-63b5-bdb5e320eebf@gmail.com> (Jeff Law via Gcc-patches's message of "Sun, 5 Mar 2023 12:56:02 -0700")

Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
>> 
>> The regression was reported during stage-1. A patch was provided during stage 1 and the discussions around combine stalled.
>> 
>> The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just to "take".
>> 
>> So we need a way forward, even if it's stage-4.
> Then it needs to be in a way that works within the design constraints of 
> combine.
>
> As Segher has indicated, using a magic constant to say "this is always 
> cheap enough" isn't acceptable.  Furthermore, what this patch changes is 
> combine's internal canonicalization of extensions into shift pairs.
>
> So I think another path forward needs to be found.  I don't see hacking 
> up expand_compound_operation is viable.
>
> Jeff

How about the patch below?  Tested on aarch64-linux-gnu,
but I'll test on x86_64-linux-gnu and powerpc64le-linux-gnu
before committing.

-----------------------------

The PR contains a case where we want GCC to combine a sign_extend
into an address (which is something that GCC 12 could do).  After
substitution, the sign_extend goes through the usual
expand_compound_operation wrangler and, after taking nonzero_bits
into account, make_compound_operation is presented with:

  X1: (and (mult (subreg x) (const_int N2)) (const_int N1))

where:

(a) the subreg is paradoxical
(b) N2 is a power of 2
(c) all bits outside N1/N2 are known to be zero in x

This is equivalent to:

  X2: (mult (and (subreg x) (const_int N1/N2)) (const_int N2))

Given in this form, existing code would already use (c) to convert
the inner "and" to a zero_extend:

  (mult (zero_extend x) (const_int N2))

This patch makes the code handle X1 as well as X2.

Logically, it would make sense to do the same for ASHIFT, which
would be the canonical form outside memory addresses.  However, it
seemed better to do the minimum possible, given the late stage in
the release cycle.

gcc/
	PR rtl-optimization/106594
	* combine.cc (make_compound_operation_int): Extend the AND to
	ZERO_EXTEND transformation so that it can handle an intervening
	multiplication by a power of two.

gcc/testsuite/
	* gcc.target/aarch64/pr106594.c: New test.
---
 gcc/combine.cc                              | 60 ++++++++++++++-------
 gcc/testsuite/gcc.target/aarch64/pr106594.c | 21 ++++++++
 2 files changed, 63 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr106594.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..b45042bbafd 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -8188,28 +8188,52 @@ make_compound_operation_int (scalar_int_mode mode, rtx *x_ptr,
       /* If the one operand is a paradoxical subreg of a register or memory and
 	 the constant (limited to the smaller mode) has only zero bits where
 	 the sub expression has known zero bits, this can be expressed as
-	 a zero_extend.  */
-      else if (GET_CODE (XEXP (x, 0)) == SUBREG)
-	{
-	  rtx sub;
+	 a zero_extend.
+
+	 Look also for the case where the operand is such a subreg that
+	 is multiplied by 2**N:
 
-	  sub = XEXP (XEXP (x, 0), 0);
-	  machine_mode sub_mode = GET_MODE (sub);
-	  int sub_width;
-	  if ((REG_P (sub) || MEM_P (sub))
-	      && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
-	      && sub_width < mode_width)
+	   (and (mult ... 2**N) M) --> (mult (and ... M>>N) 2**N) -> ...  */
+      else
+	{
+	  rtx y = XEXP (x, 0);
+	  rtx top = y;
+	  int shift = 0;
+	  if (GET_CODE (y) == MULT
+	      && CONST_INT_P (XEXP (y, 1))
+	      && pow2p_hwi (INTVAL (XEXP (y, 1))))
 	    {
-	      unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
-	      unsigned HOST_WIDE_INT mask;
+	      shift = exact_log2 (INTVAL (XEXP (y, 1)));
+	      y = XEXP (y, 0);
+	    }
+	  if (GET_CODE (y) == SUBREG)
+	    {
+	      rtx sub;
 
-	      /* original AND constant with all the known zero bits set */
-	      mask = UINTVAL (XEXP (x, 1)) | (~nonzero_bits (sub, sub_mode));
-	      if ((mask & mode_mask) == mode_mask)
+	      sub = XEXP (y, 0);
+	      machine_mode sub_mode = GET_MODE (sub);
+	      int sub_width;
+	      if ((REG_P (sub) || MEM_P (sub))
+		  && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
+		  && sub_width < mode_width)
 		{
-		  new_rtx = make_compound_operation (sub, next_code);
-		  new_rtx = make_extraction (mode, new_rtx, 0, 0, sub_width,
-					     1, 0, in_code == COMPARE);
+		  unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
+		  unsigned HOST_WIDE_INT mask;
+
+		  /* The shifted AND constant with all the known zero
+		     bits set.  */
+		  mask = ((UINTVAL (XEXP (x, 1)) >> shift)
+			  | ~nonzero_bits (sub, sub_mode));
+		  if ((mask & mode_mask) == mode_mask)
+		    {
+		      new_rtx = make_compound_operation (sub, next_code);
+		      new_rtx = make_extraction (mode, new_rtx,
+						 0, 0, sub_width,
+						 1, 0, in_code == COMPARE);
+		      if (shift)
+			new_rtx = gen_rtx_fmt_ee (GET_CODE (top), mode,
+						  new_rtx, XEXP (top, 1));
+		    }
 		}
 	    }
 	}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr106594.c b/gcc/testsuite/gcc.target/aarch64/pr106594.c
new file mode 100644
index 00000000000..f10d72665e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr106594.c
@@ -0,0 +1,21 @@
+/* { dg-options "-O2" } */
+
+extern const int constellation_64qam[64];
+
+void foo(int nbits,
+         const char *p_src,
+         int *p_dst) {
+
+  while (nbits > 0U) {
+    char first = *p_src++;
+
+    char index1 = ((first & 0x3) << 4) | (first >> 4);
+
+    *p_dst++ = constellation_64qam[index1];
+
+    nbits--;
+  }
+}
+
+/* { dg-final { scan-assembler {ldr\tw[0-9]+, \[x[0-9]+, w[0-9]+, [su]xtw #?2\]} } } */
+
-- 
2.25.1

next prev parent reply	other threads:[~2023-03-06 12:47 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-04 18:32 [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap Roger Sayle
2023-03-04 22:17 ` Segher Boessenkool
2023-03-05 19:28   ` Tamar Christina
2023-03-05 19:56     ` Jeff Law
2023-03-05 20:43       ` Tamar Christina
2023-03-05 21:33         ` Segher Boessenkool
2023-03-06 12:08           ` Segher Boessenkool
2023-03-06 12:11             ` Tamar Christina
2023-03-06 12:47       ` Richard Sandiford [this message]
2023-03-06 13:58         ` [PATCH] combine: Try harder to form zero_extends [PR106594] Segher Boessenkool
2023-03-06 15:08           ` Richard Sandiford
2023-03-06 16:18             ` Jakub Jelinek
2023-03-06 16:34               ` Richard Sandiford
2023-03-06 18:31                 ` Segher Boessenkool
2023-03-06 19:13                   ` Richard Sandiford
2023-03-06 23:31                     ` Segher Boessenkool
2023-03-08 11:58                       ` Richard Sandiford
2023-03-08 22:50                         ` Segher Boessenkool
2023-03-09 10:18                           ` Richard Sandiford
2023-03-06 22:58                 ` Segher Boessenkool
2023-03-06 18:13               ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptilfe9hdh.fsf_-_@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=Tamar.Christina@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=roger@nextmovesoftware.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).