public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] middle-end: Simplify (sign_extend:HI (truncate:QI (ashiftrt:HI X 8)))
@ 2020-07-19  9:42 Roger Sayle
  2020-08-16  1:23 ` Segher Boessenkool
  2020-08-24 23:28 ` Jeff Law
  0 siblings, 2 replies; 3+ messages in thread
From: Roger Sayle @ 2020-07-19  9:42 UTC (permalink / raw)
  To: 'GCC Patches'

[-- Attachment #1: Type: text/plain, Size: 2379 bytes --]


The combination of several my recent nvptx patches has revealed an
interesting RTL optimization opportunity.  This patch to simplify-rtx.c
simplifies (sign_extend:HI (truncate:QI (?shiftrt:HI x 8))) to just
(ashiftrt:HI x 8), as the inner shift already sets the high bits
appropriately.  The equivalent zero_extend variant appears to already
be implemented in simplify_unary_operation_1.

During the compilation of one of the tests in the test suite, we
manage the generate the redundant sequence of instructions:

(insn 17 16 18 3 (set (reg:HI 35)
        (ashiftrt:HI (reg:HI 34 [ arg ])
            (const_int 8 [0x8]))) "v2si-cvt.c":14:8 94 {ashrhi3}
     (expr_list:REG_DEAD (reg:HI 34 [ arg ])
        (nil)))
(insn 18 17 19 3 (set (reg:QI 36)
        (truncate:QI (reg:HI 35))) "v2si-cvt.c":14:8 28 {trunchiqi2}
     (expr_list:REG_DEAD (reg:HI 35)
        (nil)))
(insn 19 18 20 3 (set (reg:HI 37)
        (sign_extend:HI (reg:QI 36))) "v2si-cvt.c":14:6 22 {extendqihi2}
     (expr_list:REG_DEAD (reg:QI 36)
        (nil)))

These result from RTL expansion generating a reasonable arithmetic right
shift and truncation to char, only to then discover the backend doesn't
support QImode comparisons, so the next optab widens this result/operand
back to HImode.  In this sequence the truncate and sign extend are
redundant as the original arithmetic shift has already set the high
bits appropriately.  The one oddity of the patch is that it tests for
LSHIFTRT as inner shift, as simplify/combine has already canonicalized
this to a logical shift, assuming that the distinction is unimportant
following the truncation.

With this patch, the code generated by the nvptx backends goes from:
        shr.s16 %r35, %r34, 8;
        cvt.u32.u16     %r36, %r35;
        cvt.s16.s8      %r37, %r36;
to
        shr.s16 %r37, %r34, 8;

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" (just to be safe), and nvptx-none (both with and
without my other patches), all with no new regressions.
Ok for mainline?


2020-07-19  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
	Simplify (sign_extend:M (truncate:N (lshiftrt:M x C))) to
	(ashiftrt:M x C) when the shift sets the high bits appropriately.

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK


[-- Attachment #2: patchs.txt --]
[-- Type: text/plain, Size: 1913 bytes --]

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index e631da4..e3630c9 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1527,6 +1527,38 @@ simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
 	  && XEXP (op, 1) != const0_rtx)
 	return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));
 
+      /* (sign_extend:M (truncate:N (lshiftrt:O <X> (const_int I)))) where
+	 I is GET_MODE_PRECISION(O) - GET_MODE_PRECISION(N), simplifies to
+	 (ashiftrt:M <X> (const_int I)) if modes M and O are the same, and
+	 (truncate:M (ashiftrt:O <X> (const_int I))) if M is narrower than
+	 O, and (sign_extend:M (ashiftrt:O <X> (const_int I))) if M is
+	 wider than O.  */
+      if (GET_CODE (op) == TRUNCATE
+	  && GET_CODE (XEXP (op, 0)) == LSHIFTRT
+	  && CONST_INT_P (XEXP (XEXP (op, 0), 1)))
+	{
+	  scalar_int_mode m_mode, n_mode, o_mode;
+	  rtx old_shift = XEXP (op, 0);
+	  if (is_a <scalar_int_mode> (mode, &m_mode)
+	      && is_a <scalar_int_mode> (GET_MODE (op), &n_mode)
+	      && is_a <scalar_int_mode> (GET_MODE (old_shift), &o_mode)
+	      && GET_MODE_PRECISION (o_mode) - GET_MODE_PRECISION (n_mode)
+		 == INTVAL (XEXP (old_shift, 1)))
+	    {
+	      rtx new_shift = simplify_gen_binary (ASHIFTRT,
+						   GET_MODE (old_shift),
+						   XEXP (old_shift, 0),
+						   XEXP (old_shift, 1));
+	      if (GET_MODE_PRECISION (m_mode) > GET_MODE_PRECISION (o_mode))
+		return simplify_gen_unary (SIGN_EXTEND, mode, new_shift,
+					   GET_MODE (new_shift));
+	      if (mode != GET_MODE (new_shift))
+		return simplify_gen_unary (TRUNCATE, mode, new_shift,
+					   GET_MODE (new_shift));
+	      return new_shift;
+	    }
+	}
+
 #if defined(POINTERS_EXTEND_UNSIGNED)
       /* As we do not know which address space the pointer is referring to,
 	 we can do this only if the target does not support different pointer

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] middle-end: Simplify (sign_extend:HI (truncate:QI (ashiftrt:HI X 8)))
  2020-07-19  9:42 [PATCH] middle-end: Simplify (sign_extend:HI (truncate:QI (ashiftrt:HI X 8))) Roger Sayle
@ 2020-08-16  1:23 ` Segher Boessenkool
  2020-08-24 23:28 ` Jeff Law
  1 sibling, 0 replies; 3+ messages in thread
From: Segher Boessenkool @ 2020-08-16  1:23 UTC (permalink / raw)
  To: Roger Sayle; +Cc: 'GCC Patches'

Hi!

On Sun, Jul 19, 2020 at 10:42:16AM +0100, Roger Sayle wrote:
> This patch to simplify-rtx.c
> simplifies (sign_extend:HI (truncate:QI (?shiftrt:HI x 8))) to just
> (ashiftrt:HI x 8), as the inner shift already sets the high bits
> appropriately.

> The one oddity of the patch is that it tests for
> LSHIFTRT as inner shift, as simplify/combine has already canonicalized
> this to a logical shift, assuming that the distinction is unimportant
> following the truncation.

If simplify-rtx does that, that is a bug.  But combine will do this, I
think that is what you are seeing?  Can you verify it already works if
the ASHIFTRT is not changed to an LSHIFTRT?

> +	  if (is_a <scalar_int_mode> (mode, &m_mode)
> +	      && is_a <scalar_int_mode> (GET_MODE (op), &n_mode)
> +	      && is_a <scalar_int_mode> (GET_MODE (old_shift), &o_mode)
> +	      && GET_MODE_PRECISION (o_mode) - GET_MODE_PRECISION (n_mode)
> +		 == INTVAL (XEXP (old_shift, 1)))
> +	    {
> +	      rtx new_shift = simplify_gen_binary (ASHIFTRT,
> +						   GET_MODE (old_shift),
> +						   XEXP (old_shift, 0),
> +						   XEXP (old_shift, 1));
> +	      if (GET_MODE_PRECISION (m_mode) > GET_MODE_PRECISION (o_mode))
> +		return simplify_gen_unary (SIGN_EXTEND, mode, new_shift,
> +					   GET_MODE (new_shift));
> +	      if (mode != GET_MODE (new_shift))
> +		return simplify_gen_unary (TRUNCATE, mode, new_shift,
> +					   GET_MODE (new_shift));
> +	      return new_shift;
> +	    }

Yeah looks like it :-)

You could say combine should be smarter about this, but this is a valid
simplification in itself.  So, okay for trunk.  Thank you!


Segher

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] middle-end: Simplify (sign_extend:HI (truncate:QI (ashiftrt:HI X 8)))
  2020-07-19  9:42 [PATCH] middle-end: Simplify (sign_extend:HI (truncate:QI (ashiftrt:HI X 8))) Roger Sayle
  2020-08-16  1:23 ` Segher Boessenkool
@ 2020-08-24 23:28 ` Jeff Law
  1 sibling, 0 replies; 3+ messages in thread
From: Jeff Law @ 2020-08-24 23:28 UTC (permalink / raw)
  To: Roger Sayle, 'GCC Patches'

On Sun, 2020-07-19 at 10:42 +0100, Roger Sayle wrote:
> The combination of several my recent nvptx patches has revealed an
> interesting RTL optimization opportunity.  This patch to simplify-rtx.c
> simplifies (sign_extend:HI (truncate:QI (?shiftrt:HI x 8))) to just
> (ashiftrt:HI x 8), as the inner shift already sets the high bits
> appropriately.  The equivalent zero_extend variant appears to already
> be implemented in simplify_unary_operation_1.
> 
> During the compilation of one of the tests in the test suite, we
> manage the generate the redundant sequence of instructions:
> 
> (insn 17 16 18 3 (set (reg:HI 35)
>         (ashiftrt:HI (reg:HI 34 [ arg ])
>             (const_int 8 [0x8]))) "v2si-cvt.c":14:8 94 {ashrhi3}
>      (expr_list:REG_DEAD (reg:HI 34 [ arg ])
>         (nil)))
> (insn 18 17 19 3 (set (reg:QI 36)
>         (truncate:QI (reg:HI 35))) "v2si-cvt.c":14:8 28 {trunchiqi2}
>      (expr_list:REG_DEAD (reg:HI 35)
>         (nil)))
> (insn 19 18 20 3 (set (reg:HI 37)
>         (sign_extend:HI (reg:QI 36))) "v2si-cvt.c":14:6 22 {extendqihi2}
>      (expr_list:REG_DEAD (reg:QI 36)
>         (nil)))
I can't recall the target, but I've seen similar sequences as well.  Thanks for
digging into it. 

Jeff


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-08-24 23:28 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-19  9:42 [PATCH] middle-end: Simplify (sign_extend:HI (truncate:QI (ashiftrt:HI X 8))) Roger Sayle
2020-08-16  1:23 ` Segher Boessenkool
2020-08-24 23:28 ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).