[gcc r14-335] PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG.

public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed

From: Roger Sayle <sayle@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r14-335] PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG.
Date: Fri, 28 Apr 2023 13:22:38 +0000 (GMT)	[thread overview]
Message-ID: <20230428132238.CA3953858D37@sourceware.org> (raw)

https://gcc.gnu.org/g:650c36ec461a722d9c65e82512b4c3aeec2ffee1

commit r14-335-g650c36ec461a722d9c65e82512b4c3aeec2ffee1
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Apr 28 14:21:53 2023 +0100

    PR rtl-optimization/109476: Use ZERO_EXTEND instead of zeroing a SUBREG.
    
    This patch fixes PR rtl-optimization/109476, which is a code quality
    regression affecting AVR.  The cause is that the lower-subreg pass is
    sometimes overly aggressive, lowering the LSHIFTRT below:
    
    (insn 7 4 8 2 (set (reg:HI 51)
            (lshiftrt:HI (reg/v:HI 49 [ b ])
                (const_int 8 [0x8]))) "t.ii":4:36 557 {lshrhi3}
         (nil))
    
    into a pair of QImode SUBREG assignments:
    
    (insn 19 4 20 2 (set (subreg:QI (reg:HI 51) 0)
            (reg:QI 54 [ b+1 ])) "t.ii":4:36 86 {movqi_insn_split}
         (nil))
    (insn 20 19 8 2 (set (subreg:QI (reg:HI 51) 1)
            (const_int 0 [0])) "t.ii":4:36 86 {movqi_insn_split}
         (nil))
    
    but this idiom, SETs of SUBREGs, interferes with combine's ability
    to associate/fuse instructions.  The solution, on targets that
    have a suitable ZERO_EXTEND (i.e. where the lower-subreg pass
    wouldn't itself split a ZERO_EXTEND, so "splitting_zext" is false),
    is to split/lower LSHIFTRT to a ZERO_EXTEND.
    
    To answer Richard's question in comment #10 of the bugzilla PR,
    the function resolve_shift_zext is called with one of four RTX
    codes, ASHIFTRT, LSHIFTRT, ZERO_EXTEND and ASHIFT, but only with
    LSHIFTRT can the setting of low_part and high_part SUBREGs be
    replaced by a ZERO_EXTEND.  For ASHIFTRT, we require a sign
    extension, so don't set the high_part to zero; if we're splitting
    a ZERO_EXTEND then it doesn't make sense to replace it with a
    ZERO_EXTEND, and for ASHIFT we've played games to swap the
    high_part and low_part SUBREGs, so that we assign the low_part
    to zero (for double word shifts by greater than word size bits).
    
    2023-04-28  Roger Sayle  <roger@nextmovesoftware.com>
    
    gcc/ChangeLog
            PR rtl-optimization/109476
            * lower-subreg.cc: Include explow.h for force_reg.
            (find_decomposable_shift_zext): Pass an additional SPEED_P argument.
            If decomposing a suitable LSHIFTRT and we're not splitting
            ZERO_EXTEND (based on the current SPEED_P), then use a ZERO_EXTEND
            instead of setting a high part SUBREG to zero, which helps combine.
            (decompose_multiword_subregs): Update call to resolve_shift_zext.
    
    gcc/testsuite/ChangeLog
            PR rtl-optimization/109476
            * gcc.target/avr/mmcu/pr109476.c: New test case.

Diff:
---
 gcc/lower-subreg.cc                          | 39 ++++++++++++++++++++--------
 gcc/testsuite/gcc.target/avr/mmcu/pr109476.c | 11 ++++++++
 2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
index 481e1e85a26..81fc5380cbe 100644
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgbuild.h"
 #include "dce.h"
 #include "expr.h"
+#include "explow.h"
 #include "tree-pass.h"
 #include "lower-subreg.h"
 #include "rtl-iter.h"
@@ -1299,11 +1300,12 @@ find_decomposable_shift_zext (rtx_insn *insn, bool speed_p)
 
 /* Decompose a more than word wide shift (in INSN) of a multiword
    pseudo or a multiword zero-extend of a wordmode pseudo into a move
-   and 'set to zero' insn.  Return a pointer to the new insn when a
-   replacement was done.  */
+   and 'set to zero' insn.  SPEED_P says whether we are optimizing
+   for speed or size, when checking if a ZERO_EXTEND is preferable.
+   Return a pointer to the new insn when a replacement was done.  */
 
 static rtx_insn *
-resolve_shift_zext (rtx_insn *insn)
+resolve_shift_zext (rtx_insn *insn, bool speed_p)
 {
   rtx set;
   rtx op;
@@ -1378,14 +1380,29 @@ resolve_shift_zext (rtx_insn *insn)
 				dest_reg, GET_CODE (op) != ASHIFTRT);
     }
 
-  if (dest_reg != src_reg)
-    emit_move_insn (dest_reg, src_reg);
-  if (GET_CODE (op) != ASHIFTRT)
-    emit_move_insn (dest_upper, CONST0_RTX (word_mode));
-  else if (INTVAL (XEXP (op, 1)) == 2 * BITS_PER_WORD - 1)
-    emit_move_insn (dest_upper, copy_rtx (src_reg));
+  /* Consider using ZERO_EXTEND instead of setting DEST_UPPER to zero
+     if this is considered reasonable.  */
+  if (GET_CODE (op) == LSHIFTRT
+      && GET_MODE (op) == twice_word_mode
+      && REG_P (SET_DEST (set))
+      && !choices[speed_p].splitting_zext)
+    {
+      rtx tmp = force_reg (word_mode, copy_rtx (src_reg));
+      tmp = simplify_gen_unary (ZERO_EXTEND, twice_word_mode, tmp, word_mode);
+      emit_move_insn (SET_DEST (set), tmp);
+    }
   else
-    emit_move_insn (dest_upper, upper_src);
+    {
+      if (dest_reg != src_reg)
+	emit_move_insn (dest_reg, src_reg);
+      if (GET_CODE (op) != ASHIFTRT)
+	emit_move_insn (dest_upper, CONST0_RTX (word_mode));
+      else if (INTVAL (XEXP (op, 1)) == 2 * BITS_PER_WORD - 1)
+	emit_move_insn (dest_upper, copy_rtx (src_reg));
+      else
+	emit_move_insn (dest_upper, upper_src);
+    }
+
   insns = get_insns ();
 
   end_sequence ();
@@ -1670,7 +1687,7 @@ decompose_multiword_subregs (bool decompose_copies)
 		    {
 		      rtx_insn *decomposed_shift;
 
-		      decomposed_shift = resolve_shift_zext (insn);
+		      decomposed_shift = resolve_shift_zext (insn, speed_p);
 		      if (decomposed_shift != NULL_RTX)
 			{
 			  insn = decomposed_shift;
diff --git a/gcc/testsuite/gcc.target/avr/mmcu/pr109476.c b/gcc/testsuite/gcc.target/avr/mmcu/pr109476.c
new file mode 100644
index 00000000000..6e2269a82f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/mmcu/pr109476.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -mmcu=avrxmega3" } */
+
+unsigned short foo(unsigned char a, unsigned short b) {
+    return (unsigned char)((b >> 8) + 0) * a ;
+}
+
+/* { dg-final { scan-assembler-times "mul" 1 } } */
+/* { dg-final { scan-assembler-times "mov" 1 } } */
+/* { dg-final { scan-assembler-not "add" } } */
+/* { dg-final { scan-assembler-not "ldi" } } */

                 reply	other threads:[~2023-04-28 13:22 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230428132238.CA3953858D37@sourceware.org \
    --to=sayle@gcc.gnu.org \
    --cc=gcc-cvs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).