public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: <gcc-patches@gcc.gnu.org>
Cc: "'Jeff Law'" <jeffreyalaw@gmail.com>, "'YunQiang Su'" <syq@gcc.gnu.org>
Subject: [PATCH] Improved RTL expansion of field assignments into promoted registers.
Date: Thu, 28 Dec 2023 14:59:40 -0000	[thread overview]
Message-ID: <005901da399e$7d13b330$773b1990$@nextmovesoftware.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 3833 bytes --]


This patch fixes PR rtl-optmization/104914 by tweaking/improving the way
that fields are written into a pseudo register that needs to be kept sign
extended.

The motivating example from the bugzilla PR is:

extern void ext(int);
void foo(const unsigned char *buf) {
  int val;
  ((unsigned char*)&val)[0] = *buf++;
  ((unsigned char*)&val)[1] = *buf++;
  ((unsigned char*)&val)[2] = *buf++;
  ((unsigned char*)&val)[3] = *buf++;
  if(val > 0)
    ext(1);
  else
    ext(0);
}

which at the end of the tree optimization passes looks like:

void foo (const unsigned char * buf)
{
  int val;
  unsigned char _1;
  unsigned char _2;
  unsigned char _3;
  unsigned char _4;
  int val.5_5;

  <bb 2> [local count: 1073741824]:
  _1 = *buf_7(D);
  MEM[(unsigned char *)&val] = _1;
  _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
  MEM[(unsigned char *)&val + 1B] = _2;
  _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
  MEM[(unsigned char *)&val + 2B] = _3;
  _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
  MEM[(unsigned char *)&val + 3B] = _4;
  val.5_5 = val;
  if (val.5_5 > 0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 4>; [41.00%]

  <bb 3> [local count: 633507681]:
  ext (1);
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 440234144]:
  ext (0);

  <bb 5> [local count: 1073741824]:
  val ={v} {CLOBBER(eol)};
  return;

}

Here four bytes are being sequentially written into the SImode value
val.  On some platforms, such as MIPS64, this SImode value is kept in
a 64-bit register, suitably sign-extended.  The function expand_assignment
contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264
in expr.cc) which outputs an explicit extension operation after each
store_field (typically insv) to such promoted/extended pseudos.

The first observation is that there's no need to perform sign extension
after each byte in the example above; the extension is only required
after changes to the most significant byte (i.e. to a field that overlaps
the most significant bit).

The bug fix is actually a bit more subtle, but at this point during
code expansion it's not safe to use a SUBREG when sign-extending this
field.  Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0))
but combine (and other RTL optimizers) later realize that because SImode
values are always sign-extended in their 64-bit hard registers that
this is a no-op and eliminates it.  The trouble is that it's unsafe to
refer to the SImode lowpart of a 64-bit register using SUBREG at those
critical points when temporarily the value isn't correctly sign-extended,
and the usual backend invariants don't hold.  At these critical points,
the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a
TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like
(sign_extend:DI (truncate:SI (reg:DI)), which avoids the problem.

Note that MODE_REP_EXTENDED (NARROW, WIDE) != UNKOWN implies (or should
imply) !TRULY_NOOP_TRUNCATION (NARROW, WIDE).  I've another (independent)
patch that I'll post in a few minutes.


This middle-end patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, both with and without
--target_board=unix{-m32} with no new failures.  The cc1 from a
cross-compiler to mips64 appears to generate much better code for
the above test case.  Ok for mainline?


2023-12-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR rtl-optimization/104914
        * expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P
        a sign or zero extension is only required if the modified field
        overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
        targets, don't refer to the temporarily incorrectly extended value
        using a SUBREG, but instead generate an explicit TRUNCATE rtx.


Thanks in advance,
Roger
--


[-- Attachment #2: patchyq.txt --]
[-- Type: text/plain, Size: 1756 bytes --]

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9fef2bf6585..1a34b48e38f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6272,19 +6272,32 @@ expand_assignment (tree to, tree from, bool nontemporal)
 		  && known_eq (bitpos, 0)
 		  && known_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
 		result = store_expr (from, to_rtx, 0, nontemporal, false);
-	      else
+	      /* Check if the field overlaps the MSB, requiring extension.  */
+	      else if (known_eq (bitpos + bitsize,
+				 GET_MODE_BITSIZE (GET_MODE (to_rtx))))
 		{
-		  rtx to_rtx1
-		    = lowpart_subreg (subreg_unpromoted_mode (to_rtx),
-				      SUBREG_REG (to_rtx),
-				      subreg_promoted_mode (to_rtx));
+		  scalar_int_mode imode = subreg_unpromoted_mode (to_rtx);
+		  scalar_int_mode omode = subreg_promoted_mode (to_rtx);
+		  rtx to_rtx1 = lowpart_subreg (imode, SUBREG_REG (to_rtx),
+						omode);
 		  result = store_field (to_rtx1, bitsize, bitpos,
 					bitregion_start, bitregion_end,
 					mode1, from, get_alias_set (to),
 					nontemporal, reversep);
+		  /* If the target usually keeps IMODE appropriately
+		     extended in OMODE it's unsafe to refer to it using
+		     a SUBREG whilst this invariant doesn't hold.  */
+		  if (targetm.mode_rep_extended (imode, omode) != UNKNOWN)
+		    to_rtx1 = simplify_gen_unary (TRUNCATE, imode,
+						  SUBREG_REG (to_rtx), omode);
 		  convert_move (SUBREG_REG (to_rtx), to_rtx1,
 				SUBREG_PROMOTED_SIGN (to_rtx));
 		}
+	      else
+		result = store_field (to_rtx, bitsize, bitpos,
+				      bitregion_start, bitregion_end,
+				      mode1, from, get_alias_set (to),
+				      nontemporal, reversep);
 	    }
 	  else
 	    result = store_field (to_rtx, bitsize, bitpos,

             reply	other threads:[~2023-12-28 14:59 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-28 14:59 Roger Sayle [this message]
2023-12-28 18:22 ` Jeff Law
2023-12-28 19:35   ` Roger Sayle
2023-12-29  2:07     ` YunQiang Su
2023-12-30  3:27       ` Jeff Law
2023-12-31  4:34         ` YunQiang Su
2024-01-02 16:46           ` Jeff Law
2024-01-02 16:55     ` Jeff Law
2023-12-29  1:51   ` YunQiang Su

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='005901da399e$7d13b330$773b1990$@nextmovesoftware.com' \
    --to=roger@nextmovesoftware.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=syq@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).