public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jeff Law <jeffreyalaw@gmail.com>
To: Jivan Hakobyan <jivanhakobyan9@gmail.com>, gcc-patches@gcc.gnu.org
Subject: Re: RISC-V: Folding memory for FP + constant case
Date: Wed, 9 Aug 2023 13:31:41 -0600	[thread overview]
Message-ID: <606e52ed-019d-9347-5a24-41385b62f567@gmail.com> (raw)
In-Reply-To: <CAHso6sPmfwnyQq4C2AQHgMbm6uxggVFGFK63_Q=yyXC+KCwTOA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]



On 7/12/23 14:59, Jivan Hakobyan via Gcc-patches wrote:
> Accessing local arrays element turned into load form (fp + (index << C1)) +
> C2 address.
> In the case when access is in the loop we got loop invariant computation.
> For some reason, moving out that part cannot be done in
> loop-invariant passes.
> But we can handle that in target-specific hook (legitimize_address).
> That provides an opportunity to rewrite memory access more suitable for the
> target architecture.
> 
> This patch solves the mentioned case by rewriting mentioned case to ((fp +
> C2) + (index << C1))
> I have evaluated it on SPEC2017 and got an improvement on leela (over 7b
> instructions,
> .39% of the dynamic count) and dwarfs the regression for gcc (14m
> instructions, .0012%
> of the dynamic count).
> 
> 
> gcc/ChangeLog:
>          * config/riscv/riscv.cc (riscv_legitimize_address): Handle folding.
>          (mem_shadd_or_shadd_rtx_p): New predicate.
So I poked a bit more in this space today.

As you may have noted, Manolis's patch still needs another rev.  But I 
was able to test this patch in conjunction with the f-m-o patch as well 
as the additional improvements made to hard register cprop.  The net 
result was that this patch still shows a nice decrease in instruction 
counts on leela.  It's a bit of a mixed bag elsewhere.

I dove a bit deeper into the small regression in x264.  In the case I 
looked at the reason the patch regresses is the original form of the 
address calculations exposes a common subexpression ie

addr1 = (reg1 << 2) + fp + C1
addr2 = (reg1 << 2) + fp + C2

(reg1 << 2) + fp is a common subexpression resulting in something like 
this as we leave CSE:

t = (reg1 << 2) + fp;
addr1 = t + C1
addr2 = t + C2
mem (addr1)
mem (addr2)

C1 and C2 are small constants, so combine generates

t = (reg1 << 2) + fp;
mem (t+C1)
mem (t+C2)

FP elimination occurs after IRA and we get:

t2 = sp + C3
t = (reg << 2) + t2
mem (t + C1)
mem (t + C2)


Not bad.  Manolis's work should allow us to improve that a bit more.


With this patch we don't capture the CSE and ultimately generate 
slightly worse code.  This kind of issue is fairly inherent in 
reassociations -- and given the regression is 2 orders of magnitude 
smaller than the improvement my inclination is to go forward with this 
patch.



I've fixed a few formatting issues and changed once conditional to use 
CONST_INT_P rather than checking the code directory and pushed the final 
version to the trunk.

Thanks for your patience.

jeff

[-- Attachment #2: P --]
[-- Type: text/plain, Size: 3381 bytes --]

commit a16dc729fda9fabd6472d50cce45791cb3b6ada8
Author: Jivan Hakobyan <jivanhakobyan9@gmail.com>
Date:   Wed Aug 9 13:26:58 2023 -0600

    RISC-V: Folding memory for FP + constant case
    
    Accessing local arrays element turned into load form (fp + (index << C1)) +
    C2 address.
    
    In the case when access is in the loop we got loop invariant computation.  For
    some reason, moving out that part cannot be done in loop-invariant passes.  But
    we can handle that in target-specific hook (legitimize_address).  That provides
    an opportunity to rewrite memory access more suitable for the target
    architecture.
    
    This patch solves the mentioned case by rewriting mentioned case to ((fp +
    C2) + (index << C1))
    
    I have evaluated it on SPEC2017 and got an improvement on leela (over 7b
    instructions, .39% of the dynamic count) and dwarfs the regression for gcc (14m
    instructions, .0012% of the dynamic count).
    
    gcc/ChangeLog:
            * config/riscv/riscv.cc (riscv_legitimize_address): Handle folding.
            (mem_shadd_or_shadd_rtx_p): New function.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 77892da2920..7f2041a54ba 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1805,6 +1805,22 @@ riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset)
   return addr;
 }
 
+/* Helper for riscv_legitimize_address. Given X, return true if it
+   is a left shift by 1, 2 or 3 positions or a multiply by 2, 4 or 8.
+
+   This respectively represent canonical shift-add rtxs or scaled
+   memory addresses.  */
+static bool
+mem_shadd_or_shadd_rtx_p (rtx x)
+{
+  return ((GET_CODE (x) == ASHIFT
+	   || GET_CODE (x) == MULT)
+	  && CONST_INT_P (XEXP (x, 1))
+	  && ((GET_CODE (x) == ASHIFT && IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
+	      || (GET_CODE (x) == MULT
+		  && IN_RANGE (exact_log2 (INTVAL (XEXP (x, 1))), 1, 3))));
+}
+
 /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
    be legitimized in a way that the generic machinery might not expect,
    return a new address, otherwise return NULL.  MODE is the mode of
@@ -1830,6 +1846,32 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
       rtx base = XEXP (x, 0);
       HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
 
+      /* Handle (plus (plus (mult (a) (mem_shadd_constant)) (fp)) (C)) case.  */
+      if (GET_CODE (base) == PLUS && mem_shadd_or_shadd_rtx_p (XEXP (base, 0))
+	  && SMALL_OPERAND (offset))
+	{
+	  rtx index = XEXP (base, 0);
+	  rtx fp = XEXP (base, 1);
+	  if (REGNO (fp) == VIRTUAL_STACK_VARS_REGNUM)
+	    {
+
+	      /* If we were given a MULT, we must fix the constant
+		 as we're going to create the ASHIFT form.  */
+	      int shift_val = INTVAL (XEXP (index, 1));
+	      if (GET_CODE (index) == MULT)
+		shift_val = exact_log2 (shift_val);
+
+	      rtx reg1 = gen_reg_rtx (Pmode);
+	      rtx reg2 = gen_reg_rtx (Pmode);
+	      rtx reg3 = gen_reg_rtx (Pmode);
+	      riscv_emit_binary (PLUS, reg1, fp, GEN_INT (offset));
+	      riscv_emit_binary (ASHIFT, reg2, XEXP (index, 0), GEN_INT (shift_val));
+	      riscv_emit_binary (PLUS, reg3, reg2, reg1);
+
+	      return reg3;
+	    }
+	}
+
       if (!riscv_valid_base_register_p (base, mode, false))
 	base = copy_to_mode_reg (Pmode, base);
       if (optimize_function_for_size_p (cfun)

      parent reply	other threads:[~2023-08-09 19:31 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-12 20:59 Jivan Hakobyan
2023-07-15  6:16 ` Jeff Law
2023-07-25 11:24   ` Jivan Hakobyan
2023-07-26  3:31     ` Jeff Law
2023-08-01 19:14       ` Vineet Gupta
2023-08-01 20:16         ` Jivan Hakobyan
2023-08-01 21:55         ` Jeff Law
2023-08-01 22:07           ` Philipp Tomsich
2023-08-01 23:03             ` Vineet Gupta
2023-08-01 23:06               ` Philipp Tomsich
2023-08-01 23:13                 ` Vineet Gupta
2023-08-01 23:27                   ` Jeff Law
2023-08-01 23:38                     ` Vineet Gupta
2023-08-01 23:52                       ` Jeff Law
2023-08-04  9:52                         ` Manolis Tsamis
2023-08-04 16:23                           ` Jeff Law
2023-08-05  9:27                             ` Manolis Tsamis
2023-08-01 23:22                 ` Jeff Law
2023-08-01 23:28                   ` Vineet Gupta
2023-08-01 23:21               ` Jeff Law
2023-08-09 19:31 ` Jeff Law [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=606e52ed-019d-9347-5a24-41385b62f567@gmail.com \
    --to=jeffreyalaw@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jivanhakobyan9@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).