Re: RISC-V: Folding memory for FP + constant case

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jivan Hakobyan <jivanhakobyan9@gmail.com>
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: RISC-V: Folding memory for FP + constant case
Date: Tue, 25 Jul 2023 15:24:40 +0400	[thread overview]
Message-ID: <CAHso6sPJvjWgaJrpG0SunEM6-TOmV5_Tzznk3gr9tFWuHBd7-Q@mail.gmail.com> (raw)
In-Reply-To: <3c1f0f8a-34ed-abb2-8a49-3083a2cc55d2@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3367 bytes --]

Hi.

I re-run the benchmarks and hopefully got the same profit.
I also compared the leela's code and figured out the reason.

Actually, my and Manolis's patches do the same thing. The difference is
only execution order.
Because of f-m-o held after the register allocation it cannot eliminate
redundant move 'sp' to another register.

Here is an example.

int core_bench_state(int *ptr) {
>    int final_counts[100] = {0};

   while (*ptr) {
>      int id = foo();
>      final_counts[id]++;
>      ptr++;
>    }

   return final_counts[0];
> }


For this loop, the f-m-o pass generates the following.

.L3:

call foo
* mv a5,sp*
sh2add a0,a0,a5
lw a5,0(a0)
lw a4,4(s0)
addi s0,s0,4
addiw a5,a5,1
sw a5,0(a0)
bne a4,zero,.L3


Here '*mv a5, sp*' instruction is redundant.
Leela's FastState::try_move() function has a loop that iterates over 1.3 B
times and contains 5 memory folding cases (5 redundant moves).

Besides that, I have checked the build failure on x264_r. It is already
fixed on the third version.

On Sat, Jul 15, 2023 at 10:16 AM Jeff Law <jeffreyalaw@gmail.com> wrote:

>
>
> On 7/12/23 14:59, Jivan Hakobyan via Gcc-patches wrote:
> > Accessing local arrays element turned into load form (fp + (index <<
> > C1)) + C2 address. In the case when access is in the loop we got loop
> > invariant computation. For some reason, moving out that part cannot
> > be done in loop-invariant passes. But we can handle that in
> > target-specific hook (legitimize_address). That provides an
> > opportunity to rewrite memory access more suitable for the target
> > architecture.
> >
> > This patch solves the mentioned case by rewriting mentioned case to
> > ((fp + C2) + (index << C1)) I have evaluated it on SPEC2017 and got
> > an improvement on leela (over 7b instructions, .39% of the dynamic
> > count) and dwarfs the regression for gcc (14m instructions, .0012% of
> > the dynamic count).
> >
> >
> > gcc/ChangeLog: * config/riscv/riscv.cc (riscv_legitimize_address):
> > Handle folding. (mem_shadd_or_shadd_rtx_p): New predicate.
> So I still need to give the new version a review.  But a high level
> question -- did you re-run the benchmarks with this version to verify
> that we still saw the same nice improvement in leela?
>
> The reason I ask is when I use this on Ventana's internal tree I don't
> see any notable differences in the dynamic instruction counts.  And
> probably the most critical difference between the upstream tree and
> Ventana's tree in this space is Ventana's internal tree has an earlier
> version of the fold-mem-offsets work from Manolis.
>
> It may ultimately be the case that this work and Manolis's f-m-o patch
> have a lot of overlap in terms of their final effect on code generation.
>   Manolis's pass runs much later (after register allocation), so it's
> not going to address the loop-invariant-code-motion issue that
> originally got us looking into this space.  But his pass is generic
> enough that it helps other targets.  So we may ultimately want both.
>
> Anyway, just wanted to verify if this variant is still showing the nice
> improvement on leela that the prior version did.
>
> Jeff
>
> ps.  I know you're on PTO.  No rush on responding -- enjoy the time off.
>
>

-- 
With the best regards
Jivan Hakobyan

next prev parent reply	other threads:[~2023-07-25 11:24 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-12 20:59 Jivan Hakobyan
2023-07-15  6:16 ` Jeff Law
2023-07-25 11:24   ` Jivan Hakobyan [this message]
2023-07-26  3:31     ` Jeff Law
2023-08-01 19:14       ` Vineet Gupta
2023-08-01 20:16         ` Jivan Hakobyan
2023-08-01 21:55         ` Jeff Law
2023-08-01 22:07           ` Philipp Tomsich
2023-08-01 23:03             ` Vineet Gupta
2023-08-01 23:06               ` Philipp Tomsich
2023-08-01 23:13                 ` Vineet Gupta
2023-08-01 23:27                   ` Jeff Law
2023-08-01 23:38                     ` Vineet Gupta
2023-08-01 23:52                       ` Jeff Law
2023-08-04  9:52                         ` Manolis Tsamis
2023-08-04 16:23                           ` Jeff Law
2023-08-05  9:27                             ` Manolis Tsamis
2023-08-01 23:22                 ` Jeff Law
2023-08-01 23:28                   ` Vineet Gupta
2023-08-01 23:21               ` Jeff Law
2023-08-09 19:31 ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHso6sPJvjWgaJrpG0SunEM6-TOmV5_Tzznk3gr9tFWuHBd7-Q@mail.gmail.com \
    --to=jivanhakobyan9@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).