public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jeff Law <jeffreyalaw@gmail.com>
To: Manolis Tsamis <manolis.tsamis@vrull.eu>, gcc-patches@gcc.gnu.org
Cc: Richard Biener <richard.guenther@gmail.com>,
	Palmer Dabbelt <palmer@rivosinc.com>,
	Philipp Tomsich <philipp.tomsich@vrull.eu>,
	Kito Cheng <kito.cheng@gmail.com>
Subject: Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
Date: Wed, 7 Jun 2023 23:37:20 -0600	[thread overview]
Message-ID: <91d71dae-b235-fbd0-c8f0-001b7f1e444c@gmail.com> (raw)
In-Reply-To: <20230525123550.1072506-2-manolis.tsamis@vrull.eu>



On 5/25/23 06:35, Manolis Tsamis wrote:
> Implementation of the new RISC-V optimization pass for memory offset
> calculations, documentation and testcases.
> 
> gcc/ChangeLog:
> 
> 	* config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> 	* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> 	pass.
> 	* config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> 	* config/riscv/riscv.opt: New options.
> 	* config/riscv/t-riscv: New build rule.
> 	* doc/invoke.texi: Document new option.
> 	* config/riscv/riscv-fold-mem-offsets.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/fold-mem-offsets-1.c: New test.
> 	* gcc.target/riscv/fold-mem-offsets-2.c: New test.
> 	* gcc.target/riscv/fold-mem-offsets-3.c: New test.
So not going into the guts of the patch yet.

 From a benchmark standpoint the only two that get out of the +-0.05% 
range are mcf and deepsjeng (from a dynamic instruction standpoint).  So 
from an evaluation standpoint we can probably focus our efforts there. 
And as we know, mcf is actually memory bound, so while improving its 
dynamic instruction count is good, the end performance improvement may 
be marginal.

As I mentioned to Philipp many months ago this reminds me a lot of a 
problem I've seen before.  Basically register elimination emits code 
that can be terrible in some circumstances.  So I went and poked at this 
again.

I think the key difference between now and what I was dealing with 
before is for the cases that really matter for rv64 we have a shNadd 
insn in the sequence.  That private port I was working on before did not 
have shNadd (don't ask, I probably can't tell).  Our target also had 
reg+reg addressing modes.  What I can't remember was if we were trying 
harder to fold the constant terms into the memory reference or if we 
were more focused on the reg+reg.  Ultimately it's probably not that 
important to remember -- the key is there are very significant 
differences in the target's capabilities which impact how we should be 
generating code in this case.  Those differences affect the code we 
generate *and* the places where we can potentially get control and do 
some address rewriting.

A  key sequence in mcf looks something like this in IRA, others have 
similar structure:

> (insn 237 234 239 26 (set (reg:DI 377)
>         (plus:DI (ashift:DI (reg:DI 200 [ _173 ])
>                 (const_int 3 [0x3]))
>             (reg/f:DI 65 frame))) "pbeampp.c":139:15 333 {*shNadd}
>      (nil))
> (insn 239 237 235 26 (set (reg/f:DI 380)
>         (plus:DI (reg:DI 513)
>             (reg:DI 377))) "pbeampp.c":139:15 5 {adddi3}
>      (expr_list:REG_DEAD (reg:DI 377)
>         (expr_list:REG_EQUAL (plus:DI (reg:DI 377)
>                 (const_int -32768 [0xffffffffffff8000]))
>             (nil))))
[ ... ]
> (insn 240 235 255 26 (set (reg/f:DI 204 [ _177 ])
>         (mem/f:DI (plus:DI (reg/f:DI 380)
>                 (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
>      (expr_list:REG_DEAD (reg/f:DI 380)
>         (nil)))


The key here is insn 237.  It's generally going to be bad to have FP 
show up in a shadd insn because its going to be eliminated into 
sp+offset.  That'll generate an input reload before insn 237 and we 
can't do any combination with the constant in insn 239.

After LRA it looks like this:

> (insn 1540 234 1541 26 (set (reg:DI 11 a1 [750])
>         (const_int 32768 [0x8000])) "pbeampp.c":139:15 179 {*movdi_64bit}
>      (nil))
> (insn 1541 1540 1611 26 (set (reg:DI 12 a2 [749])
>         (plus:DI (reg:DI 11 a1 [750])
>             (const_int -272 [0xfffffffffffffef0]))) "pbeampp.c":139:15 5 {adddi3}
>      (expr_list:REG_EQUAL (const_int 32496 [0x7ef0])
>         (nil))) 
> (insn 1611 1541 1542 26 (set (reg:DI 29 t4 [795])
>         (plus:DI (reg/f:DI 2 sp)
>             (const_int 64 [0x40]))) "pbeampp.c":139:15 5 {adddi3}
>      (nil))
> (insn 1542 1611 237 26 (set (reg:DI 12 a2 [749])
>         (plus:DI (reg:DI 12 a2 [749])
>             (reg:DI 29 t4 [795]))) "pbeampp.c":139:15 5 {adddi3}
>      (nil))
> (insn 237 1542 239 26 (set (reg:DI 12 a2 [377])
>         (plus:DI (ashift:DI (reg:DI 14 a4 [orig:200 _173 ] [200])
>                 (const_int 3 [0x3]))
>             (reg:DI 12 a2 [749]))) "pbeampp.c":139:15 333 {*shNadd}
>      (nil))
> (insn 239 237 235 26 (set (reg/f:DI 12 a2 [380])
>         (plus:DI (reg:DI 10 a0 [513])
>             (reg:DI 12 a2 [377]))) "pbeampp.c":139:15 5 {adddi3}
>      (expr_list:REG_EQUAL (plus:DI (reg:DI 12 a2 [377])
>             (const_int -32768 [0xffffffffffff8000]))
>         (nil))) 
[ ... ]
> (insn 240 235 255 26 (set (reg/f:DI 14 a4 [orig:204 _177 ] [204])
>         (mem/f:DI (plus:DI (reg/f:DI 12 a2 [380])
>                 (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
>      (nil))


Reload/LRA made an absolute mess of that code.

But before we add a new pass (target specific or generic), I think it 
may be in our best interest experiment a bit of creative rewriting to 
preserve the shadd, but without the frame pointer.  Perhaps also looking 
for a way to fold the constants, both the explicit ones and the implicit 
one from FP elimination.

This looks particularly promising:

> Trying 237, 239 -> 240:
>   237: r377:DI=r200:DI<<0x3+frame:DI
>       REG_DEAD r200:DI
>   239: r380:DI=r513:DI+r377:DI
>       REG_DEAD r377:DI
>       REG_EQUAL r377:DI-0x8000
>   240: r204:DI=[r380:DI+0x118]
>       REG_DEAD r380:DI
> Failed to match this instruction:
> (set (reg/f:DI 204 [ _177 ])
>     (mem/f:DI (plus:DI (plus:DI (plus:DI (mult:DI (reg:DI 200 [ _173 ])
>                         (const_int 8 [0x8]))
>                     (reg/f:DI 65 frame))
>                 (reg:DI 513))
>             (const_int 280 [0x118])) [7 *_176+0 S8 A64]))


We could reassociate this as

t1 = r200 * 8 + r513
t2 = frame + 280
t3 = t1 + t2
r204 = *t3

Which after elimination would be

t1 = r2000 * 8 + r513
t2 = sp + C + 280
t3 = t1 + t2
r204 = *t3

C + 280 will simplify.  And we'll probably end up in the addptr3 case 
which I think gives us a chance to write this a bit so that we end up wit
t1 = r200 * 8 + r513
t2 = sp + t1
r204 = *(t2 + 280 + C)


Or at least I *think* we might be able to get there.  Anyway, as I said, 
I think this deserves a bit of playing around before we jump straight 
into adding a new pass.

jeff


  parent reply	other threads:[~2023-06-08  5:37 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-25 12:35 [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Manolis Tsamis
2023-05-25 12:35 ` [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets Manolis Tsamis
2023-05-25 13:01   ` Richard Biener
2023-05-25 13:25     ` Manolis Tsamis
2023-05-25 13:31     ` Jeff Law
2023-05-25 13:50       ` Richard Biener
2023-05-25 14:02         ` Manolis Tsamis
2023-05-29 23:30           ` Jeff Law
2023-05-31 12:19             ` Manolis Tsamis
2023-05-31 14:00               ` Jeff Law
2023-05-25 14:13         ` Jeff Law
2023-05-25 14:18           ` Philipp Tomsich
2023-06-08  5:37   ` Jeff Law [this message]
2023-06-12  7:36     ` Manolis Tsamis
2023-06-12 14:37       ` Jeff Law
2023-06-09  0:57   ` Jeff Law
2023-06-12  7:32     ` Manolis Tsamis
2023-06-12 21:58       ` Jeff Law
2023-06-15 17:34         ` Manolis Tsamis
2023-06-10 15:49   ` Jeff Law
2023-06-12  7:41     ` Manolis Tsamis
2023-06-12 21:36       ` Jeff Law
2023-05-25 12:35 ` [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible Manolis Tsamis
2023-05-25 13:38   ` Jeff Law
2023-05-31 12:15     ` Manolis Tsamis
2023-06-07 22:16       ` Jeff Law
2023-06-07 22:18   ` Jeff Law
2023-06-08  6:15     ` Manolis Tsamis
2023-06-15 20:13     ` Philipp Tomsich
2023-06-19 16:57       ` Thiago Jung Bauermann
2023-06-19 17:07         ` Manolis Tsamis
2023-06-19 23:40         ` Andrew Pinski
2023-06-19 23:48           ` Andrew Pinski
2023-06-20  2:16             ` Jeff Law
2023-06-20  4:52               ` Tamar Christina
2023-06-20  5:00                 ` Jeff Law
2023-06-21 23:42                   ` Thiago Jung Bauermann
2023-06-22  7:37                     ` Richard Biener
2023-06-22  7:58                       ` Philipp Tomsich
2023-05-25 13:42 ` [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Jeff Law
2023-05-25 13:57   ` Manolis Tsamis
2023-06-15 15:04   ` Jeff Law
2023-06-15 15:30     ` Manolis Tsamis
2023-06-15 15:56       ` Jeff Law
2023-06-18 18:11       ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91d71dae-b235-fbd0-c8f0-001b7f1e444c@gmail.com \
    --to=jeffreyalaw@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=kito.cheng@gmail.com \
    --cc=manolis.tsamis@vrull.eu \
    --cc=palmer@rivosinc.com \
    --cc=philipp.tomsich@vrull.eu \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).