From: Manolis Tsamis <manolis.tsamis@vrull.eu>
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: gcc-patches@gcc.gnu.org,
Richard Biener <richard.guenther@gmail.com>,
Palmer Dabbelt <palmer@rivosinc.com>,
Philipp Tomsich <philipp.tomsich@vrull.eu>,
Kito Cheng <kito.cheng@gmail.com>
Subject: Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
Date: Mon, 12 Jun 2023 10:36:20 +0300 [thread overview]
Message-ID: <CAM3yNXp73iBm-sYzFa1v8XWB+WHAZocN7QXcf_p8eYN=QjK0VA@mail.gmail.com> (raw)
In-Reply-To: <91d71dae-b235-fbd0-c8f0-001b7f1e444c@gmail.com>
On Thu, Jun 8, 2023 at 8:37 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
> >
> > gcc/ChangeLog:
> >
> > * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> > * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> > pass.
> > * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> > * config/riscv/riscv.opt: New options.
> > * config/riscv/t-riscv: New build rule.
> > * doc/invoke.texi: Document new option.
> > * config/riscv/riscv-fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> > * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> > * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> So not going into the guts of the patch yet.
>
> From a benchmark standpoint the only two that get out of the +-0.05%
> range are mcf and deepsjeng (from a dynamic instruction standpoint). So
> from an evaluation standpoint we can probably focus our efforts there.
> And as we know, mcf is actually memory bound, so while improving its
> dynamic instruction count is good, the end performance improvement may
> be marginal.
>
Even if late, one question for the dynamic instruction numbers.
Was this measured just with f-m-o or with the stack pointer fold patch
applied too?
I remember I was getting better improvements in the past, but most of
the cases had to do with the stack pointer so the second patch is
necessary.
> As I mentioned to Philipp many months ago this reminds me a lot of a
> problem I've seen before. Basically register elimination emits code
> that can be terrible in some circumstances. So I went and poked at this
> again.
>
> I think the key difference between now and what I was dealing with
> before is for the cases that really matter for rv64 we have a shNadd
> insn in the sequence. That private port I was working on before did not
> have shNadd (don't ask, I probably can't tell). Our target also had
> reg+reg addressing modes. What I can't remember was if we were trying
> harder to fold the constant terms into the memory reference or if we
> were more focused on the reg+reg. Ultimately it's probably not that
> important to remember -- the key is there are very significant
> differences in the target's capabilities which impact how we should be
> generating code in this case. Those differences affect the code we
> generate *and* the places where we can potentially get control and do
> some address rewriting.
>
> A key sequence in mcf looks something like this in IRA, others have
> similar structure:
>
> > (insn 237 234 239 26 (set (reg:DI 377)
> > (plus:DI (ashift:DI (reg:DI 200 [ _173 ])
> > (const_int 3 [0x3]))
> > (reg/f:DI 65 frame))) "pbeampp.c":139:15 333 {*shNadd}
> > (nil))
> > (insn 239 237 235 26 (set (reg/f:DI 380)
> > (plus:DI (reg:DI 513)
> > (reg:DI 377))) "pbeampp.c":139:15 5 {adddi3}
> > (expr_list:REG_DEAD (reg:DI 377)
> > (expr_list:REG_EQUAL (plus:DI (reg:DI 377)
> > (const_int -32768 [0xffffffffffff8000]))
> > (nil))))
> [ ... ]
> > (insn 240 235 255 26 (set (reg/f:DI 204 [ _177 ])
> > (mem/f:DI (plus:DI (reg/f:DI 380)
> > (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
> > (expr_list:REG_DEAD (reg/f:DI 380)
> > (nil)))
>
>
> The key here is insn 237. It's generally going to be bad to have FP
> show up in a shadd insn because its going to be eliminated into
> sp+offset. That'll generate an input reload before insn 237 and we
> can't do any combination with the constant in insn 239.
>
> After LRA it looks like this:
>
> > (insn 1540 234 1541 26 (set (reg:DI 11 a1 [750])
> > (const_int 32768 [0x8000])) "pbeampp.c":139:15 179 {*movdi_64bit}
> > (nil))
> > (insn 1541 1540 1611 26 (set (reg:DI 12 a2 [749])
> > (plus:DI (reg:DI 11 a1 [750])
> > (const_int -272 [0xfffffffffffffef0]))) "pbeampp.c":139:15 5 {adddi3}
> > (expr_list:REG_EQUAL (const_int 32496 [0x7ef0])
> > (nil)))
> > (insn 1611 1541 1542 26 (set (reg:DI 29 t4 [795])
> > (plus:DI (reg/f:DI 2 sp)
> > (const_int 64 [0x40]))) "pbeampp.c":139:15 5 {adddi3}
> > (nil))
> > (insn 1542 1611 237 26 (set (reg:DI 12 a2 [749])
> > (plus:DI (reg:DI 12 a2 [749])
> > (reg:DI 29 t4 [795]))) "pbeampp.c":139:15 5 {adddi3}
> > (nil))
> > (insn 237 1542 239 26 (set (reg:DI 12 a2 [377])
> > (plus:DI (ashift:DI (reg:DI 14 a4 [orig:200 _173 ] [200])
> > (const_int 3 [0x3]))
> > (reg:DI 12 a2 [749]))) "pbeampp.c":139:15 333 {*shNadd}
> > (nil))
> > (insn 239 237 235 26 (set (reg/f:DI 12 a2 [380])
> > (plus:DI (reg:DI 10 a0 [513])
> > (reg:DI 12 a2 [377]))) "pbeampp.c":139:15 5 {adddi3}
> > (expr_list:REG_EQUAL (plus:DI (reg:DI 12 a2 [377])
> > (const_int -32768 [0xffffffffffff8000]))
> > (nil)))
> [ ... ]
> > (insn 240 235 255 26 (set (reg/f:DI 14 a4 [orig:204 _177 ] [204])
> > (mem/f:DI (plus:DI (reg/f:DI 12 a2 [380])
> > (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
> > (nil))
>
>
> Reload/LRA made an absolute mess of that code.
>
> But before we add a new pass (target specific or generic), I think it
> may be in our best interest experiment a bit of creative rewriting to
> preserve the shadd, but without the frame pointer. Perhaps also looking
> for a way to fold the constants, both the explicit ones and the implicit
> one from FP elimination.
>
> This looks particularly promising:
>
> > Trying 237, 239 -> 240:
> > 237: r377:DI=r200:DI<<0x3+frame:DI
> > REG_DEAD r200:DI
> > 239: r380:DI=r513:DI+r377:DI
> > REG_DEAD r377:DI
> > REG_EQUAL r377:DI-0x8000
> > 240: r204:DI=[r380:DI+0x118]
> > REG_DEAD r380:DI
> > Failed to match this instruction:
> > (set (reg/f:DI 204 [ _177 ])
> > (mem/f:DI (plus:DI (plus:DI (plus:DI (mult:DI (reg:DI 200 [ _173 ])
> > (const_int 8 [0x8]))
> > (reg/f:DI 65 frame))
> > (reg:DI 513))
> > (const_int 280 [0x118])) [7 *_176+0 S8 A64]))
>
>
> We could reassociate this as
>
> t1 = r200 * 8 + r513
> t2 = frame + 280
> t3 = t1 + t2
> r204 = *t3
>
> Which after elimination would be
>
> t1 = r2000 * 8 + r513
> t2 = sp + C + 280
> t3 = t1 + t2
> r204 = *t3
>
> C + 280 will simplify. And we'll probably end up in the addptr3 case
> which I think gives us a chance to write this a bit so that we end up wit
> t1 = r200 * 8 + r513
> t2 = sp + t1
> r204 = *(t2 + 280 + C)
>
>
> Or at least I *think* we might be able to get there. Anyway, as I said,
> I think this deserves a bit of playing around before we jump straight
> into adding a new pass.
>
> jeff
>
next prev parent reply other threads:[~2023-06-12 7:36 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-25 12:35 [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Manolis Tsamis
2023-05-25 12:35 ` [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets Manolis Tsamis
2023-05-25 13:01 ` Richard Biener
2023-05-25 13:25 ` Manolis Tsamis
2023-05-25 13:31 ` Jeff Law
2023-05-25 13:50 ` Richard Biener
2023-05-25 14:02 ` Manolis Tsamis
2023-05-29 23:30 ` Jeff Law
2023-05-31 12:19 ` Manolis Tsamis
2023-05-31 14:00 ` Jeff Law
2023-05-25 14:13 ` Jeff Law
2023-05-25 14:18 ` Philipp Tomsich
2023-06-08 5:37 ` Jeff Law
2023-06-12 7:36 ` Manolis Tsamis [this message]
2023-06-12 14:37 ` Jeff Law
2023-06-09 0:57 ` Jeff Law
2023-06-12 7:32 ` Manolis Tsamis
2023-06-12 21:58 ` Jeff Law
2023-06-15 17:34 ` Manolis Tsamis
2023-06-10 15:49 ` Jeff Law
2023-06-12 7:41 ` Manolis Tsamis
2023-06-12 21:36 ` Jeff Law
2023-05-25 12:35 ` [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible Manolis Tsamis
2023-05-25 13:38 ` Jeff Law
2023-05-31 12:15 ` Manolis Tsamis
2023-06-07 22:16 ` Jeff Law
2023-06-07 22:18 ` Jeff Law
2023-06-08 6:15 ` Manolis Tsamis
2023-06-15 20:13 ` Philipp Tomsich
2023-06-19 16:57 ` Thiago Jung Bauermann
2023-06-19 17:07 ` Manolis Tsamis
2023-06-19 23:40 ` Andrew Pinski
2023-06-19 23:48 ` Andrew Pinski
2023-06-20 2:16 ` Jeff Law
2023-06-20 4:52 ` Tamar Christina
2023-06-20 5:00 ` Jeff Law
2023-06-21 23:42 ` Thiago Jung Bauermann
2023-06-22 7:37 ` Richard Biener
2023-06-22 7:58 ` Philipp Tomsich
2023-05-25 13:42 ` [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Jeff Law
2023-05-25 13:57 ` Manolis Tsamis
2023-06-15 15:04 ` Jeff Law
2023-06-15 15:30 ` Manolis Tsamis
2023-06-15 15:56 ` Jeff Law
2023-06-18 18:11 ` Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAM3yNXp73iBm-sYzFa1v8XWB+WHAZocN7QXcf_p8eYN=QjK0VA@mail.gmail.com' \
--to=manolis.tsamis@vrull.eu \
--cc=gcc-patches@gcc.gnu.org \
--cc=jeffreyalaw@gmail.com \
--cc=kito.cheng@gmail.com \
--cc=palmer@rivosinc.com \
--cc=philipp.tomsich@vrull.eu \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).