From: Jeff Law <jeffreyalaw@gmail.com>
To: Vineet Gupta <vineetg@rivosinc.com>, gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, Palmer Dabbelt <palmer@rivosinc.com>,
gnu-toolchain@rivosinc.com, Robin Dapp <rdapp.gcc@gmail.com>
Subject: Re: [gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265]
Date: Sat, 16 Mar 2024 14:28:03 -0600 [thread overview]
Message-ID: <03d87203-d4a1-47e1-afea-07b2f857492f@gmail.com> (raw)
In-Reply-To: <20240316173524.1147760-2-vineetg@rivosinc.com>
On 3/16/24 11:35 AM, Vineet Gupta wrote:
> ... if the constant can be represented as sum of two S12 values.
> The two S12 values could instead be fused with subsequent ADD insn.
> The helps
> - avoid an additional LUI insn
> - side benefits of not clobbering a reg
>
> e.g.
> w/o patch w/ patch
> long | |
> plus(unsigned long i) | li a5,4096 |
> { | addi a5,a5,-2032 | addi a0, a0, 2047
> return i + 2064; | add a0,a0,a5 | addi a0, a0, 17
> } | ret | ret
>
> NOTE: In theory not having const in a standalone reg might seem less
> CSE friendly, but for workloads in consideration these mat are
> from very late LRA reloads and follow on GCSE is not doing much
> currently.
As you note, there's a bit of natural tension between what to expose and
when. There's no clear cut answer and I might argue there never will be
given the design and implementation of various parts of the RTL pipeline.
We have some ports that expose early and just say "tough" if it's a
minor loss in some cases. Others choose to expose late. We're kindof a
mix of both in RISC-V land. The limited offsets we've got will tend to
make this problem a bit worse for RISC-V compared to other architectures.
>
> The real benefit however is seen in base+offset computation for array
> accesses and especially for stack accesses which are finalized late in
> optim pipeline, during LRA register allocation. Often the finalized
> offsets trigger LRA reloads resulting in mind boggling repetition of
> exact same insn sequence including LUI based constant materialization.
Yea, this is a known sore spot. I spent a good deal of time working on
the PA where we only have a 5 bit displacement for FP memory ops. You
can assume this caused reload fits and often resulted in horrific (and
repetitive code) code.
We did some interesting tricks to avoid the worst of the codegen issues.
None of those tricks really apply here since we're in an LRA world and
there's no difference in the allowed offset in a memory reference vs an
addi instruction.
>
> This shaves off 290 billion dynamic instrustions (QEMU icounts) in
> SPEC 2017 Cactu benchmark which is over 10% of workload. In the rest of
> suite, there additional 10 billion shaved, with both gains and losses
> in indiv workloads as is usual with compiler changes.
That's a fantastic result.
>
> This should still be considered damage control as the real/deeper fix
> would be to reduce number of LRA reloads or CSE/anchor those during
> LRA constraint sub-pass (re)runs.
Agreed. But I think it's worth tackling from both ends. Generate
better code when we do have these reloads and avoid the reloads when we can.
>
> Implementation Details (for posterity)
> --------------------------------------
> - basic idea is to have a splitter selected via a new predicate for constant
> being possible sum of two S12 and provide the transform.
> This is however a 2 -> 2 transform which combine can't handle.
> So we specify it using a define_insn_and_split.
Right. For the record it looks like a 2->2 case because of the
mvconst_internal define_insn_and_split.
What I was talking about in the Tuesday meeting last week was the desire
to rethink that pattern precisely because it drives us to need to
implement patterns like yours rather than the more natural 3->2 or 4->3/2.
Furthermore, I have a suspicion that there's logicals where we're going
to want to do something similar and if we keep the mvconst_internal
pattern they're all going to have to be implemented as 2->2s with a
define_insn_and_split rather than the more natural 3->2.
Having said all that, I suspect the case you're diving into is far more
important than the logicals.
>
> | linux/scripts/bloat-o-meter build-gcc-240131/target/lib/libc.so.6 \
> build-gcc-240131-new-splitter-1-variant/target/lib/libc.so.6
> |
> | add/remove: 0/0 grow/shrink: 21/49 up/down: 520/-3056 (-2536)
> | Function old new delta
> | getnameinfo 2756 2892 +136
> ...
> | tempnam 136 144 +8
> | padzero 276 284 +8
> ...
> | __GI___register_printf_specifier 284 280 -4
> | __EI_xdr_array 468 464 -4
> | try_file_lock 268 260 -8
> | pthread_create@GLIBC_2 3520 3508 -12
> | __pthread_create_2_1 3520 3508 -12
> ...
> | _nss_files_setnetgrent 932 904 -28
> | _nss_dns_gethostbyaddr2_r 1524 1480 -44
> | build_trtable 3312 3208 -104
> | printf_positional 25000 22580 -2420
> | Total: Before=2107999, After=2105463, chg -0.12%
That's a bit funny. I was inside printf_positional Friday. It's almost
certainly the case that it sees a big improvement because it uses a lot
of stack space and I believe also has a frame pointer.
BUt yea, I wouldn't let the minor regressions stand in the way here.
>
> gcc/ChangeLog:
> PR target/106265 (partially)
> * config/riscv/riscv.h: New macros to check for sum of two S12
> range.
> * config/riscv/constraints.md: New constraint.
> * config/riscv/predicates.md: New Predicate.
> * config/riscv/riscv.md: New splitter to not materialize
> constant in desired range.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/riscv/sum-of-two-s12-const-1.c: New test: checks for
> new patterns output.
> * gcc.target/riscv/sum-of-two-s12-const-2.c: New test: should not
> ICE.
>
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> ---
> gcc/config/riscv/constraints.md | 6 +++
> gcc/config/riscv/predicates.md | 6 +++
> gcc/config/riscv/riscv.h | 15 +++++++
> gcc/config/riscv/riscv.md | 34 ++++++++++++++
> .../gcc.target/riscv/sum-of-two-s12-const-1.c | 45 +++++++++++++++++++
> .../gcc.target/riscv/sum-of-two-s12-const-2.c | 22 +++++++++
> 6 files changed, 128 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-2.c
>
> diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
> index 41acaea04eba..435461180c7e 100644
> --- a/gcc/config/riscv/constraints.md
> +++ b/gcc/config/riscv/constraints.md
> @@ -80,6 +80,12 @@
> (and (match_code "const_int")
> (match_test "LUI_OPERAND (ival)")))
>
> +(define_constraint "MiG"
> + "const can be represented as sum of any S12 values."
> + (and (match_code "const_int")
> + (ior (match_test "IN_RANGE (ival, 2048, 4094)")
> + (match_test "IN_RANGE (ival, -4096, -2049)"))))
> +
> (define_constraint "Ds3"
> "@internal
> 1, 2 or 3 immediate"
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 6c87a7bd1f49..89490339c2da 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -420,6 +420,12 @@
> return true;
> })
>
> +(define_predicate "const_two_s12"
> + (match_code "const_int")
> +{
> + return SUM_OF_TWO_S12 (INTVAL (op));
> +})
> +
> ;; CORE-V Predicates:
> (define_predicate "immediate_register_operand"
> (ior (match_operand 0 "register_operand")
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index da089a03e9d1..817661058896 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -624,6 +624,21 @@ enum reg_class
> (((VALUE) | ((1UL<<31) - IMM_REACH)) == ((1UL<<31) - IMM_REACH) \
> || ((VALUE) | ((1UL<<31) - IMM_REACH)) + IMM_REACH == 0)
>
> +/* True if a VALUE (constant) can be expressed as sum of two S12 constants
> + (in range -2048 to 2047).
> + Range check logic:
> + from: min S12 + 1 (or -1 depending on what side of zero)
> + to: two times the min S12 value (to max out S12 bits). */
> +
> +#define SUM_OF_TWO_S12_N(VALUE) \
> + (((VALUE) >= (-2048 * 2)) && ((VALUE) <= (-2048 - 1)))
> +
> +#define SUM_OF_TWO_S12_P(VALUE) \
> + (((VALUE) >= ( 2047 + 1)) && ((VALUE) <= ( 2047 * 2)))
> +
> +#define SUM_OF_TWO_S12(VALUE) \
> + (SUM_OF_TWO_S12_N (VALUE) || SUM_OF_TWO_S12_P (VALUE))
> +
> /* If this is a single bit mask, then we can load it with bseti. Special
> handling of SImode 0x80000000 on RV64 is done in riscv_build_integer_1. */
> #define SINGLE_BIT_MASK_OPERAND(VALUE) \
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index b16ed97909c0..79fe861ef91f 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -751,6 +751,40 @@
> [(set_attr "type" "arith")
> (set_attr "mode" "DI")])
>
> +;; Special case of adding a reg and constant if latter is sum of two S12
> +;; values (in range -2048 to 2047). Avoid materialized the const and fuse
> +;; into the add (with an additional add for 2nd value). Makes a 3 insn
> +;; sequence into 2 insn.
> +
> +(define_insn_and_split "*add<mode>3_const_sum_of_two_s12"
> + [(set (match_operand:P 0 "register_operand" "=r,r")
> + (plus:P (match_operand:P 1 "register_operand" " r,r")
> + (match_operand:P 2 "const_two_s12" " MiG,r")))]
> + ""
> + "add %0,%1,%2"
> + ""
> + [(set (match_dup 0)
> + (plus:P (match_dup 1) (match_dup 3)))
> + (set (match_dup 0)
> + (plus:P (match_dup 0) (match_dup 4)))]
> +{
> + int val = INTVAL (operands[2]);
> + if (SUM_OF_TWO_S12_P (val))
> + {
> + operands[3] = GEN_INT (2047);
> + operands[4] = GEN_INT (val - 2047);
> + }
> + else if (SUM_OF_TWO_S12_N (val))
> + {
> + operands[3] = GEN_INT (-2048);
> + operands[4] = GEN_INT (val + 2048);
> + }
> + else
> + gcc_unreachable ();
> +}
> + [(set_attr "type" "arith")
> + (set_attr "mode" "<P:MODE>")])
So why use "P" for your mode iterator? I would have expected GPR since
I think this works for both SI and DI mode as-is.
For the output template "#" for alternative 0 and the add instruction
for alternative 1 is probably better. That way it's clear to everyone
that alternative 0 is always meant to be split.
Don't you need some kind of condition on the split to avoid splitting
when you've got a register for operands[2]? It would seem to me that as
written, it would still try to spit and trigger an RTL checking error
when you try to extract INTVAL (operands[2]).
Jeff
next prev parent reply other threads:[~2024-03-16 20:28 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-16 17:35 [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak Vineet Gupta
2024-03-16 17:35 ` [gcc-15 1/3] RISC-V: avoid LUI based const materialization ... [part of PR/106265] Vineet Gupta
2024-03-16 20:28 ` Jeff Law [this message]
2024-03-19 0:07 ` Vineet Gupta
2024-03-23 5:59 ` Jeff Law
2024-03-16 17:35 ` [gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned Vineet Gupta
2024-03-16 20:21 ` Jeff Law
2024-03-19 0:27 ` Vineet Gupta
2024-03-19 6:48 ` Andrew Waterman
2024-03-19 13:10 ` Jeff Law
2024-03-19 20:05 ` Vineet Gupta
2024-03-19 20:58 ` Andrew Waterman
2024-03-19 21:17 ` Palmer Dabbelt
2024-03-20 18:57 ` Jeff Law
2024-03-23 6:05 ` Jeff Law
2024-03-16 17:35 ` [gcc-15 3/3] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733] Vineet Gupta
2024-03-16 20:27 ` Jeff Law
2024-03-19 4:41 ` [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak Jeff Law
2024-03-21 0:45 ` Vineet Gupta
2024-03-21 14:36 ` scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak) Vineet Gupta
2024-03-21 14:45 ` Jeff Law
2024-03-21 17:19 ` Vineet Gupta
2024-03-21 19:56 ` Jeff Law
2024-03-22 0:34 ` scheduler queue flush Vineet Gupta
2024-03-22 8:47 ` scheduler queue flush (was Re: [gcc-15 0/3] RISC-V improve stack/array access by constant mat tweak) Richard Biener
2024-03-22 12:29 ` Jeff Law
2024-03-22 16:56 ` Vineet Gupta
2024-03-25 3:05 ` Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=03d87203-d4a1-47e1-afea-07b2f857492f@gmail.com \
--to=jeffreyalaw@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=gnu-toolchain@rivosinc.com \
--cc=kito.cheng@gmail.com \
--cc=palmer@rivosinc.com \
--cc=rdapp.gcc@gmail.com \
--cc=vineetg@rivosinc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).