public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Palmer Dabbelt <palmer@rivosinc.com>
To: Andrew Waterman <andrew@sifive.com>
Cc: Vineet Gupta <vineetg@rivosinc.com>,
	gcc-patches@gcc.gnu.org, Kito Cheng <kito.cheng@gmail.com>,
	jeffreyalaw@gmail.com, gnu-toolchain@rivosinc.com
Subject: Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]
Date: Fri, 30 Jun 2023 17:36:46 -0700 (PDT)	[thread overview]
Message-ID: <mhng-22dd17fe-bbc4-45e4-aa6f-91237084ed01@palmer-ri-x1c9a> (raw)
In-Reply-To: <CA++6G0B9KMD3p4q42TKVpmLFWBV-VKVkaMBhuKUvReY_anLt_g@mail.gmail.com>

On Fri, 30 Jun 2023 17:25:54 PDT (-0700), Andrew Waterman wrote:
> On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>>
>>
>>
>> On 6/30/23 16:50, Andrew Waterman wrote:
>> > I don't believe this is correct; the subtraction is needed to account
>> > for the fact that the low part might be negative, resulting in a
>> > borrow from the high part.  See the output for your test case below:
>> >
>> > $ cat test.c
>> > #include <stdio.h>
>> >
>> > int main()
>> > {
>> >    unsigned long result, tmp;
>> >
>> > asm (
>> >    "li      %1,-252645376\n"
>> >    "addi    %1,%1,240\n"
>> >    "slli    %0,%1,32\n"
>> >    "add     %0,%0,%1"
>> >      : "=r" (result), "=r" (tmp));
>> >
>> >    printf("%lx\n", result);
>> >
>> >    return 0;
>> > }
>> > $ riscv64-unknown-elf-gcc -O2 test.c
>> > $ spike pk a.out
>> > bbl loader
>> > f0f0f0eff0f0f0f0
>> > $
>>
>> Thx for the quick feedback Andew. I'm clearly lacking in signed math :-(
>> So is it possible to have a better code seq for the testcase at all ?
>
> You're welcome!
>
> When Zba is implemented, then inserting a zext.w would do the trick;
> see below.  (The generalization is that the zext.w is needed if the
> 32-bit constant is negative.)  When Zba is not implemented, I think
> the original sequence is optimal.
>
> li      a5, -252645376
> addi    a5, a5, 240
> slli    a0, a5, 32
> zext.w  a5, a5
> add     a0, a0, a5

For the non-Zba case, I think we can leverage the two high parts 
starting out the same to save an instruction generating the constant.  
So for the original code sequence of 

        li      a5,-252645376
        addi    a5,a5,241
        li      a0,-252645376
        slli    a5,a5,32
        addi    a0,a0,240
        add     a0,a5,a0
        ret

we could instead generate

        li      a5,-252645376
        addi    a0,a5,240
        addi    a5,a5,241
        slli    a5,a5,32
        add     a0,a5,a0
        ret

which IIUC produces the same result.  I think something along the lines 
of this (with the corresponding cost function updates) would do it

    diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
    index de578b5b899..32b6033a966 100644
    --- a/gcc/config/riscv/riscv.cc
    +++ b/gcc/config/riscv/riscv.cc
    @@ -704,7 +704,13 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
       rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
     
       riscv_move_integer (hi, hi, hival, mode);
    -  riscv_move_integer (lo, lo, loval, mode);
    +  if (riscv_integer_cost (loval - hival) + 1 < riscv_integer_cost (loval)) {
    +    rtx delta = gen_reg_rrtx (mode);
    +    riscv_move_integer (delta, delta, loval - hival, mode);
    +    lo = gen_rtx_fmt_ee (PLUS, mode, hi, delta);
    +  } else {
    +    riscv_move_integer (lo, lo, loval, mode);
    +  }
     
       hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));
       hi = force_reg (mode, hi);

though I suppose that would produce a slightly different sequence that has the
same number of instructions but a slightly longer dependency chain, something
more like

        li      a5,-252645376
        addi    a5,a5,241
        addi    a0,a5,-1
        slli    a5,a5,32
        add     a0,a5,a0
        ret

Take that all with a grain of salt, though, as I just ate some very spicy
chicken and can barely see straight :)


>
>
>>
>> -Vineet
>>
>> >
>> >
>> > On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta <vineetg@rivosinc.com> wrote:
>> >>
>> >>
>> >> On 6/30/23 16:33, Vineet Gupta wrote:
>> >>> Ran into a minor snafu in const splitting code when playing with test
>> >>> case from an old PR/23813.
>> >>>
>> >>>        long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
>> >>>
>> >>> This currently generates
>> >>>
>> >>>        li      a5,-252645376
>> >>>        addi    a5,a5,241
>> >>>        li      a0,-252645376
>> >>>        slli    a5,a5,32
>> >>>        addi    a0,a0,240
>> >>>        add     a0,a5,a0
>> >>>        ret
>> >>>
>> >>> The signed math in hival extraction introduces an additional bit,
>> >>> causing loval == hival check to fail.
>> >>>
>> >>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at ../gcc/config/riscv/riscv.cc:702
>> >>> | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>> >>> | (gdb)n
>> >>> | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>> >>> | (gdb)
>> >> FWIW (and I missed adding this observation to the changelog) I pondered
>> >> about using unsigned loval/hival with zext_hwi() but that in certain
>> >> cases can cause additional insns
>> >>
>> >> e.g. constant 0x8000_0000 is codegen to LI 1 +SLLI 31 vs, LI
>> >> 0xFFFFFFFF_80000000
>> >>
>> >>
>> >>> | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>> >>> | (gdb) p/x val
>> >>> | $2 = 0xf0f0f0f0f0f0f0f0
>> >>> | (gdb) p/x loval
>> >>> | $3 = 0xfffffffff0f0f0f0
>> >>> | (gdb) p/x hival
>> >>> | $4 = 0xfffffffff0f0f0f1
>> >>>                          ^^^
>> >>> Fix that by eliding the subtraction in shift.
>> >>>
>> >>> With patch:
>> >>>
>> >>>        li      a5,-252645376
>> >>>        addi    a5,a5,240
>> >>>        slli    a0,a5,32
>> >>>        add     a0,a0,a5
>> >>>        ret
>> >>>
>> >>> gcc/ChangeLog:
>> >>>
>> >>>        * config/riscv/riscv.cc (riscv_split_integer): hival computation
>> >>>          do elide subtraction of loval.
>> >>>        * (riscv_split_integer_cost): Ditto.
>> >>>        * (riscv_build_integer): Ditto
>> >>>
>> >>> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
>> >>> ---
>> >>> I wasn't planning to do any more work on large const stuff, but just ran into it this
>> >>> on a random BZ entry when trying search for redundant constant stuff.
>> >>> The test seemed too good to pass :-)
>> >>> ---
>> >>>    gcc/config/riscv/riscv.cc | 6 +++---
>> >>>    1 file changed, 3 insertions(+), 3 deletions(-)
>> >>>
>> >>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> >>> index 5ac187c1b1b4..377d3aac794b 100644
>> >>> --- a/gcc/config/riscv/riscv.cc
>> >>> +++ b/gcc/config/riscv/riscv.cc
>> >>> @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
>> >>>          && (value > INT32_MAX || value < INT32_MIN))
>> >>>        {
>> >>>          unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
>> >>> -      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
>> >>> +      unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
>> >>>          struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
>> >>>          struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
>> >>>          int hi_cost, lo_cost;
>> >>> @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
>> >>>    {
>> >>>      int cost;
>> >>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
>> >>>      struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
>> >>>
>> >>>      cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
>> >>> @@ -700,7 +700,7 @@ static rtx
>> >>>    riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
>> >>>    {
>> >>>      unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
>> >>> -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
>> >>> +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
>> >>>      rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>> >>>
>> >>>      riscv_move_integer (lo, lo, loval, mode);
>>

  reply	other threads:[~2023-07-01  0:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-30 23:33 Vineet Gupta
2023-06-30 23:41 ` Vineet Gupta
2023-06-30 23:50   ` Andrew Waterman
2023-07-01  0:13     ` Vineet Gupta
2023-07-01  0:25       ` Andrew Waterman
2023-07-01  0:36         ` Palmer Dabbelt [this message]
2023-07-01  8:00           ` Andrew Waterman
2023-07-01 14:04             ` Jeff Law
2023-07-01 14:34               ` Palmer Dabbelt
2023-07-01 15:08               ` Andrew Waterman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mhng-22dd17fe-bbc4-45e4-aa6f-91237084ed01@palmer-ri-x1c9a \
    --to=palmer@rivosinc.com \
    --cc=andrew@sifive.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=gnu-toolchain@rivosinc.com \
    --cc=jeffreyalaw@gmail.com \
    --cc=kito.cheng@gmail.com \
    --cc=vineetg@rivosinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).