public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/105733] New: riscv: Poor codegen for large stack frames
@ 2022-05-26  0:49 jrtc27 at jrtc27 dot com
  2022-05-26  6:12 ` [Bug target/105733] " kito at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: jrtc27 at jrtc27 dot com @ 2022-05-26  0:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

            Bug ID: 105733
           Summary: riscv: Poor codegen for large stack frames
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jrtc27 at jrtc27 dot com
  Target Milestone: ---
            Target: riscv*-*-*

For the following test:

#define BUF_SIZE 2064

void
foo(unsigned long i)
{
    volatile char buf[BUF_SIZE];

    buf[i] = 0;
}

GCC currently generates:

foo:
        li      t0,-4096
        addi    t0,t0,2016
        li      a4,4096
        add     sp,sp,t0
        li      a5,-4096
        addi    a4,a4,-2032
        add     a4,a4,a5
        addi    a5,sp,16
        add     a5,a4,a5
        add     a0,a5,a0
        li      t0,4096
        sd      a5,8(sp)
        sb      zero,2032(a0)
        addi    t0,t0,-2016
        add     sp,sp,t0
        jr      ra

whereas Clang generates the much shorter:

foo:
        lui     a1, 1
        addiw   a1, a1, -2016
        sub     sp, sp, a1
        addi    a1, sp, 16
        add     a0, a0, a1
        sb      zero, 0(a0)
        lui     a0, 1
        addiw   a0, a0, -2016
        add     sp, sp, a0
        ret

The:

        li      a4,4096
        ...
        li      a5,-4096
        addi    a4,a4,-2032
        add     a4,a4,a5

sequence in particular is rather surprising to see rather than just li a4,-2032
and constant-folding that would halve the instruction count difference between
GCC and Clang alone.

See: https://godbolt.org/z/8EGc85dsf

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105733] riscv: Poor codegen for large stack frames
  2022-05-26  0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
@ 2022-05-26  6:12 ` kito at gcc dot gnu.org
  2022-06-07  2:41 ` wilson at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: kito at gcc dot gnu.org @ 2022-05-26  6:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Kito Cheng <kito at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kito at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-05-26

--- Comment #1 from Kito Cheng <kito at gcc dot gnu.org> ---
Confirmed, Thanks for reporting this!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105733] riscv: Poor codegen for large stack frames
  2022-05-26  0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
  2022-05-26  6:12 ` [Bug target/105733] " kito at gcc dot gnu.org
@ 2022-06-07  2:41 ` wilson at gcc dot gnu.org
  2022-07-25 20:39 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: wilson at gcc dot gnu.org @ 2022-06-07  2:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson <wilson at gcc dot gnu.org> ---
This is a known problem that I have commented on before.  For instance here
    https://github.com/riscv-collab/riscv-gcc/issues/193

Copying my reply to here since this is a better place for the bug report:

This is a known problem. GCC first emits code using a frame pointer, and then
tries to eliminate the frame pointer during register allocation. When the frame
size is larger than the immediate field of an add, we need temporaries to
compute frame offsets. Compiler optimizations like common subexpression
elimination (cse) try to optimize the calculation of the temporaries. Then when
we try to eliminate the frame pointer, we run into trouble because the cse opt
changes interfere with the frame pointer elimination, and we end up with an
ugly inefficient mess.

This problem isn't RISC-V specific, but since RISC-V has 12-bit immediates and
most everyone else has 16-bit immediates, we hit the problem sooner, and thus
makes it much more visible for RISC-V. The example above is putting 12KB on the
stack, which is larger than 12-bits of range (4KB), but well within 16-bits of
range (64KB).

I have a prototype of a patch to fix it by allowing >12-bit offsets for frame
pointer references, and then fixing this when eliminating the frame pointer,
but this can cause other problems, and needs more work to be usable. I have no
idea when someone might try to finish the patch.

End of inclusion from github.

To improve my earlier answer, we have signed 12-bit immediates, so the trouble
actually starts at 2048 bytes, and Jessica's example is larger than that. 
Reduce BUFSIZ to 2044 and you get a reasonable result.

foo:
        li      a5,4096
        addi    a5,a5,-2048
        addi    sp,sp,-2048
        add     a5,a5,a0
        add     a0,a5,sp
        li      t0,4096
        sb      zero,-2048(a0)
        addi    t0,t0,-2048
        add     sp,sp,t0
        jr      ra

There is still a bit of oddness here.  We are adding 2048 to a0 and then using
an address -2048(a0).  I think that is more cse trouble.  2048 requires two
instructions to load into a register which is likely confusing something
somewhere.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105733] riscv: Poor codegen for large stack frames
  2022-05-26  0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
  2022-05-26  6:12 ` [Bug target/105733] " kito at gcc dot gnu.org
  2022-06-07  2:41 ` wilson at gcc dot gnu.org
@ 2022-07-25 20:39 ` pinskia at gcc dot gnu.org
  2022-07-25 20:40 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-25 20:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vineet.gupta at linux dot dev

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 106439 has been marked as a duplicate of this bug. ***

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105733] riscv: Poor codegen for large stack frames
  2022-05-26  0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
                   ` (2 preceding siblings ...)
  2022-07-25 20:39 ` pinskia at gcc dot gnu.org
@ 2022-07-25 20:40 ` pinskia at gcc dot gnu.org
  2023-12-15 21:47 ` vineetg at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-25 20:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105733] riscv: Poor codegen for large stack frames
  2022-05-26  0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
                   ` (3 preceding siblings ...)
  2022-07-25 20:40 ` pinskia at gcc dot gnu.org
@ 2023-12-15 21:47 ` vineetg at gcc dot gnu.org
  2024-05-21 17:35 ` cvs-commit at gcc dot gnu.org
  2024-05-28 17:15 ` vineetg at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: vineetg at gcc dot gnu.org @ 2023-12-15 21:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Vineet Gupta <vineetg at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vineetg at gcc dot gnu.org

--- Comment #4 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
There has been good improvements in gcc codegen specially with commit below.

commit 6619b3d4c15cd754798b1048c67f3806bbcc2e6d
Author: Jivan Hakobyan <jivanhakobyan9@gmail.com>
Date:   Wed Aug 23 14:10:30 2023 -0600

    Improve quality of code from LRA register elimination

    This is primarily Jivan's work, I'm mostly responsible for the write-up and
    coordinating with Vlad on a few questions.

    On targets with limitations on immediates usable in arithmetic
instructions,
    LRA's register elimination phase can construct fairly poor code.

     Tip W/o commit 6619b3d4c    |         With 6619b3d4c   
                                 |
foo:                             | foo:
        li      t0,-4096         |      li      t0,-4096
        addi    t0,t0,2032       |      addi    t0,t0,2032
        li      a5,0             |
        li      a4,0             |
        add     sp,sp,t0         |      add     sp,sp,t0
        add     a4,a4,a5         |
        add     a5,a4,sp         |      add     a5,a5,a0
        add     a5,a5,a0         |
        li      t0,4096          |      li      t0,4096
        sb      zero,0(a5)       |      sb      zero,0(a5)
        addi    t0,t0,-2032      |      addi    t0,t0,-2032
        add     sp,sp,t0         |      add     sp,sp,t0
        jr      ra               |      jr      ra

We still have the weird LUI 4096 based constant construction. I have a patch to
avoid 4096 for certain ranges  [-4096,-2049] or [2048,4094] (cribbed from
llvm).
e.g. 2064 = 2047 + 17 and we could potentially "spread" the 2 parts over 2 adds
to SP, avoiding the LUI. However if a const costs more than 1 insn, gcc wants
to force it in a register rather than split the add operation into 2 adds with
the split constants.

expand_binop
  expand_binop_directly
       avoid_expensive_constant

/* X is to be used in mode MODE as operand OPN to BINOPTAB.  If we're
   optimizing, and if the operand is a constant that costs more than
   1 instruction, force the constant into a register and return that
   register.  Return X otherwise.  UNSIGNEDP says whether X is unsigned.  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105733] riscv: Poor codegen for large stack frames
  2022-05-26  0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
                   ` (4 preceding siblings ...)
  2023-12-15 21:47 ` vineetg at gcc dot gnu.org
@ 2024-05-21 17:35 ` cvs-commit at gcc dot gnu.org
  2024-05-28 17:15 ` vineetg at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-21 17:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Vineet Gupta <vineetg@gcc.gnu.org>:

https://gcc.gnu.org/g:f9cfc192ed0127edb7e79818917dd2859fce4d44

commit r15-757-gf9cfc192ed0127edb7e79818917dd2859fce4d44
Author: Vineet Gupta <vineetg@rivosinc.com>
Date:   Mon May 13 11:46:03 2024 -0700

    RISC-V: avoid LUI based const mat in prologue/epilogue expansion
[PR/105733]

    If the constant used for stack offset can be expressed as sum of two S12
    values, the constant need not be materialized (in a reg) and instead the
    two S12 bits can be added to instructions involved with frame pointer.
    This avoids burning a register and more importantly can often get down
    to be 2 insn vs. 3.

    The prev patches to generally avoid LUI based const materialization didn't
    fix this PR and need this directed fix in funcion prologue/epilogue
    expansion.

    This fix doesn't move the neddle for SPEC, at all, but it is still a
    win considering gcc generates one insn fewer than llvm for the test ;-)

       gcc-13.1 release   |      gcc 230823     |                   |
                          |    g6619b3d4c15c    |   This patch      | 
clang/llvm
   
---------------------------------------------------------------------------------
    li      t0,-4096     | li    t0,-4096      | addi  sp,sp,-2048 | addi
sp,sp,-2048
    addi    t0,t0,2016   | addi  t0,t0,2032    | add   sp,sp,-16   | addi
sp,sp,-32
    li      a4,4096      | add   sp,sp,t0      | add   a5,sp,a0    | add 
a1,sp,16
    add     sp,sp,t0     | addi  a5,sp,-2032   | sb    zero,0(a5)  | add 
a0,a0,a1
    li      a5,-4096     | add   a0,a5,a0      | addi  sp,sp,2032  | sb  
zero,0(a0)
    addi    a4,a4,-2032  | li    t0, 4096      | addi  sp,sp,32    | addi
sp,sp,2032
    add     a4,a4,a5     | sb    zero,2032(a0) | ret               | addi
sp,sp,48
    addi    a5,sp,16     | addi  t0,t0,-2032   |                   | ret
    add     a5,a4,a5     | add   sp,sp,t0      |
    add     a0,a5,a0     | ret                 |
    li      t0,4096      |
    sd      a5,8(sp)     |
    sb      zero,2032(a0)|
    addi    t0,t0,-2016  |
    add     sp,sp,t0     |
    ret                  |

    gcc/ChangeLog:
            PR target/105733
            * config/riscv/riscv.h: New macros for with aligned offsets.
            * config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
            function to split a sum of two s12 values into constituents.
            (riscv_expand_prologue): Handle offset being sum of two S12.
            (riscv_expand_epilogue): Ditto.
            * config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

    gcc/testsuite/ChangeLog:
            * gcc.target/riscv/pr105733.c: New Test.
            * gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
            expect LUI 4096.
            * gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
            * gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
            * gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
            * gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
            * gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
            * gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

    Tested-by: Edwin Lu <ewlu@rivosinc.com> # pre-commit-CI #1568
    Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105733] riscv: Poor codegen for large stack frames
  2022-05-26  0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
                   ` (5 preceding siblings ...)
  2024-05-21 17:35 ` cvs-commit at gcc dot gnu.org
@ 2024-05-28 17:15 ` vineetg at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: vineetg at gcc dot gnu.org @ 2024-05-28 17:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733

Vineet Gupta <vineetg at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
Fixed with aforementioned commit for gcc-15.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-05-28 17:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-26  0:49 [Bug target/105733] New: riscv: Poor codegen for large stack frames jrtc27 at jrtc27 dot com
2022-05-26  6:12 ` [Bug target/105733] " kito at gcc dot gnu.org
2022-06-07  2:41 ` wilson at gcc dot gnu.org
2022-07-25 20:39 ` pinskia at gcc dot gnu.org
2022-07-25 20:40 ` pinskia at gcc dot gnu.org
2023-12-15 21:47 ` vineetg at gcc dot gnu.org
2024-05-21 17:35 ` cvs-commit at gcc dot gnu.org
2024-05-28 17:15 ` vineetg at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).