public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch
@ 2023-03-06  7:51 xry111 at gcc dot gnu.org
  2023-03-06  7:52 ` [Bug rtl-optimization/109035] " xry111 at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-03-06  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

            Bug ID: 109035
           Summary: meaningless memory store on RISC-V and LoongArch
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: xry111 at gcc dot gnu.org
  Target Milestone: ---

On LoongArch, without -fPIE, the compiler stores a register to memory for no reason:

$ cat t.c
int test(int x)
{
        char buf[128 << 10];
        return buf[x];
}
$ ./gcc/cc1 t.c -nostdinc  -O2 -fdump-rtl-all -o- 2>/dev/null | grep test: -A20
test:
.LFB0 = .
        lu12i.w $r13,-135168>>12                        # 0xfffffffffffdf000
        ori     $r13,$r13,4080
        add.d   $r3,$r3,$r13
.LCFI0 = .
        lu12i.w $r12,-131072>>12                        # 0xfffffffffffe0000
        lu12i.w $r13,131072>>12                 # 0x20000
        add.d   $r13,$r13,$r12
        addi.d  $r12,$r3,16
        add.d   $r12,$r13,$r12
        lu12i.w $r13,131072>>12                 # 0x20000
        st.d    $r12,$r3,8
        ori     $r13,$r13,16
        ldx.b   $r4,$r12,$r4
        add.d   $r3,$r3,$r13
.LCFI1 = .
        jr      $r1
.LFE0:
        .size   test, .-test
        .section        .eh_frame,"aw",@progbits

Note that the "st.d $r12,$r3,8" instruction is completely meaningless: nothing
in the function reads the stored value back.

The t.c.300r.ira dump contains something "interesting":

Pass 0 for finding pseudo/allocno costs

    a0 (r87,l0) best GR_REGS, allocno GR_REGS
    a1 (r84,l0) best NO_REGS, allocno NO_REGS
    a2 (r83,l0) best GR_REGS, allocno GR_REGS

  a0(r87,l0) costs: SIBCALL_REGS:2000,2000 JIRL_REGS:2000,2000
CSR_REGS:2000,2000 GR_REGS:2000,2000 FP_REGS:8000,8000 ALL_REGS:32000,32000
MEM:8000,8000
  a1(r84,l0) costs: SIBCALL_REGS:1000000,1000000 JIRL_REGS:1000000,1000000
CSR_REGS:1000000,1000000 GR_REGS:1000000,1000000 FP_REGS:1004000,1004000
ALL_REGS:1016000,1016000 MEM:1004000,1004000
  a2(r83,l0) costs: SIBCALL_REGS:1000000,1000000 JIRL_REGS:1000000,1000000
CSR_REGS:1000000,1000000 GR_REGS:1000000,1000000 FP_REGS:1004000,1004000
ALL_REGS:1008000,1008000 MEM:1004000,1004000

Here r84 is the pseudo register for ($frame - 131072).  Any idea why the
compiler selects "NO_REGS" here?

On RISC-V there is also a meaningless "sd      a5,8(sp)" instruction:
https://godbolt.org/z/aPorqj73b


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
@ 2023-03-06  7:52 ` xry111 at gcc dot gnu.org
  2023-03-06  8:02 ` xry111 at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-03-06  7:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Target|                            |riscv64gc-linux-gnu,
                   |                            |loongarch64-linux-gnu
   Last reconfirmed|                            |2023-03-06
             Status|UNCONFIRMED                 |NEW


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
  2023-03-06  7:52 ` [Bug rtl-optimization/109035] " xry111 at gcc dot gnu.org
@ 2023-03-06  8:02 ` xry111 at gcc dot gnu.org
  2023-03-08  3:16 ` chenglulu at loongson dot cn
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-03-06  8:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #1 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
Forgot to mention: a very strange aspect of this issue is that adding "-fPIE"
covers it up.


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
  2023-03-06  7:52 ` [Bug rtl-optimization/109035] " xry111 at gcc dot gnu.org
  2023-03-06  8:02 ` xry111 at gcc dot gnu.org
@ 2023-03-08  3:16 ` chenglulu at loongson dot cn
  2023-03-08  3:37 ` xry111 at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: chenglulu at loongson dot cn @ 2023-03-08  3:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

chenglulu <chenglulu at loongson dot cn> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chenglulu at loongson dot cn

--- Comment #2 from chenglulu <chenglulu at loongson dot cn> ---
I think this is most likely caused by the target-independent (common) code. If
the immediate value (here, the array size) is made large enough, such as
0xfffffff, AArch64 has the same problem.

$ cat t.c
int test(int x)
{
        char buf[0xfffffff];
        return buf[x];
}

$ ./cc1 t.c -o - -O2  2>/dev/null | grep test -A20
test:
.LFB0:
        .cfi_startproc
        mov     x12, 16
        mov     x2, 16
        movk    x12, 0x1000, lsl 16
        sub     sp, sp, x12
        .cfi_def_cfa_offset 268435472
        movk    x2, 0x1000, lsl 16
        mov     x1, -268435456
        add     x1, x2, x1
        add     x1, sp, x1
        str     x1, [sp, 8]
        ldrb    w0, [x1, w0, sxtw]
        add     sp, sp, x12
        .cfi_def_cfa_offset 0
        ret

The "str x1, [sp, 8]" instruction is the same kind of meaningless store.


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-03-08  3:16 ` chenglulu at loongson dot cn
@ 2023-03-08  3:37 ` xry111 at gcc dot gnu.org
  2023-03-08  3:55 ` chenglulu at loongson dot cn
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-03-08  3:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #3 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to chenglulu from comment #2)
> I think this is most likely caused by the target-independent (common) code.

Agreed, which is why I filed this under the rtl-optimization component.

I tracked a (non-root) cause to line 1944 in ira-costs.cc:

          if (i >= first_moveable_pseudo && i < last_moveable_pseudo)
            i_mem_cost = 0;
          else if (equiv_savings < 0)
            i_mem_cost = -equiv_savings;
          else if (equiv_savings > 0)
            {
              i_mem_cost = 0;    // <====== HERE
              for (k = cost_classes_ptr->num - 1; k >= 0; k--)
                i_costs[k] += equiv_savings;
            }

I don't really understand why we should prefer memory when there is a
REG_EQUIV note, nor why this does not happen with -fPIE.


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-03-08  3:37 ` xry111 at gcc dot gnu.org
@ 2023-03-08  3:55 ` chenglulu at loongson dot cn
  2023-03-11  6:44 ` chenglulu at loongson dot cn
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: chenglulu at loongson dot cn @ 2023-03-08  3:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #4 from chenglulu <chenglulu at loongson dot cn> ---
(In reply to Xi Ruoyao from comment #3)

> I don't really understand why we should prefer memory when there is a
> REG_EQUIV note, nor why this does not happen with -fPIE.
I don't understand this optimization either, but I found that setting
FRAME_GROWS_DOWNWARD to 0 avoids the problem.


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-03-08  3:55 ` chenglulu at loongson dot cn
@ 2023-03-11  6:44 ` chenglulu at loongson dot cn
  2023-03-11  7:00 ` chenglulu at loongson dot cn
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: chenglulu at loongson dot cn @ 2023-03-11  6:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #5 from chenglulu <chenglulu at loongson dot cn> ---
On AArch64:
$ cat t.c
int test(int x)
{
        char buf[128 << 10];
        return buf[x];
}
$ cat t-1.c
int test(int x)
{
        char buf[0xfffffff];
        return buf[x];
}

The generated assemblies are as follows:

      t.s                          |             t-1.s
test:                              |test:
.LFB0:                             |.LFB0:
        .cfi_startproc             |        .cfi_startproc
        sub     sp, sp, #131072    |        mov     x12, 16
        .cfi_def_cfa_offset 131072 |        mov     x2, 16
        ldrb    w0, [sp, w0, sxtw] |        movk    x12, 0x1000, lsl 16
        add     sp, sp, 131072     |        sub     sp, sp, x12
        .cfi_def_cfa_offset 0      |        .cfi_def_cfa_offset 268435472
        ret                        |        movk    x2, 0x1000, lsl 16
        .cfi_endproc               |        mov     x1, -268435456
.LFE0:                             |        add     x1, x2, x1
        .size   test, .-test       |        add     x1, sp, x1
                                   |        str     x1, [sp, 8]
                                   |        ldrb    w0, [x1, w0, sxtw]
                                   |        add     sp, sp, x12
                                   |        .cfi_def_cfa_offset 0
                                   |        ret

In my opinion, "str x1, [sp, 8]" is not the only redundant instruction in
t-1.s.  All of the following instructions are redundant:
"movk   x2, 0x1000, lsl 16
 mov    x1, -268435456
 add    x1, x2, x1
 add    x1, sp, x1
 str    x1, [sp, 8]"

Comparing the reload pass dumps of the two test cases, t.c and t-1.c, I found
the reason for these redundant instructions.
In t.c,
(insn 7 15 12 2 (set (reg/f:DI 96)
        (plus:DI (reg/f:DI 64 sfp)
            (const_int -131072 [0xfffffffffffe0000]))) "t-1.c":4:12 153
{*adddi3_aarch64}
     (expr_list:REG_EQUIV (plus:DI (reg/f:DI 64 sfp)
            (const_int -131072 [0xfffffffffffe0000]))
        (nil)))
This insn is deleted after reload.

In t-1.c, what insn 7 does in t.c is done by two insns:
(insn 7 16 8 2 (set (reg:DI 97)
        (const_int -268435456 [0xfffffffff0000000])) "t.c":4:12 65
{*movdi_aarch64}
     (expr_list:REG_EQUIV (const_int -268435456 [0xfffffffff0000000])
        (nil)))
(insn 8 7 13 2 (set (reg:DI 96)
        (plus:DI (reg/f:DI 64 sfp)
            (reg:DI 97))) "t.c":4:12 153 {*adddi3_aarch64}
     (expr_list:REG_DEAD (reg:DI 97)
        (expr_list:REG_EQUIV (plus:DI (reg/f:DI 64 sfp)
                (const_int -268435456 [0xfffffffff0000000]))
            (nil))))
Because of a deficiency in the reload pass, these two insns are not deleted,
which produces the redundant instructions.


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-03-11  6:44 ` chenglulu at loongson dot cn
@ 2023-03-11  7:00 ` chenglulu at loongson dot cn
  2023-11-02 19:25 ` vmakarov at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: chenglulu at loongson dot cn @ 2023-03-11  7:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #6 from chenglulu <chenglulu at loongson dot cn> ---
I tried changing the code:
diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index 42206366669..efaea6922b5 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -914,6 +914,11 @@ eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p,
       /* First see if the source is of the form (plus (...) CST).  */
       if (plus_src && poly_int_rtx_p (XEXP (plus_src, 1), &offset))
        plus_cst_src = plus_src;
+      else if (plus_src && ira_reg_equiv[REGNO (XEXP (plus_src, 1))].constant)
+       {
+         poly_int_rtx_p (ira_reg_equiv[REGNO (XEXP (plus_src, 1))].constant, &offset);
+         plus_cst_src = gen_rtx_PLUS (GET_MODE (XEXP (plus_src, 0)), XEXP (plus_src, 0), ira_reg_equiv[REGNO (XEXP (plus_src, 1))].constant);
+       }
       /* Check that the first operand of the PLUS is a hard reg or
         the lowpart subreg of one.  */
       if (plus_cst_src)

With this change the redundant instructions are eliminated, but I don't know
whether this is an acceptable way to fix it.


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-03-11  7:00 ` chenglulu at loongson dot cn
@ 2023-11-02 19:25 ` vmakarov at gcc dot gnu.org
  2023-11-02 19:47 ` law at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2023-11-02 19:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #7 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Over the last 2 weeks I pushed several patches to deal better with equivalences
in RA.

It seems the patches solve this PR.  I checked the code generated for the test
on loongarch and aarch64 and did not find the spilled pseudos reported
here.

I think the PR should be closed as fixed.


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-11-02 19:25 ` vmakarov at gcc dot gnu.org
@ 2023-11-02 19:47 ` law at gcc dot gnu.org
  2023-11-03  6:52 ` xry111 at gcc dot gnu.org
  2023-11-03  7:46 ` xry111 at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: law at gcc dot gnu.org @ 2023-11-02 19:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #8 from Jeffrey A. Law <law at gcc dot gnu.org> ---
No spills on rv64 either.


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-11-02 19:47 ` law at gcc dot gnu.org
@ 2023-11-03  6:52 ` xry111 at gcc dot gnu.org
  2023-11-03  7:46 ` xry111 at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-11-03  6:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #9 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to Vladimir Makarov from comment #7)
> Over the last 2 weeks I pushed several patches to deal better with
> equivalences in RA.
> 
> It seems the patches solve this PR.  I checked the code generated for the
> test on loongarch and aarch64 and did not find the spilled pseudos reported
> here.
> 
> I think the PR should be closed as fixed.

Thanks!  For LoongArch this issue has been "papered over" since r14-19, though
that is a completely unrelated change.

I'm closing the PR as fixed anyway.


* [Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch
  2023-03-06  7:51 [Bug rtl-optimization/109035] New: meaningless memory store on RISC-V and LoongArch xry111 at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-11-03  6:52 ` xry111 at gcc dot gnu.org
@ 2023-11-03  7:46 ` xry111 at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-11-03  7:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0

