[Bug rtl-optimization/111376] New: missed optimization of one bit test on MIPS32r1

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/111376] New: missed optimization of one bit test on MIPS32r1
From: lis8215 at gmail dot com @ 2023-09-11 18:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

            Bug ID: 111376
           Summary: missed optimization of one bit test on MIPS32r1
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lis8215 at gmail dot com
  Target Milestone: ---

Created attachment 55879
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55879&action=edit
Silly patch to enable SLL+BLTZ/BGEZ

Currently, for testing bits above the 14th, the following instructions are emitted:

 LUI $t1, 0x1000         # 0x10000000
 AND $t0, $t1, $t0
 BEQ/BNE $t0, $0, $Lxx

However, there is a shorter and faster alternative: shift the bit of
interest into the sign bit and branch with BLTZ/BGEZ.
The code above can be replaced with:

 SLL $t0, $t0, 3
 BGEZ/BLTZ $t0, $Lxx

I'm not sure whether this can be applied to MIPS64 without EXT/INS instructions
or to older MIPS revisions (I..V).
For MIPS32, though, it helps reduce code size by removing about 1 insn per ~700,
as evaluated on the Linux kernel and Python 3.11.
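
The equivalence behind the two sequences can be sketched in C. This is only an illustration for the bit 28 example above, not code from the attached patch, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* LUI+AND+BEQ/BNE form: test bit 28 through a mask (0x10000000). */
static int bit28_set_by_mask(uint32_t x)
{
    return (x & UINT32_C(0x10000000)) != 0;
}

/* SLL+BLTZ/BGEZ form: shift bit 28 into the sign bit (31 - 28 = 3),
   then look only at the sign bit of the shifted value. */
static int bit28_set_by_shift(uint32_t x)
{
    uint32_t shifted = (uint32_t)(x << 3);
    return (shifted >> 31) != 0;   /* 1 means BLTZ would be taken */
}

/* Checks that both forms agree on a few hand-picked inputs. */
static void check_bit28_forms_agree(void)
{
    static const uint32_t samples[] = {
        UINT32_C(0x00000000), UINT32_C(0x10000000), UINT32_C(0xefffffff),
        UINT32_C(0xffffffff), UINT32_C(0x20000000), UINT32_C(0x08000000)
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(bit28_set_by_mask(samples[i]) == bit28_set_by_shift(samples[i]));
}
```

Calling `check_bit28_forms_agree()` from a test program should pass silently; the shift form needs no constant register, which is where the saved instruction comes from.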

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-04  6:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #1 from YunQiang Su <syq at gcc dot gnu.org> ---
RISC-V has this problem, too.
Maybe we can try to handle it in the `combine` pass, although that may not be easy.
It may break some code like:

```
int f1();
int f2();

int f(int a) {
        int p = (a & 0x80000);
        if (p)
                return p;
        else
                return f2();
}
```

And in fact your patch also breaks it.

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-04  7:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #2 from YunQiang Su <syq at gcc dot gnu.org> ---
(In reply to YunQiang Su from comment #1)
> RISC-V has this problem, too.
> Maybe we can try to combine it in `combine` pass, while it may be not easy.
> It may break some code like:
> 
> ```
> int f1();
> int f2();
> 
> int f(int a) {
>         int p = (a & 0x80000);
>         if (p)
>                 return p;
>         else
>                 return f2();
> }
> ```
> 
> And in fact your patch also break it.

Ohh, this comment is not correct....

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: lis8215 at gmail dot com @ 2024-06-04  7:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #3 from Siarhei Volkau <lis8215 at gmail dot com> ---
I know that the patch breaks condmove cases; that's why it is silly.

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-05 11:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #4 from YunQiang Su <syq at gcc dot gnu.org> ---
Ohh, RISC-V has solved this problem in a recent release.
So we can just do similar work.

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-05 16:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #5 from YunQiang Su <syq at gcc dot gnu.org> ---
I copied the RTL pattern from RISC-V, and it seems to work:

```
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -6253,6 +6253,40 @@ (define_insn "*branch_bit<bbv><mode>_inverted"
 }
   [(set_attr "type"         "branch")
    (set_attr "branch_likely" "no")])
+
+(define_insn_and_split "*branch_on_bit<mode>"
+  [(set (pc)
+       (if_then_else
+           (match_operator 0 "equality_operator"
+               [(zero_extract:GPR (match_operand:GPR 2 "register_operand" "d")
+                                (const_int 1)
+                                (match_operand:GPR 3 "const_int_operand"))
+                                (const_int 0)])
+           (label_ref (match_operand 1))
+           (pc)))]
+  "!ISA_HAS_BBIT && !ISA_HAS_EXT_INS && !TARGET_MIPS16"
+  "#"
+  "!reload_completed"
+  [(set (match_dup 4)
+       (ashift:GPR (match_dup 2) (match_dup 3)))
+   (set (pc)
+       (if_then_else
+           (match_op_dup 0 [(match_dup 4) (const_int 0)])
+           (label_ref (match_operand 1))
+           (pc)))]
+{
+  int shift = GET_MODE_BITSIZE (<MODE>mode) - 1 - INTVAL (operands[3]);
+  operands[3] = GEN_INT (shift);
+  operands[4] = gen_reg_rtx (<MODE>mode);
+
+  if (GET_CODE (operands[0]) == EQ)
+    operands[0] = gen_rtx_GE (<MODE>mode, operands[4], const0_rtx);
+  else
+    operands[0] = gen_rtx_LT (<MODE>mode, operands[4], const0_rtx);
+}
+[(set_attr "type" "branch")])
+
+
```
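
What the split does to the condition can be modelled in plain C. The sketch below assumes a 32-bit mode and uses hypothetical names; it only mirrors the shift computation and the EQ to GE / NE to LT rewrite from the preparation statements, not the RTL itself:

```c
#include <assert.h>
#include <stdint.h>

/* EQ 0 and NE 0 on the extracted bit, as matched by "equality_operator". */
enum bit_cond { BIT_IS_ZERO, BIT_IS_ONE };

/* Shift amount that moves bit `pos` into the sign bit of a 32-bit word,
   mirroring GET_MODE_BITSIZE - 1 - INTVAL (operands[3]) for SImode. */
static unsigned sign_shift_for_bit(unsigned pos)
{
    assert(pos < 32);
    return 32u - 1u - pos;
}

/* Original condition: the one-bit zero_extract compared against 0. */
static int branch_taken_extract(uint32_t x, unsigned pos, enum bit_cond c)
{
    unsigned bit = (x >> pos) & 1u;
    return c == BIT_IS_ZERO ? bit == 0 : bit != 0;
}

/* Split form: shift left, then EQ becomes GE 0 and NE becomes LT 0
   on the shifted value viewed as a signed number. */
static int branch_taken_shifted(uint32_t x, unsigned pos, enum bit_cond c)
{
    uint32_t t = (uint32_t)(x << sign_shift_for_bit(pos));
    int negative = (t >> 31) != 0;   /* same as (int32_t)t < 0 */
    return c == BIT_IS_ZERO ? !negative : negative;
}
```

For any `x` and any `pos` below 32, the two `branch_taken_*` functions are expected to agree; a loop over all 32 positions is a cheap way to convince oneself of that.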

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: lis8215 at gmail dot com @ 2024-06-06  9:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #6 from Siarhei Volkau <lis8215 at gmail dot com> ---
Well, it mostly works well.
However, it still has issues, addressed in my patch:
 1) Doesn't work for -Os: most likely a costing issue.
 2) Breaks condmoves, as mine does. I have no idea how to avoid that, though.
 3) Overlaps the preferable ANDI+BEQ/BNE cases (preferable as they don't break condmoves).

I think it will be okay if 1 and 3 are fixed.

PS: tested by applying the patch on GCC 11; I will try with upstream this
weekend.

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-06 22:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #7 from YunQiang Su <syq at gcc dot gnu.org> ---
Ohh, I need to add "&&" before "!reload_completed".

It seems to work with -Os.
    Can you give me your test code?

I cannot come up with C code that breaks condmove with it.


With a constant less than 0xffff, ANDI+BEQ/BNE is generated with -Os
but not with -O2.
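
The ANDI overlap comes down to whether the single-bit mask fits ANDI's 16-bit zero-extended immediate. A minimal sketch of that choice, with hypothetical names and no claim about how the GCC cost model actually decides:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the two two-instruction sequences. */
enum bit_test_seq { SEQ_ANDI_BEQ_BNE, SEQ_SLL_BGEZ_BLTZ };

/* ANDI takes a 16-bit zero-extended immediate, so masks up to 0x8000
   (bits 0..15) need no LUI; higher bits are where SLL+BGEZ/BLTZ helps. */
static enum bit_test_seq pick_bit_test_seq(unsigned pos)
{
    assert(pos < 32);
    uint32_t mask = UINT32_C(1) << pos;
    return mask <= UINT32_C(0xffff) ? SEQ_ANDI_BEQ_BNE : SEQ_SLL_BGEZ_BLTZ;
}
```

With this rule, a test of `1 << 15` keeps ANDI and a test of `1 << 16` switches to the shift form.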

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: lis8215 at gmail dot com @ 2024-06-07  8:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #8 from Siarhei Volkau <lis8215 at gmail dot com> ---
Created attachment 58377
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58377&action=edit
condmove testcase

Tested with the current GCC master branch:

- Confirmed that it works with -Os.
- The condmove issue is present in GCC 11 but not in current master. Even for
GCC 11 it is a very rare case, although I found one that is relatively simple
to reproduce: it is an excerpt from Python 3.8.x, reduced as much as I can.
Compilation flags tested: {-O2|-Os} -mips32 -DNDEBUG -mbranch-cost={1|10}

So, in my opinion, the patch you propose is perfectly fine.
The condmove issue no longer seems relevant.

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-13  4:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #9 from YunQiang Su <syq at gcc dot gnu.org> ---
I see about condmove: it has been broken since GCC 14.

int
f32(int a)
{
  int p = (a & (1<<16));
  if (p)
    return 100;
  else
    return 1000;
}

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-15  3:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #10 from YunQiang Su <syq at gcc dot gnu.org> ---
I ran some performance tests.

sll+bgez is somewhat slower than lui+and+beqz.
On Loongson 3A4000, the difference is about 10%.

So this "optimization" makes sense only for -Os.

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-15  3:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

YunQiang Su <syq at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #11 from YunQiang Su <syq at gcc dot gnu.org> ---
For -Os, let's track it with this one
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115473

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: lis8215 at gmail dot com @ 2024-06-15  5:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #12 from Siarhei Volkau <lis8215 at gmail dot com> ---
Most likely it's because of a data dependency, and not the direct cost of shift
operations on Loongson, although I can't find information to prove that.
So, I guess it might still give a performance benefit in cases where the
scheduler can put some instruction(s) between SLL and BGEZ.

Since you have access to the hardware, you can measure the performance of two variants:
1) SLL+BGEZ
2) SLL+NOT+BGEZ
If their performance is equal, then I'm correct and the scheduling automaton for
GS464 seems to need fixing.

From my side, I can confirm that SLL+BGEZ is faster than LUI+AND+BEQ on Ingenic
XBurst 1 cores.

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-15  6:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #13 from YunQiang Su <syq at gcc dot gnu.org> ---
I tried inserting
        li      $3, 500
        li      $5, 500
between SLL and BGEZ, and between LUI+AND and BNE.

The latter is still somewhat faster on Loongson 3A4000.

I noticed something like this in the 74K's software manual:

The 74K core’s ALU is pipelined. Some ALU instructions complete the operation
and bypass the results in this cycle. These instructions are referred to as
single-cycle ops and they include all logical instructions (AND, ANDI, OR, ORI,
XOR, XORI, LUI), some shift instructions (SLL sa<=8, SRL 31<=sa<=25), and some
arithmetic instructions (ADD rt=0, ADDU rt=0, SLT, SLTI, SLTU, SLTIU, SEH, SEB,
ZEH, ZEB). In addition, add instructions (ADD, ADDU, ADDI, ADDIU) complete the
operation and bypass results to the ALU pipe in this cycle.

I guess it means that if sa>8, SLL may be somewhat slow.
On Loongson 3A4000, the value seems to be 20/21. It may mean that we should be
careful with 64-bit.
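
Combined with a shift amount of 31 minus the bit position for a 32-bit test, the quoted 74K rule would leave only bits 23..31 with a single-cycle SLL. A small C sketch of that reading, assuming the manual excerpt above is the whole story (helper names are hypothetical):

```c
#include <assert.h>

/* Shift amount SLL needs to move bit `pos` into the sign bit (32-bit). */
static unsigned sll_amount_for_bit(unsigned pos)
{
    assert(pos < 32);
    return 31u - pos;
}

/* Per the quoted 74K text: SLL is a single-cycle op only when sa <= 8. */
static int sll_is_single_cycle_74k(unsigned pos)
{
    return sll_amount_for_bit(pos) <= 8u;
}
```

Under this assumption a test of bit 16 needs `sa = 15` and would fall outside the single-cycle range.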

Can you run a test on XBurst 1?

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-15  7:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #14 from YunQiang Su <syq at gcc dot gnu.org> ---
And it seems that the performance of SLL depends on the operand.

Just iterating from 0 to 1e9:

```
0000000000000b00 <f32>:
 b00:   000223c0        sll     a0,v0,0xf     #### <-- the code is something wrong
                                              #### in normal code, we should access
                                              #### v0 here.  v0 will be 100 or 1000.
 b04:   04810003        bgez    a0,b14 <f32+0x14>
 b08:   00000000        nop
 b0c:   03e00008        jr      ra
 b10:   240203e8        li      v0,1000
 b14:   03e00008        jr      ra
 b18:   24020064        li      v0,100
 b1c:   00000000        nop
```

is slower than

```
0000000000000b00 <f32>:
 b00:   000423c0        sll     a0,a0,0xf
 b04:   04810003        bgez    a0,b14 <f32+0x14>
 b08:   00000000        nop
 b0c:   03e00008        jr      ra
 b10:   240203e8        li      v0,1000
 b14:   03e00008        jr      ra
 b18:   24020064        li      v0,100
 b1c:   00000000        nop
```


I have no idea how to make a trade-off here.

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: lis8215 at gmail dot com @ 2024-06-15  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #15 from Siarhei Volkau <lis8215 at gmail dot com> ---
Created attachment 58437
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58437&action=edit
application to test performance of shift

Here is the test application (MIPS32 specific) I wrote.

It makes it possible to measure execution cycles and to detect extra pipeline
stalls for SLL, if they occur.

For XBurst 1 (jz4725b) the result is the following:

`SLL to use latency test` execution median: 168417 ns, min: 168416 ns
`SLL to use latency test with nop` execution median: 196250 ns, min: 196166 ns

`SLL to branch latency test` execution median: 196250 ns, min: 196166 ns
`SLL to branch latency test with nop` execution median: 224000 ns, min: 224000 ns

`SLL by 7 to use latency test` execution median: 168417 ns, min: 168416 ns
`SLL by 15 to use latency test` execution median: 168417 ns, min: 168416 ns
`SLL by 23 to use latency test` execution median: 168417 ns, min: 168416 ns
`SLL by 31 to use latency test` execution median: 168417 ns, min: 168416 ns

`LUI>AND>BEQZ reference test` execution median: 196250 ns, min: 196166 ns
`SLL>BGEZ reference test` execution median: 168417 ns, min: 168416 ns



And what it means:
`SLL to use latency test` at 168417 ns and `.. with nop` at 196250 ns
mean that there are no extra stall cycles between SLL and a subsequent use by
an ALU operation.

The `SLL to branch latency test` and `.. with nop` results
mean that there are no extra stall cycles between SLL and a subsequent use by
a branch operation.

The `SLL by N` results mean that SLL execution time doesn't depend on the shift
amount.

Finally, the reference test results show that the SLL>BGEZ approach is
faster than LUI>AND>BEQZ.
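
The arithmetic behind that reading can be written down explicitly. The sketch below only re-derives numbers from the medians listed above. The helper names are hypothetical, and the interpretation (a NOP that costs a full slot was not hiding a stall) is the one given in this message:

```c
#include <assert.h>

/* Median timings in ns, copied from the XBurst 1 (jz4725b) results above. */
enum {
    SLL_TO_USE_NS        = 168417,
    SLL_TO_USE_NOP_NS    = 196250,
    SLL_TO_BRANCH_NS     = 196250,
    SLL_TO_BRANCH_NOP_NS = 224000,
    LUI_AND_BEQZ_NS      = 196250,
    SLL_BGEZ_NS          = 168417
};

/* Extra time caused by inserting one NOP between SLL and its consumer.
   If the NOP merely filled a stall, this would be close to zero. */
static long nop_cost_ns(long without_nop, long with_nop)
{
    assert(with_nop >= without_nop);
    return with_nop - without_nop;
}

/* Saving of SLL>BGEZ relative to LUI>AND>BEQZ, in tenths of a percent. */
static long sll_bgez_saving_permille(void)
{
    return (LUI_AND_BEQZ_NS - SLL_BGEZ_NS) * 1000L / LUI_AND_BEQZ_NS;
}
```

The NOP costs 27833 ns in the ALU case and 27750 ns in the branch case, about the same as the 27833 ns gap between the two reference tests, so SLL>BGEZ comes out roughly 14% faster in this measurement.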

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: lis8215 at gmail dot com @ 2024-06-15  8:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #16 from Siarhei Volkau <lis8215 at gmail dot com> ---
Might it be that Loongson has a register reuse dependency?

I observed similar behavior on XBurst with a load/store/reuse pattern:

e.g. this code
LW $v0, 0($t1)    # XBurst load latency is 4, but it has a bypass
SW $v0, 0($t2)    # to a subsequent store operation, thus no stall here
ADD $v0, $t1, $t2 # but it stalls here, because of register reuse,
                  # until the LW op has completed.

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-18  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #17 from YunQiang Su <syq at gcc dot gnu.org> ---
I have sent the patch.
We may need some more testing.

* [Bug target/111376] missed optimization of one bit test on MIPS32r1
From: syq at gcc dot gnu.org @ 2024-06-19 14:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376

--- Comment #18 from YunQiang Su <syq at gcc dot gnu.org> ---
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654956.html
