public inbox for gcc-bugs@sourceware.org
* [Bug target/115500] New: RISC-V: Performance regression on 1bit test
@ 2024-06-15  3:41 syq at gcc dot gnu.org
  2024-06-15  3:42 ` [Bug target/115500] " syq at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: syq at gcc dot gnu.org @ 2024-06-15  3:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500

            Bug ID: 115500
           Summary: RISC-V: Performance regression on 1bit test
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: syq at gcc dot gnu.org
  Target Milestone: ---

```x.c
#include <stdio.h>

int f32(int);

int main() {
        for(int i=0; i<1e9; i++) {
                f32(i);
        }
}
```

```f32.c
int f32(int x) {
        if (x & 0x80000)
                return 100;
        return 1000;
}
```

I tested it on:
isa             : rv64imafdc_zicntr_zicsr_zifencei_zihpm
mmu             : sv39
uarch           : sifive,bullet0
mvendorid       : 0x489
marchid         : 0x8000000000000007
mimpid          : 0x20181004
hart isa        : rv64imafdc_zicntr_zicsr_zifencei_zihpm

With GCC 12, the run time is:
   real    0m7.140s
   user    0m7.134s
   sys     0m0.005s

With GCC 13, the run time is:
   real    0m9.298s
   user    0m9.291s
   sys     0m0.005s


The difference is in the code generated for the bit test in f32 (GCC 12 first, GCC 13 second):
   0:   814d                    srli    a0,a0,0x13
   2:   8905                    andi    a0,a0,1
   4:   e501                    bnez    a0,c <.L3>
vs 
   0:   02c51793                slli    a5,a0,0x2c
   4:   0007c563                bltz    a5,e <.L3>


* [Bug target/115500] RISC-V: Performance regression on 1bit test
  2024-06-15  3:41 [Bug target/115500] New: RISC-V: Performance regression on 1bit test syq at gcc dot gnu.org
@ 2024-06-15  3:42 ` syq at gcc dot gnu.org
  2024-06-15  3:43 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: syq at gcc dot gnu.org @ 2024-06-15  3:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500

--- Comment #1 from YunQiang Su <syq at gcc dot gnu.org> ---
A related discussion about MIPS is here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376


* [Bug target/115500] RISC-V: Performance regression on 1bit test
  2024-06-15  3:41 [Bug target/115500] New: RISC-V: Performance regression on 1bit test syq at gcc dot gnu.org
  2024-06-15  3:42 ` [Bug target/115500] " syq at gcc dot gnu.org
@ 2024-06-15  3:43 ` pinskia at gcc dot gnu.org
  2024-06-15  3:46 ` syq at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-06-15  3:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The big question is: do non-Zbs RISC-V architectures matter any more?


* [Bug target/115500] RISC-V: Performance regression on 1bit test
  2024-06-15  3:41 [Bug target/115500] New: RISC-V: Performance regression on 1bit test syq at gcc dot gnu.org
  2024-06-15  3:42 ` [Bug target/115500] " syq at gcc dot gnu.org
  2024-06-15  3:43 ` pinskia at gcc dot gnu.org
@ 2024-06-15  3:46 ` syq at gcc dot gnu.org
  2024-06-16 21:17 ` law at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: syq at gcc dot gnu.org @ 2024-06-15  3:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500

--- Comment #3 from YunQiang Su <syq at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> The big question is: do non-Zbs RISC-V architectures matter any more?

I have no idea. This is Debian's porterbox, so I guess it meets the
requirements of Debian's RV64 port baseline.


* [Bug target/115500] RISC-V: Performance regression on 1bit test
  2024-06-15  3:41 [Bug target/115500] New: RISC-V: Performance regression on 1bit test syq at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-06-15  3:46 ` syq at gcc dot gnu.org
@ 2024-06-16 21:17 ` law at gcc dot gnu.org
  2024-06-16 23:02 ` syq at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: law at gcc dot gnu.org @ 2024-06-16 21:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |14.1.1
      Known to fail|                            |13.1.1
             Status|UNCONFIRMED                 |WAITING
                 CC|                            |law at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-06-16

--- Comment #4 from Jeffrey A. Law <law at gcc dot gnu.org> ---
On gcc-13, gcc-14 and trunk I get this with -O2 on rv64gc:

        slli    a5,a0,44
        blt     a5,zero,.L3


So ISTM that we must be doing something different.  YunQiang, please make sure
to include the optimization options used when reporting a bug.

WRT Andrew's question: sadly, the most interesting box available in the wild
for builds and such is the Milk-V Pioneer system, which doesn't have the B
extension.  The 64 cores are what make the Milk-V Pioneer interesting
:-0


* [Bug target/115500] RISC-V: Performance regression on 1bit test
  2024-06-15  3:41 [Bug target/115500] New: RISC-V: Performance regression on 1bit test syq at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-06-16 21:17 ` law at gcc dot gnu.org
@ 2024-06-16 23:02 ` syq at gcc dot gnu.org
  2024-06-17  3:23 ` law at gcc dot gnu.org
  2024-06-17 13:59 ` law at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: syq at gcc dot gnu.org @ 2024-06-16 23:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500

--- Comment #5 from YunQiang Su <syq at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #4)
> On gcc-13, gcc-14 and trunk I get this with -O2 on rv64gc:
> 
>         slli    a5,a0,44
>         blt     a5,zero,.L3
> 
> 
> So ISTM that we must be doing something different.  YunQiang, please make
> sure to include the optimization options used when reporting a bug.
> 

Thanks. I used -O2, and yes, slli/bltz is slower than srli/andi/bnez.


* [Bug target/115500] RISC-V: Performance regression on 1bit test
  2024-06-15  3:41 [Bug target/115500] New: RISC-V: Performance regression on 1bit test syq at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-06-16 23:02 ` syq at gcc dot gnu.org
@ 2024-06-17  3:23 ` law at gcc dot gnu.org
  2024-06-17 13:59 ` law at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: law at gcc dot gnu.org @ 2024-06-17  3:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500

--- Comment #6 from Jeffrey A. Law <law at gcc dot gnu.org> ---
That's going to be a uarch issue if the slli/bltz is slower.


* [Bug target/115500] RISC-V: Performance regression on 1bit test
  2024-06-15  3:41 [Bug target/115500] New: RISC-V: Performance regression on 1bit test syq at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-06-17  3:23 ` law at gcc dot gnu.org
@ 2024-06-17 13:59 ` law at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: law at gcc dot gnu.org @ 2024-06-17 13:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500

--- Comment #7 from Jeffrey A. Law <law at gcc dot gnu.org> ---
And to be clearer, if you look at the two assembly snippets:

The problem is about
   0:   814d                    srli    a0,a0,0x13
   2:   8905                    andi    a0,a0,1
   4:   e501                    bnez    a0,c <.L3>
vs 
   0:   02c51793                slli    a5,a0,0x2c
   4:   0007c563                bltz    a5,e <.L3>



They're both using the same basic idioms (logical shifts and a simple
conditional branch); the first one just has an extra andi.  The second one has
a shorter data-dependency critical path.  So it's hard to see how the first
would ever be better.
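
To make the equivalence of the two sequences concrete, here is a minimal C
sketch of what each one computes on a 64-bit register.  The function names
are made up for illustration, and the model assumes the int argument arrives
sign-extended in the register, as it does on RV64:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors srli a0,a0,19 ; andi a0,a0,1 ; bnez:
   move bit 19 down to bit 0, then mask it. */
static int bit19_shift_mask(int x) {
    uint64_t reg = (uint64_t)(int64_t)x;  /* int sign-extended to 64 bits */
    return (int)((reg >> 19) & 1);
}

/* Mirrors slli a5,a0,44 ; bltz:
   move bit 19 up to bit 63 (19 + 44 = 63) and test the sign bit. */
static int bit19_shift_sign(int x) {
    uint64_t reg = (uint64_t)(int64_t)x;
    return (int)((reg << 44) >> 63);      /* >> 63 reads the sign bit */
}

/* Checks that both forms agree with the source-level test (x & 0x80000). */
static void check_bit19_equivalence(void) {
    const int samples[] = { 0, 1, 0x7ffff, 0x80000, 0x80001,
                            0x100000, -1, INT32_MIN, INT32_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        int expected = (samples[i] & 0x80000) != 0;
        assert(bit19_shift_mask(samples[i]) == expected);
        assert(bit19_shift_sign(samples[i]) == expected);
    }
}
```

Both functions return the same value for every input, so any timing
difference comes from how the hardware executes the sequences, not from what
they compute.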

More likely than not what's going on here is going to be something highly
specific to the micro-architecture implementation of whatever chip you tested. 
So for example, some uarchs are particularly sensitive to code alignment.
That could affect the little loop or the function call.

To put this in perspective, I'm aware of a uarch that would show a double-digit
performance delta due to a 2 instruction, 6 byte sequence moving across a
particular boundary -- in a real world benchmark that executes nearly a
trillion instructions.

Point is you have to be *very* careful analyzing this stuff and sometimes
things can be very surprising.
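
One way to probe for alignment sensitivity is to pin the function's alignment
and rerun the timing.  The following is only a sketch under assumptions: the
F32_ALIGN value of 64 and the helper names check_f32 and run_loop are
hypothetical, and it relies on the GCC function attributes noinline and
aligned.  It is not the setup used in the original report:

```c
#include <assert.h>
#include <time.h>

/* Assumed alignment to experiment with; try other powers of two. */
#define F32_ALIGN 64

/* Same function as in the report, kept out of line and given a
   minimum alignment via GCC attributes. */
__attribute__((noinline, aligned(F32_ALIGN)))
int f32(int x) {
    if (x & 0x80000)
        return 100;
    return 1000;
}

/* Sanity check before timing anything. */
void check_f32(void) {
    assert(f32(0x80000) == 100);
    assert(f32(0) == 1000);
}

/* Calls f32 n times, stores the CPU time used in *seconds, and returns
   the sum of the results so the calls cannot be discarded. */
long run_loop(int n, double *seconds) {
    clock_t start = clock();
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += f32(i);
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return sum;
}
```

If the GCC 12 versus GCC 13 gap shrinks, grows, or flips as F32_ALIGN changes,
that points at code placement rather than at the instruction sequence itself.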

So probably the next questions are: what did you use to test this, what do we
know about its uarch, and can we correlate what is public about that uarch
with the behavior you're seeing?


