* [Bug target/115500] RISC-V: Performance regression on 1bit test
2024-06-15 3:41 [Bug target/115500] New: RISC-V: Performance regression on 1bit test syq at gcc dot gnu.org
@ 2024-06-15 3:42 ` syq at gcc dot gnu.org
2024-06-15 3:43 ` pinskia at gcc dot gnu.org
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: syq at gcc dot gnu.org @ 2024-06-15 3:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500
--- Comment #1 from YunQiang Su <syq at gcc dot gnu.org> ---
See the related discussion for MIPS here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376
* [Bug target/115500] RISC-V: Performance regression on 1bit test
From: pinskia at gcc dot gnu.org @ 2024-06-15 3:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The big question is: do non-Zbs RISC-V architectures matter any more?
* [Bug target/115500] RISC-V: Performance regression on 1bit test
From: syq at gcc dot gnu.org @ 2024-06-15 3:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500
--- Comment #3 from YunQiang Su <syq at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #2)
> The big question is: do non-Zbs RISC-V architectures matter any more?
I have no idea. The machine is a Debian porterbox, so I guess it meets the
baseline requirements of Debian's RV64 port.
* [Bug target/115500] RISC-V: Performance regression on 1bit test
From: law at gcc dot gnu.org @ 2024-06-16 21:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500
Jeffrey A. Law <law at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |14.1.1
      Known to fail|                            |13.1.1
             Status|UNCONFIRMED                 |WAITING
                 CC|                            |law at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-06-16
--- Comment #4 from Jeffrey A. Law <law at gcc dot gnu.org> ---
On the gcc-13, gcc-14 and the trunk I get this with -O2 on rv64gc:
slli a5,a0,44
blt a5,zero,.L3
So ISTM that we must be doing something different. YunQiang, please make sure
to include the optimization options used when reporting a bug.
WRT Andrew's question: sadly, the most interesting box available in the wild
for builds and such is the Milk-V Pioneer system, which doesn't have the B
extension. Its 64 cores are what make the Milk-V Pioneer interesting
:-0
* [Bug target/115500] RISC-V: Performance regression on 1bit test
From: syq at gcc dot gnu.org @ 2024-06-16 23:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500
--- Comment #5 from YunQiang Su <syq at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #4)
> On the gcc-13, gcc-14 and the trunk I get this with -O2 on rv64gc:
>
> slli a5,a0,44
> blt a5,zero,.L3
>
>
> So ISTM that we must be doing something different. YunQiang, please make
> sure to include the optimization options used when reporting a bug.
>
Thanks. I used -O2, and yes, slli/bltz is slower than srli/andi/bnez.
* [Bug target/115500] RISC-V: Performance regression on 1bit test
From: law at gcc dot gnu.org @ 2024-06-17 3:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500
--- Comment #6 from Jeffrey A. Law <law at gcc dot gnu.org> ---
That's going to be a uarch issue if the slli/bltz is slower.
* [Bug target/115500] RISC-V: Performance regression on 1bit test
From: law at gcc dot gnu.org @ 2024-06-17 13:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115500
--- Comment #7 from Jeffrey A. Law <law at gcc dot gnu.org> ---
And to be clearer, if you look at the two assembly snippets:
The problem is about
   0:   814d        srli  a0,a0,0x13
   2:   8905        andi  a0,a0,1
   4:   e501        bnez  a0,c <.L3>
vs
   0:   02c51793    slli  a5,a0,0x2c
   4:   0007c563    bltz  a5,e <.L3>
They're both using the same basic idioms (a logical shift and a simple
conditional branch); one just has an extra andi. The second one has a shorter
data-dependency critical path, so it's hard to see how the first would ever be
better.
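To make the comparison concrete, here is a minimal C sketch of the two idioms.
It is not the reporter's test case; the function names are hypothetical, and
the bit index 19 is taken from the shift amounts in the disassembly above
(0x13 = 19, and 0x2c = 44 = 63 - 19). It only illustrates that both sequences
compute the same predicate.

```c
#include <assert.h>
#include <stdint.h>

/* Idiom 1 (srli/andi/bnez): shift bit 19 down to bit 0, then mask it. */
static int bit19_shift_mask(uint64_t x)
{
    return (int)((x >> 19) & 1);
}

/* Idiom 2 (slli/bltz): shift bit 19 up to bit 63, the sign bit that
   bltz inspects. Extracting bit 63 here stands in for the sign test. */
static int bit19_shift_sign(uint64_t x)
{
    return (int)((x << 44) >> 63);
}

/* Check that both idioms agree on every single-bit pattern and on
   its complement. */
static void check_idioms_agree(void)
{
    for (int bit = 0; bit < 64; bit++) {
        uint64_t x = UINT64_C(1) << bit;
        assert(bit19_shift_mask(x) == bit19_shift_sign(x));
        assert(bit19_shift_mask(~x) == bit19_shift_sign(~x));
    }
}
```

Calling check_idioms_agree() from a small main() exercises both functions.
Which instruction sequence a given GCC version emits for either form depends
on the target options and is not implied by this sketch.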
More likely than not, what's going on here is highly specific to the
micro-architecture implementation of whatever chip you tested. For example,
some uarchs are particularly sensitive to code alignment, which could affect
the little loop or the function call.
To put this in perspective, I'm aware of a uarch that would show a double-digit
performance delta due to a 2-instruction, 6-byte sequence moving across a
particular boundary -- in a real-world benchmark that executes nearly a
trillion instructions.
The point is that you have to be *very* careful analyzing this stuff, and
sometimes things can be very surprising.
So probably the next question is: what did you use to test this, what do we
know about its uarch, and can we correlate what is public about that uarch with
the behavior you're seeing?