public inbox for gcc-bugs@sourceware.org
* [Bug target/113827] New: MrBayes benchmark redundant load
From: rdapp at gcc dot gnu.org @ 2024-02-08 11:38 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827
Bug ID: 113827
Summary: MrBayes benchmark redundant load
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rdapp at gcc dot gnu.org
CC: juzhe.zhong at rivai dot ai, law at gcc dot gnu.org,
pan2.li at intel dot com
Blocks: 79704
Target Milestone: ---
Target: riscv
A hot block in the MrBayes benchmark (as used in the Phoronix testsuite) has a
redundant scalar load when vectorized.
Minimal example, compiled with -march=rv64gcv -O3
int foo (float **a, float f, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i][0] /= f;
      a[i][1] /= f;
      a[i][2] /= f;
      a[i][3] /= f;
      a[i] += 4;
    }
}
GCC:
.L3:
ld a5,0(a0)
vle32.v v1,0(a5)
vfmul.vv v1,v1,v2
vse32.v v1,0(a5)
addi a5,a5,16
sd a5,0(a0)
addi a0,a0,8
bne a0,a4,.L3
The value of a5 doesn't change after the store to 0(a0).
LLVM:
.L3:
vle32.v v8,(a1)
addi a3,a1,16
sd a3,0(a2)
vfdiv.vf v8,v8,fa5
addi a2,a2,8
vse32.v v8,(a1)
bne a2,a0,.L3
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79704
[Bug 79704] [meta-bug] Phoronix Test Suite compiler performance issues
* [Bug target/113827] MrBayes benchmark redundant load on riscv
From: rdapp at gcc dot gnu.org @ 2024-02-08 14:29 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827
--- Comment #1 from Robin Dapp <rdapp at gcc dot gnu.org> ---
x86 (-march=native -O3 on an i7 12th gen) looks pretty similar:
.L3:
movq (%rdi), %rax
vmovups (%rax), %xmm1
vdivps %xmm0, %xmm1, %xmm1
vmovups %xmm1, (%rax)
addq $16, %rax
movq %rax, (%rdi)
addq $8, %rdi
cmpq %rdi, %rdx
jne .L3
So probably not target specific. Costing?
* [Bug target/113827] MrBayes benchmark redundant load
From: pinskia at gcc dot gnu.org @ 2024-02-09 4:28 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
              What |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2024-02-09
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>a redundant scalar load
I don't see any redundant load in that loop.
```
.L3:
movq (%rdi), %rax ;; load a[i] from rdi
vmovups (%rax), %xmm1 ;; load rax[0-3] into vector
vdivps %xmm0, %xmm1, %xmm1 ;; divide
vmovups %xmm1, (%rax) ;; store result back into rax[0-3]
addq $16, %rax ;; add 4*4 to rax
movq %rax, (%rdi) ;; store rax back into rdi
addq $8, %rdi ;; add 8 to rdi
cmpq %rdi, %rdx
jne .L3 ;; compare and loop back
```
That is, a[i] is a different element in each iteration, so the load at the top of the loop reads a new pointer every time.
Maybe you reduced this code too much?
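
The annotated loop above can also be read back into C. The following is a minimal sketch, not part of the original report: it rewrites the testcase from comment #0 with one memory access per line so that each statement lines up with an instruction in the x86 listing. The names `foo_explicit` and `foo_explicit_demo` are made up for illustration, and the sketch assumes the float data does not overlap the pointer array.

```c
#include <assert.h>

/* Same effect as foo() from comment #0, with the memory accesses
   spelled out. foo_explicit is an illustrative name. */
static void foo_explicit(float **a, float f, int n)
{
    for (int i = 0; i < n; i++) {
        float *p = a[i];   /* movq (%rdi), %rax: a new slot each iteration */
        p[0] /= f;         /* vmovups / vdivps / vmovups cover p[0..3] */
        p[1] /= f;
        p[2] /= f;
        p[3] /= f;
        a[i] = p + 4;      /* addq $16, %rax ; movq %rax, (%rdi) */
    }
}

/* Self-check: two rows of four floats, each row pointer advances by four. */
static int foo_explicit_demo(void)
{
    float buf[8] = {2, 4, 6, 8, 10, 12, 14, 16};
    float *rows[2] = {buf, buf + 4};

    foo_explicit(rows, 2.0f, 2);

    assert(rows[0] == buf + 4 && rows[1] == buf + 8);
    assert(buf[0] == 1.0f && buf[7] == 8.0f);
    return 0;
}
```

Written this way, the pointer load reads `a[i]` and the pointer store writes `a[i]`, and `i` changes before the next load, so the next iteration's load does not read back the value that was just stored.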
* [Bug target/113827] MrBayes benchmark redundant load
From: pinskia at gcc dot gnu.org @ 2024-02-09 5:10 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Robin Dapp from comment #0)
> A hot block in the MrBayes benchmark (as used in the Phoronix testsuite) has
> a redundant scalar load when vectorized.
>
> Minimal example, compiled with -march=rv64gcv -O3
>
> int foo (float **a, float f, int n)
> {
> for (int i = 0; i < n; i++)
> {
> a[i][0] /= f;
> a[i][1] /= f;
> a[i][2] /= f;
> a[i][3] /= f;
> a[i] += 4;
> }
> }
LLVM for aarch64 with the above testcase:
```
.L3:
ldr x2, [x0]
mov x1, x2
ldr q31, [x2]
fdiv v31.4s, v31.4s, v0.4s
str q31, [x1], 16
str x1, [x0], 8 ;;;; HERE
cmp x3, x0
bne .L3
```
There is a store of x1 there.
I really think you messed up reducing the testcase.
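
For contrast, here is a purely hypothetical variant in which the pointer load really would be redundant. It is not taken from MrBayes or from this report, and nothing in the thread says the original hot block looks like this. Every iteration goes through the same slot `a[0]`, so the pointer loaded at the top of an iteration is the one stored at the end of the previous iteration. All function names are illustrative. The hand-hoisted version assumes the float stores cannot modify the pointer slot, which is what C's type-based aliasing rules let a compiler assume here.

```c
#include <assert.h>

/* Hypothetical variant: every iteration reads and writes the slot a[0]. */
static void foo_same_slot(float **a, float f, int n)
{
    for (int i = 0; i < n; i++) {
        a[0][0] /= f;
        a[0][1] /= f;
        a[0][2] /= f;
        a[0][3] /= f;
        a[0] += 4;
    }
}

/* Hand-written equivalent with the pointer kept in a local: one load
   before the loop and one store after it. */
static void foo_same_slot_hoisted(float **a, float f, int n)
{
    if (n <= 0)
        return;
    float *p = a[0];
    for (int i = 0; i < n; i++) {
        p[0] /= f;
        p[1] /= f;
        p[2] /= f;
        p[3] /= f;
        p += 4;
    }
    a[0] = p;
}

/* Self-check: both versions produce the same data and final pointer. */
static int foo_same_slot_demo(void)
{
    float x[8] = {2, 4, 6, 8, 10, 12, 14, 16};
    float y[8] = {2, 4, 6, 8, 10, 12, 14, 16};
    float *px = x, *py = y;

    foo_same_slot(&px, 2.0f, 2);
    foo_same_slot_hoisted(&py, 2.0f, 2);

    assert(px == x + 8 && py == y + 8);
    for (int i = 0; i < 8; i++)
        assert(x[i] == y[i]);
    return 0;
}
```

In the testcase from comment #0 the slot is `a[i]` rather than `a[0]`, which is why the load there cannot be dropped.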