public inbox for gcc-bugs@sourceware.org
* [Bug target/113827] New: MrBayes benchmark redundant load
@ 2024-02-08 11:38 rdapp at gcc dot gnu.org
  2024-02-08 14:29 ` [Bug target/113827] MrBayes benchmark redundant load on riscv rdapp at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: rdapp at gcc dot gnu.org @ 2024-02-08 11:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827

            Bug ID: 113827
           Summary: MrBayes benchmark redundant load
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rdapp at gcc dot gnu.org
                CC: juzhe.zhong at rivai dot ai, law at gcc dot gnu.org,
                    pan2.li at intel dot com
            Blocks: 79704
  Target Milestone: ---
            Target: riscv

A hot block in the MrBayes benchmark (as used in the Phoronix testsuite) has a
redundant scalar load when vectorized.

Minimal example, compiled with -march=rv64gcv -O3

int foo (float **a, float f, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i][0] /= f;
      a[i][1] /= f;
      a[i][2] /= f;
      a[i][3] /= f;
      a[i] += 4;
    }
}

GCC:
.L3:
        ld      a5,0(a0)
        vle32.v v1,0(a5)
        vfmul.vv        v1,v1,v2
        vse32.v v1,0(a5)
        addi    a5,a5,16
        sd      a5,0(a0)
        addi    a0,a0,8
        bne     a0,a4,.L3

The value of a5 doesn't change after the store to 0(a0).

LLVM:
.L3:
        vle32.v   v8,(a1)
        addi      a3,a1,16
        sd        a3,0(a2)
        vfdiv.vf  v8,v8,fa5
        addi      a2,a2,8
        vse32.v   v8,(a1)
        bne       a2,a0,.L3


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79704
[Bug 79704] [meta-bug] Phoronix Test Suite compiler performance issues

* [Bug target/113827] MrBayes benchmark redundant load on riscv
  2024-02-08 11:38 [Bug target/113827] New: MrBayes benchmark redundant load rdapp at gcc dot gnu.org
@ 2024-02-08 14:29 ` rdapp at gcc dot gnu.org
  2024-02-09  4:28 ` [Bug target/113827] MrBayes benchmark redundant load pinskia at gcc dot gnu.org
  2024-02-09  5:10 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rdapp at gcc dot gnu.org @ 2024-02-08 14:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827

--- Comment #1 from Robin Dapp <rdapp at gcc dot gnu.org> ---
x86 (-march=native -O3 on an i7 12th gen) looks pretty similar:

.L3:
        movq    (%rdi), %rax
        vmovups (%rax), %xmm1
        vdivps  %xmm0, %xmm1, %xmm1
        vmovups %xmm1, (%rax)
        addq    $16, %rax
        movq    %rax, (%rdi)
        addq    $8, %rdi
        cmpq    %rdi, %rdx
        jne     .L3

So probably not target specific.  Costing?

* [Bug target/113827] MrBayes benchmark redundant load
  2024-02-08 11:38 [Bug target/113827] New: MrBayes benchmark redundant load rdapp at gcc dot gnu.org
  2024-02-08 14:29 ` [Bug target/113827] MrBayes benchmark redundant load on riscv rdapp at gcc dot gnu.org
@ 2024-02-09  4:28 ` pinskia at gcc dot gnu.org
  2024-02-09  5:10 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-09  4:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2024-02-09

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>a redundant scalar load 

I don't see any redundant load in that loop.


```
.L3:
        movq    (%rdi), %rax   ;; load a[i] from rdi
        vmovups (%rax), %xmm1  ;; load rax[0-3] into vector
        vdivps  %xmm0, %xmm1, %xmm1 ;; divide
        vmovups %xmm1, (%rax)  ;; store result back into rax[0-3]
        addq    $16, %rax   ;; add 4*4 to rax
        movq    %rax, (%rdi) ;; store rax back into rdi
        addq    $8, %rdi     ;; add 8 to rdi
        cmpq    %rdi, %rdx
        jne     .L3          ;; compare and loop back
```

That is, a[i] is different in each iteration.

Maybe you reduced this code too much?
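
For reference, here is a rough C-level sketch of what the loop does per iteration, with the memory accesses spelled out. The names foo_orig, foo_one_load and demo are made up for illustration and are not part of the reported testcase or of MrBayes; the return type is changed to void because the reported function returns nothing.

```c
#include <assert.h>

/* The loop from comment #0 (hypothetical name, return type made void).  */
void foo_orig (float **a, float f, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i][0] /= f;
      a[i][1] /= f;
      a[i][2] /= f;
      a[i][3] /= f;
      a[i] += 4;
    }
}

/* Hand-written variant (hypothetical name): one load of a[i], four
   element updates, one store of a[i] + 4.  */
void foo_one_load (float **a, float f, int n)
{
  for (int i = 0; i < n; i++)
    {
      float *p = a[i];          /* scalar load of the row pointer */
      p[0] /= f;
      p[1] /= f;
      p[2] /= f;
      p[3] /= f;
      a[i] = p + 4;             /* scalar store of the advanced pointer */
    }
}

/* Run both variants on identical data and check that they agree.
   Returns the number of rows processed.  */
int demo (void)
{
  float d1[2][4] = { { 2, 4, 6, 8 }, { 10, 12, 14, 16 } };
  float d2[2][4] = { { 2, 4, 6, 8 }, { 10, 12, 14, 16 } };
  float *r1[2] = { d1[0], d1[1] };
  float *r2[2] = { d2[0], d2[1] };

  foo_orig (r1, 2.0f, 2);
  foo_one_load (r2, 2.0f, 2);

  for (int i = 0; i < 2; i++)
    {
      assert (r1[i] == d1[i] + 4);
      assert (r2[i] == d2[i] + 4);
      for (int j = 0; j < 4; j++)
        assert (d1[i][j] == d2[i][j]);
    }
  return 2;
}
```

In this form each iteration reads a different a[i] slot exactly once and writes it exactly once, which matches the single load and single store per iteration in the assembly above.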

* [Bug target/113827] MrBayes benchmark redundant load
  2024-02-08 11:38 [Bug target/113827] New: MrBayes benchmark redundant load rdapp at gcc dot gnu.org
  2024-02-08 14:29 ` [Bug target/113827] MrBayes benchmark redundant load on riscv rdapp at gcc dot gnu.org
  2024-02-09  4:28 ` [Bug target/113827] MrBayes benchmark redundant load pinskia at gcc dot gnu.org
@ 2024-02-09  5:10 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-09  5:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Robin Dapp from comment #0)
> A hot block in the MrBayes benchmark (as used in the Phoronix testsuite) has
> a redundant scalar load when vectorized.
> 
> Minimal example, compiled with -march=rv64gcv -O3
> 
> int foo (float **a, float f, int n)
> {
>   for (int i = 0; i < n; i++)
>     {
>       a[i][0] /= f;
>       a[i][1] /= f;
>       a[i][2] /= f;
>       a[i][3] /= f;
>       a[i] += 4;
>     }
> }

LLVM for aarch64 with the above testcase:
```
.L3:
        ldr     x2, [x0]
        mov     x1, x2
        ldr     q31, [x2]
        fdiv    v31.4s, v31.4s, v0.4s
        str     q31, [x1], 16
        str     x1, [x0], 8 ;;;; HERE
        cmp     x3, x0
        bne     .L3
```

There is a store of x1 there.
I really think you messed up reducing the testcase.
