public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/110026] New: [Bug] 5% performance drop on important benchmark after r260951.
@ 2023-05-29 15:14 d_vampile at 163 dot com
  2023-05-29 15:17 ` [Bug tree-optimization/110026] " jakub at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: d_vampile at 163 dot com @ 2023-05-29 15:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026

            Bug ID: 110026
           Summary: [Bug] 5% performance drop on important benchmark after
                    r260951.
           Product: gcc
           Version: 10.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: d_vampile at 163 dot com
  Target Milestone: ---

Created attachment 55184
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55184&action=edit
Open-source stream benchmark

After the patch was committed, the performance of the copy sub-item in the
stream benchmark decreases by 3% on AArch64.

Alternatively, you can obtain it from
https://github.com/jeffhammond/stream/archive/master.zip.

Compiling & Running:
gcc -fopenmp -O -DSTREAM_ARRAY_SIZE=100000000 stream.c  -o stream
./stream

Before the modification (copy sub-item):
ldr x2, [x3, x0, lsl #3]
str x2, [x4, x0, lsl #3]
add x0, x0, #0x1
cmp x1, x0
b.ne 400a00 <main._omp_fn.4+0x54>
ldr x19, [sp, #16]
ldp x29, x30, [sp], #32
ret

After the modification:
ldr d0, [x2, x0, lsl #3]
str d0, [x3, x0, lsl #3]
add x0, x0, #0x1
cmp x1, x0
b.ne 400a00 <main._omp_fn.4+0x54>
ldr x19, [sp, #16]
ldp x29, x30, [sp], #32
ret

It can be seen that a general-purpose register (x2) is used before the
modification, and a floating-point register (d0) is used after the modification.
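
For reference, the loop behind both listings is the benchmark's copy kernel. The
following is only a reduced sketch, assuming the default element type is double;
stream_t, stream_copy and copy_mismatches are names chosen for this sketch, not
identifiers taken from stream.c:

```c
#include <stdlib.h>

/* Assumption: the benchmark's default element type is double (8 bytes). */
typedef double stream_t;

/* Reduced copy kernel: one 8-byte load and one 8-byte store per element,
   with no arithmetic on the value in between. */
void stream_copy(stream_t *c, const stream_t *a, long n)
{
    long j;
#pragma omp parallel for
    for (j = 0; j < n; j++)
        c[j] = a[j];
}

/* Hypothetical helper: fill a, copy it into c, and return the number of
   mismatching elements (or -1 if allocation fails). */
long copy_mismatches(long n)
{
    stream_t *a = malloc((size_t)n * sizeof *a);
    stream_t *c = malloc((size_t)n * sizeof *c);
    long j, bad = 0;

    if (!a || !c) {
        free(a);
        free(c);
        return -1;
    }
    for (j = 0; j < n; j++) {
        a[j] = (stream_t)j;
        c[j] = 0;
    }
    stream_copy(c, a, n);
    for (j = 0; j < n; j++)
        if (c[j] != a[j])
            bad++;
    free(a);
    free(c);
    return bad;
}
```

Because the value is only moved and never computed on, either an x register or
a d register can carry it, so the choice between the two is left to the
compiler's cost model.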

* [Bug tree-optimization/110026] [Bug] 5% performance drop on important benchmark after r260951.
  2023-05-29 15:14 [Bug tree-optimization/110026] New: [Bug] 5% performance drop on important benchmark after r260951 d_vampile at 163 dot com
@ 2023-05-29 15:17 ` jakub at gcc dot gnu.org
  2023-05-30 14:37 ` [Bug target/110026] " d_vampile at 163 dot com
  2023-05-30 14:48 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-05-29 15:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, any benchmarking for speed with -O rather than -O2/-O3 is intentionally
missing various optimizations which can greatly improve performance.

* [Bug target/110026] [Bug] 5% performance drop on important benchmark after r260951.
  2023-05-29 15:14 [Bug tree-optimization/110026] New: [Bug] 5% performance drop on important benchmark after r260951 d_vampile at 163 dot com
  2023-05-29 15:17 ` [Bug tree-optimization/110026] " jakub at gcc dot gnu.org
@ 2023-05-30 14:37 ` d_vampile at 163 dot com
  2023-05-30 14:48 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: d_vampile at 163 dot com @ 2023-05-30 14:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026

--- Comment #2 from d_vampile <d_vampile at 163 dot com> ---
(In reply to Jakub Jelinek from comment #1)
> Note, any benchmarking for speed with -O rather than -O2/-O3 is
> intentionally missing various optimizations which can greatly improve
> performance.

-O does miss a lot of optimizations. However, for the problem I mentioned, GPRs
were used before the modification and FP registers are used after it. When
vectorization is not applicable, the X0 register is faster than the D0
register. Is the modification appropriate here?

* [Bug target/110026] [Bug] 5% performance drop on important benchmark after r260951.
  2023-05-29 15:14 [Bug tree-optimization/110026] New: [Bug] 5% performance drop on important benchmark after r260951 d_vampile at 163 dot com
  2023-05-29 15:17 ` [Bug tree-optimization/110026] " jakub at gcc dot gnu.org
  2023-05-30 14:37 ` [Bug target/110026] " d_vampile at 163 dot com
@ 2023-05-30 14:48 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-30 14:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to d_vampile from comment #2)
> -O does miss a lot of optimizations. However, for the problem I mentioned,
> GPRs were used before the modification and FP registers are used after it.
> When vectorization is not applicable, the X0 register is faster than the D0
> register. Is the modification appropriate here?


Well the generic_tunings has:
  { 4, /* load_int.  */
    4, /* store_int.  */
    4, /* load_fp.  */
    4, /* store_fp.  */
    4, /* load_pred.  */
    4 /* store_pred.  */
  }, /* memmov_cost.  */


This says that loads/stores of FP registers have the same cost as those of
integer registers (GPRs); this is the same as a53's tuning.

If anything that should be changed ....

Or you should use -mcpu=* where applicable.
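
To illustrate why equal entries give no reason to prefer one register class for
a plain copy, here is a toy comparison. It is only a sketch: the struct below
mirrors the six fields quoted above but is not GCC's actual declaration, the
second table is a made-up tuning, and GCC's register allocator does not
literally compute its preference this way.

```c
/* Illustrative mirror of the six quoted fields; not GCC's real type. */
struct memmov_cost_sketch {
    int load_int;
    int store_int;
    int load_fp;
    int store_fp;
    int load_pred;
    int store_pred;
};

/* Values as quoted from generic_tunings in this thread. */
const struct memmov_cost_sketch generic_sketch = { 4, 4, 4, 4, 4, 4 };

/* Hypothetical tuning in which FP loads/stores are costlier than GPR ones. */
const struct memmov_cost_sketch fp_costlier_sketch = { 4, 4, 6, 6, 4, 4 };

/* Cost of moving one value memory -> register -> memory through a GPR. */
int copy_cost_int(const struct memmov_cost_sketch *c)
{
    return c->load_int + c->store_int;
}

/* The same move through an FP register. */
int copy_cost_fp(const struct memmov_cost_sketch *c)
{
    return c->load_fp + c->store_fp;
}

/* Positive: GPRs look cheaper. Negative: FP registers look cheaper.
   Zero: the table expresses no preference. */
int gpr_advantage(const struct memmov_cost_sketch *c)
{
    return copy_cost_fp(c) - copy_cost_int(c);
}
```

With the quoted values the result is a tie, whereas a table that charges more
for FP loads/stores would tilt a pure copy toward GPRs.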
