[Bug c/105181] New: [optimization] gcc generate worse code than clang base on neon

public inbox for gcc-bugs@sourceware.org

* [Bug c/105181] New: [optimization] gcc generate worse code than clang base on neon
@ 2022-04-06 15:25 zhongyunde at huawei dot com
  2022-04-06 16:14 ` [Bug target/105181] Store and load with updating the pointer is not used as often as it should be on aarch64 pinskia at gcc dot gnu.org
  2022-04-07  7:05 ` rguenth at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: zhongyunde at huawei dot com @ 2022-04-06 15:25 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105181

            Bug ID: 105181
           Summary: [optimization] gcc generate worse code than clang base
                    on neon
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhongyunde at huawei dot com
  Target Milestone: ---

test case:
void loop(int N, double *a, double *b) {
  // #pragma clang loop vectorize_width(4, scalable)
  for (int i = 0; i < N; i++) {
    a[i] = b[i] + 1.0;
  }
}

gcc's kernel loop body:
.L4:
        ldr     q0, [x2, x3]
        fadd    v0.2d, v0.2d, v1.2d
        str     q0, [x1, x3]
        add     x3, x3, 16
        cmp     x3, x0
        bne     .L4

llvm's kernel loop body:
.LBB0_9:                                // =>This Inner Loop Header: Depth=1
        ldr     q1, [x12], #16
        subs    x10, x10, #2
        fadd    v1.2d, v1.2d, v0.2d
        str     q1, [x11], #16
        b.ne    .LBB0_9

See details at https://godbolt.org/z/54nssME4f


* [Bug target/105181] Store and load with updating the pointer is not used as often as it should be on aarch64
  2022-04-06 15:25 [Bug c/105181] New: [optimization] gcc generate worse code than clang base on neon zhongyunde at huawei dot com
@ 2022-04-06 16:14 ` pinskia at gcc dot gnu.org
  2022-04-07  7:05 ` rguenth at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-04-06 16:14 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105181

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
            Summary|[optimization] gcc generate |Store and load with
                   |worse code than clang base  |updating the pointer is not
                   |on neon                     |used as often as it should
                   |                            |be on aarch64

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is just an induction variable selection issue. GCC decides that using one
induction variable is better than using three, where the loads and the store
would update their pointers too.
This is most likely a cost model issue, as ivopts does have some support for
these kinds of instructions.
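
To make the induction variable choice concrete, here is a rough source-level sketch of the two loop shapes in C. The names loop_indexed, loop_postinc and check_loops_agree are made up for illustration and do not come from the bug report. The real decision is taken by ivopts on the compiler's internal representation, so the compiler may well turn either source form into either machine loop; this only shows what "one IV" versus "three IVs" means.

```c
#include <assert.h>
#include <stddef.h>

/* One induction variable: both accesses use base + index addressing,
   similar to "ldr q0, [x2, x3]" and "str q0, [x1, x3]". */
void loop_indexed(int n, double *a, const double *b)
{
    for (ptrdiff_t i = 0; i < n; i++)
        a[i] = b[i] + 1.0;
}

/* Three induction variables: two pointers that advance with each access
   (candidates for post-increment ldr/str) and a down counter. */
void loop_postinc(int n, double *a, const double *b)
{
    for (ptrdiff_t left = n; left > 0; left--)
        *a++ = *b++ + 1.0;
}

/* Sanity check (illustrative): both shapes compute the same result. */
void check_loops_agree(void)
{
    const double b[4] = {0.0, 1.5, -2.0, 3.25};
    double x[4], y[4];

    loop_indexed(4, x, b);
    loop_postinc(4, y, b);
    for (int i = 0; i < 4; i++)
        assert(x[i] == y[i] && x[i] == b[i] + 1.0);
}
```

With three IVs the pointer updates can be folded into the memory instructions, which is what the LLVM loop does; with one IV the loop needs a separate add and compare, which is what the GCC loop shows.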


* [Bug target/105181] Store and load with updating the pointer is not used as often as it should be on aarch64
  2022-04-06 15:25 [Bug c/105181] New: [optimization] gcc generate worse code than clang base on neon zhongyunde at huawei dot com
  2022-04-06 16:14 ` [Bug target/105181] Store and load with updating the pointer is not used as often as it should be on aarch64 pinskia at gcc dot gnu.org
@ 2022-04-07  7:05 ` rguenth at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-07  7:05 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105181

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |12.0

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
So is it really a missed optimization for this particular testcase?  Possibly
the choice of a decrement-and-test loop branch is the missed part, if I
correctly read subs + b.ne as that.
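
For readers unfamiliar with the pattern, a decrement-and-test loop can be sketched in C roughly as follows. The function name loop_count_down is hypothetical, and the sketch assumes the trip count is a positive multiple of 2 (the vector width for doubles in the kernels quoted in the report); a real vectorized loop would also need a scalar tail.

```c
#include <assert.h>

/* Count-down form of the vectorized kernel: two doubles per iteration,
   the counter drops by 2 and the loop exits when it reaches zero.
   Assumes n is a positive multiple of 2 (no scalar tail handled here). */
void loop_count_down(long n, double *a, const double *b)
{
    assert(n > 0 && n % 2 == 0);

    long remaining = n;
    do {
        a[0] = b[0] + 1.0;
        a[1] = b[1] + 1.0;
        a += 2;
        b += 2;
        remaining -= 2;          /* plays the role of "subs x10, x10, #2" */
    } while (remaining != 0);    /* plays the role of "b.ne" */
}
```

Because the subtraction itself sets the condition flags, no separate compare is needed before the branch, which is one instruction fewer per iteration than an add + cmp + bne sequence.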

