public inbox for gcc-bugs@sourceware.org
* [Bug target/95962] New: Inefficient code for simple arm_neon.h iota operation
@ 2020-06-29 12:49 rsandifo at gcc dot gnu.org
  2021-08-12  8:01 ` [Bug target/95962] " tnfchris at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2020-06-29 12:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95962

            Bug ID: 95962
           Summary: Inefficient code for simple arm_neon.h iota operation
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
            Blocks: 95958
  Target Milestone: ---
            Target: aarch64*-*-*

For:

#include <arm_neon.h>

int32x4_t
foo (void)
{
  int32_t array[] = { 0, 1, 2, 3 };
  return vld1q_s32 (array);
}

we produce:

foo:
.LFB4217:
        .cfi_startproc
        sub     sp, sp, #16
        .cfi_def_cfa_offset 16
        mov     x0, 2
        mov     x1, 4294967296
        movk    x0, 0x3, lsl 32
        stp     x1, x0, [sp]
        ldr     q0, [sp]
        add     sp, sp, 16
        .cfi_def_cfa_offset 0
        ret
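
For reference, the two 64-bit immediates in that sequence are just the
array elements packed in pairs, low lane first, as a little-endian stp
would lay them out.  A minimal sketch of the arithmetic (pack_pair and
check_immediates are hypothetical names used only for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack two 32-bit lanes into one 64-bit value,
   with LO in the low half, matching little-endian memory order.  */
static uint64_t
pack_pair (uint32_t lo, uint32_t hi)
{
  return (uint64_t) lo | ((uint64_t) hi << 32);
}

/* Check the packed values against the immediates in the assembly.  */
static void
check_immediates (void)
{
  /* mov x1, 4294967296 holds lanes { 0, 1 }.  */
  assert (pack_pair (0, 1) == 4294967296ULL);
  /* mov x0, 2 + movk x0, 0x3, lsl 32 holds lanes { 2, 3 }.  */
  assert (pack_pair (2, 3) == 0x300000002ULL);
}
```

So the constant is being built in integer registers and bounced through
the stack, rather than loaded directly as a vector.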

In contrast, clang produces essentially perfect code:

        adrp    x8, .LCPI0_0
        ldr     q0, [x8, :lo12:.LCPI0_0]
        ret

I think the problem is a combination of two things:

- __builtin_aarch64_ld1v4si & co. are treated as general
  functions rather than pure functions, so in principle
  they could write to the given address.  This stops us
  promoting the array to a constant.

- The loads could be reduced to native gimple-level
  operations, at least on little-endian targets.
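
As a source-level illustration of the second point: with the GNU vector
extensions the same load can be written as an ordinary memory read,
which the optimisers can see only reads the array.  This is a sketch
under assumptions, not how arm_neon.h is or would be implemented; v4si,
load_v4si and iota are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for int32x4_t, using the GNU vector extension.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* A vector load expressed as a plain memory read.  Nothing here can
   write to *PTR, so the caller's array stays provably constant.  */
static inline v4si
load_v4si (const int32_t *ptr)
{
  v4si res;
  memcpy (&res, ptr, sizeof (res));
  return res;
}

/* Same shape as foo above, but without the opaque builtin call.  */
v4si
iota (void)
{
  int32_t array[] = { 0, 1, 2, 3 };
  return load_v4si (array);
}
```

The expectation is that a load in this form leaves the array eligible
for promotion to a constant, though that is not checked here.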

IMO this is a bug rather than an enhancement.  Intrinsics only
exist to optimise code, and what GCC is doing falls short
of what users should reasonably expect.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958
[Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64


end of thread, other threads:[~2021-12-03 17:05 UTC | newest]

Thread overview: 5+ messages
2020-06-29 12:49 [Bug target/95962] New: Inefficient code for simple arm_neon.h iota operation rsandifo at gcc dot gnu.org
2021-08-12  8:01 ` [Bug target/95962] " tnfchris at gcc dot gnu.org
2021-08-20 11:52 ` rsandifo at gcc dot gnu.org
2021-11-15 15:09 ` tnfchris at gcc dot gnu.org
2021-12-03 17:05 ` rsandifo at gcc dot gnu.org
