public inbox for gcc-bugs@sourceware.org
* [Bug c/114521] New: aarch64: wrong code with Neon ld1/st1x4 intrinsics gcc-11 and earlier
@ 2024-03-28 15:52 jswinney at amazon dot com
  2024-03-28 15:56 ` [Bug target/114521] " pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: jswinney at amazon dot com @ 2024-03-28 15:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114521

            Bug ID: 114521
           Summary: aarch64: wrong code with Neon ld1/st1x4 intrinsics
                    gcc-11 and earlier
           Product: gcc
           Version: 11.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jswinney at amazon dot com
  Target Milestone: ---

Created attachment 57831
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57831&action=edit
patch to fix the broken test

Using a half-width (64-bit), 4-register AArch64 Neon load intrinsic results in an
incorrect stack spill, or at least in incorrect offsets into the resulting spill.
This happens at every optimization level, -O0 through -O3.


```
#include <arm_neon.h>
#include <inttypes.h>
#include <stdio.h>

uint8x8_t global[4] = {0};

void test(const uint8_t* arr)
{
  const uint8x8x4_t parr = vld1_u8_x4(arr);

  global[0] = parr.val[0];
  global[1] = parr.val[1];
  global[2] = parr.val[2];
  global[3] = parr.val[3];
}

int main()
{
  const uint8_t arr[32] = {
    0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B,
    0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B,
    0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B,
    0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B, 0x0A, 0x0B,
  };

  for (int i = 0; i < 4; i++) {
    printf("%" PRIx64 " ", (uint64_t) global[i]);
  }
  printf("\n");

  test(arr);

  for (int i = 0; i < 4; i++) {
    printf("%" PRIx64 " ", (uint64_t) global[i]);
  }
  printf("\n");

  return 0;
}
```

For the "test" function above, the compiler emits the correct half-width load
(.8b) followed by a full-width store (.16b) to the stack:
```
test(unsigned char const*):
        ld1     {v0.8b - v3.8b}, [x0]
        sub     sp, sp, #64
...
        st1     {v0.16b - v3.16b}, [sp]
```

This issue is fixed in gcc-12 by commit:
66f206b85395c273980e2b81a54dbddc4897e4a7

Additionally, the test meant to verify this code silently ignores the error. I
have attached a patch that fixes the test.


```
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/aarch64-amazon-linux/11/lto-wrapper
Target: aarch64-amazon-linux
Configured with: ../configure --enable-bootstrap
--enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info
--with-bugurl=https://github.com/amazonlinux/amazon-linux-2022 --enable-shared
--enable-threads=posix --enable-checking=release --enable-multilib
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object --enable-linker-build-id
--with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin
--enable-initfini-array
--with-isl=/builddir/build/BUILD/gcc-11.4.1-20230605/obj-aarch64-amazon-linux/isl-install
--enable-gnu-indirect-function --with-tune=neoverse-n1
--with-arch=armv8.2-a+crypto --build=aarch64-amazon-linux
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.1 20230605 (Red Hat 11.4.1-2) (GCC)
```

Thread overview: 5+ messages
2024-03-28 15:52 [Bug c/114521] New: aarch64: wrong code with Neon ld1/st1x4 intrinsics gcc-11 and earlier jswinney at amazon dot com
2024-03-28 15:56 ` [Bug target/114521] " pinskia at gcc dot gnu.org
2024-03-28 15:57 ` [Bug target/114521] [11 only] " pinskia at gcc dot gnu.org
2024-03-28 17:16 ` rsandifo at gcc dot gnu.org
2024-03-28 17:45 ` pinskia at gcc dot gnu.org
