public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/113025] New: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee
@ 2023-12-14 21:39 juki at gcc dot mail.kapsi.fi
  2023-12-14 21:44 ` [Bug c++/113025] " pinskia at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: juki at gcc dot mail.kapsi.fi @ 2023-12-14 21:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025

            Bug ID: 113025
           Summary: Pointer is sometimes assumed to be 16-byte aligned
                    even when there is no such guarantee
           Product: gcc
           Version: 8.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juki at gcc dot mail.kapsi.fi
  Target Milestone: ---

The following code is miscompiled in some cases when optimization levels is
-O3:

// from https://github.com/intel/ARM_NEON_2_x86_SSE/blob/master/NEON_2_SSE.h

#define LOAD_SI128(ptr) \
        ( ((uintptr_t)(ptr) & 15) == 0 ) ? _mm_load_si128((__m128i*)(ptr)) :
_mm_loadu_si128((__m128i*)(ptr))

This macro is used by several different operations in the linked header file
that simulate ARM NEON intrinsics that load 128-bit long integer vector from
unaligned memory addresses.

With low optimization levels and most of the time anyway, function works as
expected:
- If pointer to the memory location is 16-byte aligned and compiler knows this,
it generates opcode "movdqa" matching __mm_load_si128() intrinsic.
- If pointer has unknown or non-16-byte alignment, opcode "movdqu" matching
_mm_loadu_si128() intrinsic is generated and actual alignment test is optimized
away as unnecessary.

However, in some cases when macro is used to load 1 or 2 byte aligned data,
16-byte aligned opcode is generated instead and General Protection Fault
happens due to invalid alignment. Function where this happens just gets a raw
pointer, for example, const uint8_t *as an input and compiler should have no
reason to assume that it would be 16-byte aligned all the time.

Issue was first detected with gcc 8.4.0 but it was also verified to happen with
gcc 9.4.0 and gcc 12.2.0 in different places depending on the version.

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
8.4.0-1ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-8
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-libmpx
--enable-plugin --enable-default-pie --with-system-zlib
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none --without-cuda-driver
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 8.4.0 (Ubuntu 8.4.0-1ubuntu1~18.04) 

I was unable to generate a simple example when this happens and complex
examples are from proprietary code. Hopefully this still helps someone to
understand the issue better.

What I do know at the moment:
- Bug happens at least with C++ frontend when compiling for x86_64

- Bug happens with and without LTO

- Bug has only happened with -O3, never with -O2 or -O1

- Bug seems to only happen in very specific cases but it is common enough to
crop up in several very different algorithms that use this same operation
above. 

- Minor changes, like changing inline keyword for a related function or
changing -DNDEBUG from commandline to another setting, has a potential to "fix"
the issue momentarily for that particular location.

- Only the first access in the generated function with offset 0 to that pointer
is wrong. Later accesses with some variable offset added to that pointer again
use unaligned access like they should.

- 16-byte aligned access was assumed even when the parent function was looping
through different offsets with steps of 1 and calling function with miscompiled
code in the same translation unit. So context has given no reason to assume
16-byte alignment for the pointer.

- All tested compilers from 8.4.0 to 12.2.0 were producing the same error with
the same compiler parameters but errors were not necessarily generated in the
same functions. No GCC version from the tested set was found to produce only
working code with full optimizations enabled. Clang does not seem to share this
issue.

The only thing I can think of is that during some more aggressive optimization
passes, pointer somehow gets wrong alignment information attached to it.
However, I know nothing of GCC's internals to understand how this could happen.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c++/113025] Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee
  2023-12-14 21:39 [Bug c++/113025] New: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee juki at gcc dot mail.kapsi.fi
@ 2023-12-14 21:44 ` pinskia at gcc dot gnu.org
  2023-12-18 17:10 ` juki at gcc dot mail.kapsi.fi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-12-14 21:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So __m128i is a 16byte aligned type, you need to use __m128i_u if it is
supported.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c++/113025] Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee
  2023-12-14 21:39 [Bug c++/113025] New: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee juki at gcc dot mail.kapsi.fi
  2023-12-14 21:44 ` [Bug c++/113025] " pinskia at gcc dot gnu.org
@ 2023-12-18 17:10 ` juki at gcc dot mail.kapsi.fi
  2023-12-18 17:13 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: juki at gcc dot mail.kapsi.fi @ 2023-12-18 17:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025

juki at gcc dot mail.kapsi.fi changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |FIXED

--- Comment #2 from juki at gcc dot mail.kapsi.fi ---
Unfortunately alignment of the cast type was not causing this issue.

I changed all calls that were defined in GCC headers to use __m128i_u or
__m128d_u types to use those types before unaligned intrinsic.

For example LOAD_SI128 macro looks like the following:

#define LOAD_SI128(ptr) \
        ( ((uintptr_t)(ptr) & 15) == 0 ) ? _mm_load_si128((__m128i*)(ptr)) :
_mm_loadu_si128((__m128i_u*)(ptr))

My changes only changed the debug information locations but did not lead to the
generation of different kind of load operations. In fact, generated assembly
was identical outside of debug line information changes:

$ diff -u0 orig.s fixed.s|grep movdq| wc
      0       0       0

But if aligned loads are removed completely as an option and only unaligned
loads (even with the wrong intrinsic type) are used, no invalid aligned loads
are generated and assembly changes significantly regarding movdq* instructions:

#define LOAD_SI128(ptr) \
        ( 0 ) ? _mm_load_si128((__m128i*)(ptr)) :
_mm_loadu_si128((__m128i*)(ptr))

diff -u0 orig.s align-loads-removed.s|grep movdq| wc
  11001   44004  263376

Above code fixes all our invalid instruction generation while only using
correct types does not.

While I can't share the related sources, I could still try to run different
tests locally to see what is be causing the issue. What could I do next to help
solve this as I do have reliable test cases to work with.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c++/113025] Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee
  2023-12-14 21:39 [Bug c++/113025] New: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee juki at gcc dot mail.kapsi.fi
  2023-12-14 21:44 ` [Bug c++/113025] " pinskia at gcc dot gnu.org
  2023-12-18 17:10 ` juki at gcc dot mail.kapsi.fi
@ 2023-12-18 17:13 ` pinskia at gcc dot gnu.org
  2023-12-18 21:14 ` xry111 at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-12-18 17:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |INVALID

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c++/113025] Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee
  2023-12-14 21:39 [Bug c++/113025] New: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee juki at gcc dot mail.kapsi.fi
                   ` (2 preceding siblings ...)
  2023-12-18 17:13 ` pinskia at gcc dot gnu.org
@ 2023-12-18 21:14 ` xry111 at gcc dot gnu.org
  2023-12-18 21:14 ` xry111 at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-12-18 21:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xry111 at gcc dot gnu.org

--- Comment #3 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to juki from comment #2)
> Unfortunately alignment of the cast type was not causing this issue.
> 
> I changed all calls that were defined in GCC headers to use __m128i_u or
> __m128d_u types to use those types before unaligned intrinsic.
> 
> For example LOAD_SI128 macro looks like the following:
> 
> #define LOAD_SI128(ptr) \
>         ( ((uintptr_t)(ptr) & 15) == 0 ) ? _mm_load_si128((__m128i*)(ptr)) :
> _mm_loadu_si128((__m128i_u*)(ptr))

This won't work if ptr is a __m128i *.  It is allowed to optimize
(uintptr_t)(__m128i *)foo % 15 to 0 because the standard says (__m128i *)foo
invokes undefined behavior when foo is a pointer not aligned to 16-byte
boundary (C23 section 6.3.2.3p6).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c++/113025] Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee
  2023-12-14 21:39 [Bug c++/113025] New: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee juki at gcc dot mail.kapsi.fi
                   ` (3 preceding siblings ...)
  2023-12-18 21:14 ` xry111 at gcc dot gnu.org
@ 2023-12-18 21:14 ` xry111 at gcc dot gnu.org
  2023-12-18 21:30 ` juki at gcc dot mail.kapsi.fi
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-12-18 21:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025

--- Comment #4 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to Xi Ruoyao from comment #3)
> (In reply to juki from comment #2)
> > Unfortunately alignment of the cast type was not causing this issue.
> > 
> > I changed all calls that were defined in GCC headers to use __m128i_u or
> > __m128d_u types to use those types before unaligned intrinsic.
> > 
> > For example LOAD_SI128 macro looks like the following:
> > 
> > #define LOAD_SI128(ptr) \
> >         ( ((uintptr_t)(ptr) & 15) == 0 ) ? _mm_load_si128((__m128i*)(ptr)) :
> > _mm_loadu_si128((__m128i_u*)(ptr))
> 
> This won't work if ptr is a __m128i *.  It is allowed to optimize
> (uintptr_t)(__m128i *)foo % 15 to 0 because the standard says (__m128i *)foo

I mean % 16, not % 15.

> invokes undefined behavior when foo is a pointer not aligned to 16-byte
> boundary (C23 section 6.3.2.3p6).

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c++/113025] Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee
  2023-12-14 21:39 [Bug c++/113025] New: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee juki at gcc dot mail.kapsi.fi
                   ` (4 preceding siblings ...)
  2023-12-18 21:14 ` xry111 at gcc dot gnu.org
@ 2023-12-18 21:30 ` juki at gcc dot mail.kapsi.fi
  2023-12-18 21:41 ` xry111 at gcc dot gnu.org
  2023-12-18 22:01 ` juki at gcc dot mail.kapsi.fi
  7 siblings, 0 replies; 9+ messages in thread
From: juki at gcc dot mail.kapsi.fi @ 2023-12-18 21:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025

--- Comment #5 from juki at gcc dot mail.kapsi.fi ---
(In reply to Xi Ruoyao from comment #4)
> (In reply to Xi Ruoyao from comment #3)
> > 
> > This won't work if ptr is a __m128i *.  It is allowed to optimize
> > (uintptr_t)(__m128i *)foo % 15 to 0 because the standard says (__m128i *)foo
> 
> I mean % 16, not % 15.
> 
> > invokes undefined behavior when foo is a pointer not aligned to 16-byte
> > boundary (C23 section 6.3.2.3p6).

ptr on this case is one of the parameter types defined for various memory load
intrinsics of NEON instruction set like vld1q_u8(const uint8_t *ptr) or
vld1q_u16(const uint16_t *ptr). These instructions expect natural alignment of
that type which is why unaligned load is needed for SSE operation to work with
them in general case.

The same macro is used to implement all those different loads that just need to
read 128 bits of memory into a vector. Alignment of ptr is something smaller
than 16 and can be as low as 1 for const uint8_t which it is has been for the
cases that have been crashing for me.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c++/113025] Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee
  2023-12-14 21:39 [Bug c++/113025] New: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee juki at gcc dot mail.kapsi.fi
                   ` (5 preceding siblings ...)
  2023-12-18 21:30 ` juki at gcc dot mail.kapsi.fi
@ 2023-12-18 21:41 ` xry111 at gcc dot gnu.org
  2023-12-18 22:01 ` juki at gcc dot mail.kapsi.fi
  7 siblings, 0 replies; 9+ messages in thread
From: xry111 at gcc dot gnu.org @ 2023-12-18 21:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025

--- Comment #6 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
Works for me:

#include <xmmintrin.h>
#include <stdint.h>

#define LOAD_SI128(ptr) \
        ( ((uintptr_t)(ptr) & 15) == 0 ) ? _mm_load_si128((__m128i*)(ptr)) :
_mm_loadu_si128((__m128i*)(ptr))

extern char x[16];
__m128i y;

void
test ()
{
  y = LOAD_SI128 (&x);
}

compiled to:

test:
.LFB532:
        .cfi_startproc
        movdqu  x(%rip), %xmm0
        movaps  %xmm0, y(%rip)
        ret
        .cfi_endproc

Note that if x is not extern, GCC will generate:

test:
.LFB532:
        .cfi_startproc
        movdqa  x(%rip), %xmm0
        movaps  %xmm0, y(%rip)
        ret
        .cfi_endproc

but it's legal because GCC places x at 16-byte boundary:

        .align 16
        .type   x, @object
        .size   x, 16
x:
        .zero   16

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug c++/113025] Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee
  2023-12-14 21:39 [Bug c++/113025] New: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee juki at gcc dot mail.kapsi.fi
                   ` (6 preceding siblings ...)
  2023-12-18 21:41 ` xry111 at gcc dot gnu.org
@ 2023-12-18 22:01 ` juki at gcc dot mail.kapsi.fi
  7 siblings, 0 replies; 9+ messages in thread
From: juki at gcc dot mail.kapsi.fi @ 2023-12-18 22:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025

--- Comment #7 from juki at gcc dot mail.kapsi.fi ---
(In reply to Xi Ruoyao from comment #6)
> Works for me:

I also have plenty of code where this works just fine and still some locations
where it does not. And there are optimization level requirements (-O3 -NDEBUG)
for that to happen as well or GCC will generate correct code in the places I
have seen those issues.

Like I mentioned earlier, creating a small test case to replicate this has been
something that I haven't been able to do yet. I think there are some special
circumstances necessary for wrong instruction to be generated.

I have caught this with gdb quite nicely in my test cases, debug information
points to this load operation being used there. Assembly at that position shows
movdqa instruction being used while movdqu should have been and removing
possibility to use movdqa fixes all failures for all tested compilers.

So it can fail and failure is tied to that alignment comparison. Somehow.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-12-18 22:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-14 21:39 [Bug c++/113025] New: Pointer is sometimes assumed to be 16-byte aligned even when there is no such guarantee juki at gcc dot mail.kapsi.fi
2023-12-14 21:44 ` [Bug c++/113025] " pinskia at gcc dot gnu.org
2023-12-18 17:10 ` juki at gcc dot mail.kapsi.fi
2023-12-18 17:13 ` pinskia at gcc dot gnu.org
2023-12-18 21:14 ` xry111 at gcc dot gnu.org
2023-12-18 21:14 ` xry111 at gcc dot gnu.org
2023-12-18 21:30 ` juki at gcc dot mail.kapsi.fi
2023-12-18 21:41 ` xry111 at gcc dot gnu.org
2023-12-18 22:01 ` juki at gcc dot mail.kapsi.fi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).