public inbox for glibc-bugs@sourceware.org
* [Bug malloc/27227] New: Memory corruption for altivec unaligned load / store
@ 2021-01-22 17:45 kungfujesus06 at gmail dot com
  2021-01-22 18:11 ` [Bug malloc/27227] " schwab@linux-m68k.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: kungfujesus06 at gmail dot com @ 2021-01-22 17:45 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27227

            Bug ID: 27227
           Summary: Memory corruption for altivec unaligned load / store
           Product: glibc
           Version: 2.32
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: malloc
          Assignee: unassigned at sourceware dot org
          Reporter: kungfujesus06 at gmail dot com
  Target Milestone: ---

I'm not 100% sure whether this is a bug in assumptions made by the allocator or
a data hazard I've always had.  I'm seeing memory corruption in the following
scenario (I might try to produce a minimal case, but if somebody can tell me
that what I'm doing is in fact dumb before then, I won't have to):

__vector float loadu_f32(float *v)
{
    __vector unsigned char permute = vec_lvsl(0, (unsigned char*)v);
    __vector unsigned char lo = vec_ld(0, (unsigned char*)v);
    __vector unsigned char hi = vec_ld(16, (unsigned char*)v);
    return (__vector float)vec_perm(lo, hi, permute);
}

float *ptr3d = (float*)_mm_malloc(3 * sizeof(float) * someSizeNotModulusOf4,
16);
size_t numIterations = someSizeNotModulusOf4 / 4;

/* assume ptr3d was populated at some point, in some scalar loop */

for (size_t i = 0; i < numIterations; ++i) {
    __vector float x = vec_ld(i * 16, ptr3d); /* vec_ld takes a byte offset */
    __vector float y = loadu_f32(&ptr3d[someSizeNotModulusOf4 + 4*i]);
    __vector float z = loadu_f32(&ptr3d[2*someSizeNotModulusOf4 + 4*i]);
}

/* remainder peeling scalar loop goes here */


This loop structure seems to cause memory corruption, with assertions being
thrown in malloc.  As far as I can tell, this is _the_ convention for unaligned
loads on altivec-enabled powerpc machines that don't have VSX:
http://mirror.informatimago.com/next/developer.apple.com/hardware/ve/alignment.html

According to that document, so long as at least one byte of the second
quadword being loaded (hi) lies within the heap allocation, the load should be
safe given the heap's alignment.  Now, this is Linux with glibc, not Mac OS X,
but I swear this code worked cleanly before and doesn't seem to now.  Even
weirder, if I drop in a malloc replacement that oversizes every allocation by
16 bytes, it won't crash, but I still see some evidence of corruption sometime
after FFTW runs some altivec-enabled kernels.  When compiling with ASan, it
catches the out-of-heap-bounds load and complains loudly without the oversized
allocations.  I could see that being a false positive for the second half of
the load, but then it doesn't make sense why the unaligned loads and stores
still cause corruption later when oversized.

Strangely, LD_PRELOAD'ing Intel's TBB pool allocator proxy for malloc and free
makes these symptoms disappear entirely (it's probably oversizing _all_ calls
to malloc/posix_memalign).

Did some behavior change with regard to this or was this way of doing
misaligned loads always hazardous and I managed to get lucky?

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug malloc/27227] Memory corruption for altivec unaligned load / store
From: schwab@linux-m68k.org @ 2021-01-22 18:11 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27227

--- Comment #1 from Andreas Schwab <schwab@linux-m68k.org> ---
There is not a single memory write in your example, thus any memory corruption
must have happened somewhere else.  In any case, nothing here looks related to
glibc.  Try valgrind.


* [Bug malloc/27227] Memory corruption for altivec unaligned load / store
From: kungfujesus06 at gmail dot com @ 2021-01-22 18:21 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27227

--- Comment #2 from Adam Stylinski <kungfujesus06 at gmail dot com> ---
(In reply to Andreas Schwab from comment #1)
> There is not a single memory write in your example, thus any memory
> corruption must have happened somewhere else.  In any case, nothing here
> looks related to glibc.  Try valgrind.

Sorry, that was definitely not a complete example; I was simply showing the
access pattern.  The memory writes occur in similar loops, where the unaligned
stores mirror the loads and are handled exactly as in Apple's documentation:

void StoreUnaligned( vector unsigned char src, void *target )
{
    vector unsigned char MSQ, LSQ;
    vector unsigned char mask, align, zero, neg1;

    MSQ = vec_ld(0, (unsigned char *)target);       // most significant quadword
    LSQ = vec_ld(16, (unsigned char *)target);      // least significant quadword
    align = vec_lvsr(0, (unsigned char *)target);   // create alignment vector
    zero = vec_splat_u8(0);                         // create vector full of zeros
    neg1 = (vector unsigned char)vec_splat_s8(-1);  // create vector full of -1
    mask = vec_perm(zero, neg1, align);             // create select mask
    src = vec_perm(src, src, align);                // right rotate stored data
    MSQ = vec_sel(MSQ, src, mask);                  // insert data into MSQ part
    LSQ = vec_sel(src, LSQ, mask);                  // insert data into LSQ part
    vec_st(MSQ, 0, (unsigned char *)target);        // store the MSQ part
    vec_st(LSQ, 16, (unsigned char *)target);       // store the LSQ part
}

I'm asking, first, whether my access pattern for both loads and stores is
faulty to begin with, or whether something changed recently in glibc for the
POWER ABI with regard to alignment.


* [Bug malloc/27227] Memory corruption for altivec unaligned load / store
From: fweimer at redhat dot com @ 2021-01-22 20:36 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27227

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

--- Comment #3 from Florian Weimer <fweimer at redhat dot com> ---
Is your program multi-threaded? The Apple document actually contains a warning
regarding that.

The powerpc heap layout changed significantly due to the fix for bug 6527;
maybe that's why your program appeared to work reliably before.


* [Bug malloc/27227] Memory corruption for altivec unaligned load / store
From: kungfujesus06 at gmail dot com @ 2021-01-22 22:20 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27227

--- Comment #4 from Adam Stylinski <kungfujesus06 at gmail dot com> ---
It can be, but it isn't in the current test scenario I have.  Doing
vec_ld(15, ptr) for the second half essentially means that the load will never
span outside the heap in the event that the address just so happens to be
aligned, no?

Basically, the final unaligned loads in the last column are triggering ASan
and in general are causing issues.  I suspect the logic for unaligned stores
is causing similar grief.  I _thought_ I had been doing exactly what Apple
mentioned here:

> Typically this means that a looping function will have to stop one loop iteration before it reaches the end of the data run, and handle the last few bytes in special case code. 

Is this now broken or had I been doing the remainder peeling incorrectly all
along?


* [Bug malloc/27227] Memory corruption for altivec unaligned load / store
From: kungfujesus06 at gmail dot com @ 2021-01-22 22:23 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=27227

--- Comment #5 from Adam Stylinski <kungfujesus06 at gmail dot com> ---
Apologies for my ignorance, by the way; I've been mostly spoiled lately by x86
and aarch64, which handle this for you, to some degree, in the uarch.  I wrote
this code a long time ago when I was just getting familiar with AltiVec, so
I'm well aware that what I was doing could have been completely wrong and/or
stupid.  At the very least, I could have done more to pipeline the unaligned
loads so that the previous vector load for the second half was reused (I don't
think GCC can do this for me).

I'm hoping people who are intimately familiar with the ABI and heap layout of
big-endian POWER4 can clarify whether I'm wrong, the library is wrong, or both.

