From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 22479 invoked by alias); 5 Oct 2008 18:52:43 -0000 Received: (qmail 22470 invoked by uid 22791); 5 Oct 2008 18:52:42 -0000 X-Spam-Check-By: sourceware.org Received: from smtp1-g19.free.fr (HELO smtp1-g19.free.fr) (212.27.42.27) by sourceware.org (qpsmtpd/0.31) with ESMTP; Sun, 05 Oct 2008 18:51:41 +0000 Received: from smtp1-g19.free.fr (localhost.localdomain [127.0.0.1]) by smtp1-g19.free.fr (Postfix) with ESMTP id A98A61AB383 for ; Sun, 5 Oct 2008 20:51:37 +0200 (CEST) Received: from [192.168.0.2] (gre92-6-82-231-206-104.fbx.proxad.net [82.231.206.104]) by smtp1-g19.free.fr (Postfix) with ESMTP id CB6DF1AB384 for ; Sun, 5 Oct 2008 20:51:36 +0200 (CEST) Message-ID: <48E90CB1.2040000@yahoo.fr> Date: Sun, 05 Oct 2008 18:52:00 -0000 From: =?ISO-8859-1?Q?S=E9bastien_Kunz-Jacques?= User-Agent: Thunderbird 2.0.0.17 (Windows/20080914) MIME-Version: 1.0 To: pthreads-win32@sourceware.org Subject: Re: pthreads-win32 2.8.0, stack alignment, and SSE code References: <48E8B399.502@yahoo.fr> <48E8C3B8.2080709@lisha.ufsc.br> <48E8D34F.4@yahoo.fr> <48E906BD.5090304@lisha.ufsc.br> In-Reply-To: <48E906BD.5090304@lisha.ufsc.br> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-IsSubscribed: yes Mailing-List: contact pthreads-win32-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: pthreads-win32-owner@sourceware.org X-SW-Source: 2008/txt/msg00057.txt.bz2 Ramiro Polla a écrit : > Sébastien Kunz-Jacques wrote: >> Ramiro Polla a écrit : >>> Sébastien Kunz-Jacques wrote: >>>> Hi, >>>> >>>> I encountered problems with SSE code compiled with recent mingw >>>> GCC (4.3.2, TDM release, http://www.tdragon.net/recentgcc/) and >>>> using pthreads 2.8.0. After inverstigation, crashes occured because >>>> the code was trying to read operands on the stack, assuming the >>>> stack was 16-byte aligned as is the case in the main thread (the >>>> main function aligns the stack and alignment is maintained during >>>> each function call). I solved the issue with a very simple patch >>>> that uses some GCC wizardry to force stack realignment upon entry >>>> in a new thread: >>>> >>>> --- ptw32_threadStart.c Sun May 15 17:28:27 2005 >>>> +++ ptw32_threadStart.c Mon Sep 29 21:28:16 2008 >>>> @@ -116,6 +116,9 @@ >>>> >>>> #endif >>>> >>>> +#if defined(__GNUC__) && (__GNUC__ > 4 || __GNUC__ == 4 && >>>> __GNUC_MINOR__>1) >>>> +__attribute__((force_align_arg_pointer)) >>>> +#endif >>>> #if ! defined (__MINGW32__) || (defined (__MSVCRT__) && ! defined >>>> (__DMC__)) >>>> unsigned >>>> __stdcall >>>> >>>> >>>> The attribute force_align_arg_pointer should be added to every >>>> function that is called with a stack with insufficient alignment; >>>> as far as I am concerned doing this for threadStart only solved my >>>> problems. Maybe this small patch could be added to the pthread code? >>> >>> I think that's related to: >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37216 >>> >>> So I think this patch shouldn't be applied to pthreads-win32, and >>> people should rather use another version of MinGW (or unset >>> automatic SSE code, if that makes any sense). >>> >>> Ramiro Polla >>> >> Not at all ; I use a modified binutils that enforces 16-byte >> alignment alignment of .bss sections (basically I reverted the part >> of the binutils patch that is linked to in the gcc bug 37216 thread). >> The bug I experienced shows up only in threaded code and comes from >> the fact that a stack of a win32 thread in only 4-byte aligned. The >> crash occurs when a data is read on the stack and not in a .bss segment. >> >> To give some contextual information, I tried to build a math library, >> ATLAS, with mingw. First, for the non-threaded version, I encountered >> bug 37216 that you mention ; to get rid of this I patched binutils >> (adding -fno-common to gcc works also). But the threaded version was >> still crashing, and indeed the symptoms looked much similiar to what >> occurred in the non-threaded case. Then I found the solution evoked >> in my first post. >> >> Please note that adding the attribute force_align_arg_pointer to >> threadStart has a negligible performance penalty (a few machine >> instructions each time this function is entered/exited) > > Hmmm... I understand what's going on now. We had this on FFmpeg some > time last year. > > IIRC it all went down to something like: > > - Win32 ABI only specifies 4-byte alignment. > - x86 ABI only specifies 4-byte alignment. > => The thread code is correct when it only aligns to 4-byte. > > - gcc aligns main() to 16-byte and maintains this alignment throughout > all functions. > - gcc doesn't take into account that it is valid to start a thread > with only 4-byte alignment. > - SSE expects 16-byte alignment. > - gcc thinks that a function that needs SSE is already aligned to > 16-byte (because of main()), but in fact it might be only 4-byte > aligned (and still be valid for Win32 and x86). > => It is the function that uses SSE that should make sure it is aligned. > > Actually it is enough to make only the thread entry functions aligned > (any function in the external API that at some point might use SSE). > > Imagine if someone wants to use that ATLAS library but instead of > starting a new thread it wants to call directly the function that > needs SSE (no I haven't checked if it is possible in this case but it > could happen theoretically). And imagine that someone is using MSVC++ > to call that function. MSVC++ only aligns to 4-byte (and again it is > valid). That function would also crash, independent of your patch. > > So in your specific case I think it is the ATLAS functions that should > be aligned (= it would also help to use the library with other > compilers). > > Your patch can also be seen as a way to always sufficiently align the > stack so that any thread started by pthreads-win32 is ok for SSE > instructions (the same way glibc does I think). In that case I don't > have a strong opinion about it. The overhead really is negligible. > Starting the thread takes much longer. > > Ramiro Polla > Actually I have tried calling ATLAS from MSVC, and it (appears to) work. I suspect that ATLAS interface functions realign stack already, but I didn't check this (I am going to ask the ATLAS maintainer about this). The problem that made ATLAS crash without the above fix is that some internal ATLAS functions get started through pthreads, and these ones definitely do not realign the stack. Regarding your last comment, do you imply that the stack realignment is slow? from disassemblies I saw, it stores %esp in another register, aligns esp (andl $-16, %esp), and restores it in the function epilogue. The main performance penalty therefore occurs because one register is used, and this is a reason to do the alignment in a function like threadStart instead of the called function, if the latter does some register-intensive task.