public inbox for pthreads-win32@sourceware.org
* pthreads-win32 2.8.0, stack alignment, and SSE code
@ 2008-10-05 12:32 Sébastien Kunz-Jacques
  2008-10-05 13:41 ` Ramiro Polla
  0 siblings, 1 reply; 11+ messages in thread
From: Sébastien Kunz-Jacques @ 2008-10-05 12:32 UTC (permalink / raw)
  To: pthreads-win32

Hi,

I encountered problems with SSE code compiled with a recent MinGW GCC 
(4.3.2, TDM release, http://www.tdragon.net/recentgcc/) and using 
pthreads 2.8.0. After investigation, the crashes occurred because the code 
was trying to read operands on the stack, assuming the stack was 16-byte 
aligned as is the case in the main thread (the main function aligns the 
stack and alignment is maintained during each function call). I solved 
the issue with a very simple patch that uses some GCC wizardry to force 
stack realignment upon entry into a new thread:

--- ptw32_threadStart.c    Sun May 15 17:28:27 2005
+++ ptw32_threadStart.c    Mon Sep 29 21:28:16 2008
@@ -116,6 +116,9 @@
 
 #endif
 
+#if defined(__GNUC__) && (__GNUC__ > 4 || __GNUC__ == 4 && __GNUC_MINOR__>1)
+__attribute__((force_align_arg_pointer))
+#endif
 #if ! defined (__MINGW32__) || (defined (__MSVCRT__) && ! defined (__DMC__))
 unsigned
   __stdcall


The attribute force_align_arg_pointer should be added to every function 
that is called with a stack with insufficient alignment; in my case, 
doing this for threadStart alone solved my problems. Maybe this 
small patch could be added to the pthread code?
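
A minimal sketch of the idea, assuming a GCC-compatible compiler targeting x86: the attribute goes on the function that the new thread enters first, so that code called from it can rely on a 16-byte aligned stack. The names REALIGN_STACK and worker_entry are hypothetical and not part of pthreads-win32; the function only reports whether a 16-byte aligned local really ended up on a 16-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Use the attribute only on x86 targets of GCC-compatible compilers;
 * elsewhere the macro expands to nothing. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#  define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#  define REALIGN_STACK
#endif

/* Hypothetical thread entry point with the usual pthreads signature.
 * arg is assumed to point to an int that receives the result. */
REALIGN_STACK
void *worker_entry(void *arg)
{
    /* A 16-byte aligned local, like the operands SSE code keeps on the stack. */
    float v[4] __attribute__((aligned(16))) = { 1.0f, 2.0f, 3.0f, 4.0f };
    int *aligned_out = arg;

    /* Store 1 if the buffer sits on a 16-byte boundary, 0 otherwise. */
    *aligned_out = ((uintptr_t)(void *)v % 16u) == 0;
    return NULL;
}
```

The function can be passed to pthread_create or called directly with a pointer to an int; in both cases the int is expected to be set to 1.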







* Re: pthreads-win32 2.8.0, stack alignment, and SSE code
  2008-10-05 12:32 pthreads-win32 2.8.0, stack alignment, and SSE code Sébastien Kunz-Jacques
@ 2008-10-05 13:41 ` Ramiro Polla
  2008-10-05 14:47   ` Sébastien Kunz-Jacques
  0 siblings, 1 reply; 11+ messages in thread
From: Ramiro Polla @ 2008-10-05 13:41 UTC (permalink / raw)
  To: pthreads-win32

Sébastien Kunz-Jacques wrote:
> [...]
> The attribute force_align_arg_pointer should be added to every function 
> that is called with a stack with insufficient alignment; as far as I am 
> concerned doing this for threadStart only solved my problems. Maybe this 
> small patch could be added to the pthread code?

I think that's related to:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37216

So I think this patch shouldn't be applied to pthreads-win32, and people 
should rather use another version of MinGW (or unset automatic SSE code, 
if that makes any sense).

Ramiro Polla


* Re: pthreads-win32 2.8.0, stack alignment, and SSE code
  2008-10-05 13:41 ` Ramiro Polla
@ 2008-10-05 14:47   ` Sébastien Kunz-Jacques
  2008-10-05 18:27     ` Ramiro Polla
  0 siblings, 1 reply; 11+ messages in thread
From: Sébastien Kunz-Jacques @ 2008-10-05 14:47 UTC (permalink / raw)
  To: pthreads-win32

Ramiro Polla wrote:
> [...]
> I think that's related to:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37216
>
> So I think this patch shouldn't be applied to pthreads-win32, and 
> people should rather use another version of MinGW (or unset automatic 
> SSE code, if that makes any sense).
>
> Ramiro Polla
>
Not at all; I use a modified binutils that enforces 16-byte alignment 
of .bss sections (basically I reverted the part of the binutils patch 
that is linked to in the gcc bug 37216 thread). The bug I experienced 
shows up only in threaded code and comes from the fact that the stack 
of a win32 thread is only 4-byte aligned. The crash occurs when data 
is read from the stack rather than from a .bss segment.

To give some context, I tried to build a math library, ATLAS, with 
mingw. First, for the non-threaded version, I encountered bug 37216 
that you mention; to get rid of it I patched binutils (adding 
-fno-common to gcc also works). But the threaded version was still 
crashing, and indeed the symptoms looked very similar to what occurred 
in the non-threaded case. Then I found the solution described in my 
first post.

Please note that adding the attribute force_align_arg_pointer to 
threadStart has a negligible performance penalty (a few machine 
instructions each time this function is entered/exited).




* Re: Re: pthreads-win32 2.8.0, stack alignment, and SSE code
  2008-10-05 14:47   ` Sébastien Kunz-Jacques
@ 2008-10-05 18:27     ` Ramiro Polla
  2008-10-05 18:52       ` Sébastien Kunz-Jacques
  0 siblings, 1 reply; 11+ messages in thread
From: Ramiro Polla @ 2008-10-05 18:27 UTC (permalink / raw)
  To: pthreads-win32

Sébastien Kunz-Jacques wrote:
> [...]
> Not at all; I use a modified binutils that enforces 16-byte alignment 
> of .bss sections (basically I reverted the part of the binutils patch 
> that is linked to in the gcc bug 37216 thread). The bug I experienced 
> shows up only in threaded code and comes from the fact that the stack 
> of a win32 thread is only 4-byte aligned. The crash occurs when data 
> is read from the stack rather than from a .bss segment.
> 
> To give some context, I tried to build a math library, ATLAS, with 
> mingw. First, for the non-threaded version, I encountered bug 37216 
> that you mention; to get rid of it I patched binutils (adding 
> -fno-common to gcc also works). But the threaded version was still 
> crashing, and indeed the symptoms looked very similar to what occurred 
> in the non-threaded case. Then I found the solution described in my 
> first post.
> 
> Please note that adding the attribute force_align_arg_pointer to 
> threadStart has a negligible performance penalty (a few machine 
> instructions each time this function is entered/exited)

Hmmm... I understand what's going on now. We had this on FFmpeg some 
time last year.

IIRC it all went down to something like:

- Win32 ABI only specifies 4-byte alignment.
- x86 ABI only specifies 4-byte alignment.
=> The thread code is correct when it only aligns to 4-byte.

- gcc aligns main() to 16-byte and maintains this alignment throughout 
all functions.
- gcc doesn't take into account that it is valid to start a thread with 
only 4-byte alignment.
- SSE expects 16-byte alignment.
- gcc thinks that a function that needs SSE is already aligned to 
16-byte (because of main()), but in fact it might be only 4-byte aligned 
(and still be valid for Win32 and x86).
=> It is the function that uses SSE that should make sure it is aligned.

Actually it is enough to make only the thread entry functions aligned 
(any function in the external API that at some point might use SSE).

Imagine if someone wants to use that ATLAS library but instead of 
starting a new thread it wants to call directly the function that needs 
SSE (no I haven't checked if it is possible in this case but it could 
happen theoretically). And imagine that someone is using MSVC++ to call 
that function. MSVC++ only aligns to 4-byte (and again it is valid). 
That function would also crash, independent of your patch.

So in your specific case I think it is the ATLAS functions that should 
be aligned (= it would also help to use the library with other compilers).

Your patch can also be seen as a way to always sufficiently align the 
stack so that any thread started by pthreads-win32 is ok for SSE 
instructions (the same way glibc does I think). In that case I don't 
have a strong opinion about it. The overhead really is negligible. 
Starting the thread takes much longer.

Ramiro Polla


* Re: pthreads-win32 2.8.0, stack alignment, and SSE code
  2008-10-05 18:27     ` Ramiro Polla
@ 2008-10-05 18:52       ` Sébastien Kunz-Jacques
  2008-10-05 19:25         ` Ramiro Polla
  0 siblings, 1 reply; 11+ messages in thread
From: Sébastien Kunz-Jacques @ 2008-10-05 18:52 UTC (permalink / raw)
  To: pthreads-win32

Ramiro Polla wrote:
> [...]
> Hmmm... I understand what's going on now. We had this on FFmpeg some 
> time last year.
>
> IIRC it all went down to something like:
>
> - Win32 ABI only specifies 4-byte alignment.
> - x86 ABI only specifies 4-byte alignment.
> => The thread code is correct when it only aligns to 4-byte.
>
> - gcc aligns main() to 16-byte and maintains this alignment throughout 
> all functions.
> - gcc doesn't take into account that it is valid to start a thread 
> with only 4-byte alignment.
> - SSE expects 16-byte alignment.
> - gcc thinks that a function that needs SSE is already aligned to 
> 16-byte (because of main()), but in fact it might be only 4-byte 
> aligned (and still be valid for Win32 and x86).
> => It is the function that uses SSE that should make sure it is aligned.
>
> Actually it is enough to make only the thread entry functions aligned 
> (any function in the external API that at some point might use SSE).
>
> Imagine if someone wants to use that ATLAS library but instead of 
> starting a new thread it wants to call directly the function that 
> needs SSE (no I haven't checked if it is possible in this case but it 
> could happen theoretically). And imagine that someone is using MSVC++ 
> to call that function. MSVC++ only aligns to 4-byte (and again it is 
> valid). That function would also crash, independent of your patch.
>
> So in your specific case I think it is the ATLAS functions that should 
> be aligned (= it would also help to use the library with other 
> compilers).
>
> Your patch can also be seen as a way to always sufficiently align the 
> stack so that any thread started by pthreads-win32 is ok for SSE 
> instructions (the same way glibc does I think). In that case I don't 
> have a strong opinion about it. The overhead really is negligible. 
> Starting the thread takes much longer.
>
> Ramiro Polla
>
Actually I have tried calling ATLAS from MSVC, and it appears to work. 
I suspect that the ATLAS interface functions already realign the stack, 
but I didn't check this (I am going to ask the ATLAS maintainer about 
it). The problem that made ATLAS crash without the above fix is that some 
internal ATLAS functions get started through pthreads, and these 
definitely do not realign the stack.

Regarding your last comment, do you imply that the stack realignment is 
slow? From the disassemblies I saw, it stores %esp in another register, 
aligns %esp (andl $-16, %esp), and restores it in the function 
epilogue. The main performance penalty therefore comes from tying up one 
register, and this is a reason to do the alignment in a function 
like threadStart instead of the called function, if the latter does some 
register-intensive task.
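
To make the arithmetic of that prologue concrete, here is a small sketch in C of what the `andl $-16, %esp` step computes. The helper names align_down_16 and realign_cost_bytes are made up for illustration; a stack pointer is modelled as a plain integer.

```c
#include <stdint.h>

/* Round a stack-pointer value down to the previous 16-byte boundary.
 * -16 in two's complement has all bits set except the low four, so
 * masking with ~15 is the same operation as `andl $-16, %esp`. */
uintptr_t align_down_16(uintptr_t sp)
{
    return sp & ~(uintptr_t)15;
}

/* Number of bytes by which the realignment lowers the stack pointer (0..15). */
unsigned realign_cost_bytes(uintptr_t sp)
{
    return (unsigned)(sp - align_down_16(sp));
}
```

For a thread stack that is only 4-byte aligned, such as 0x1004 or 0x100c, the result is 0x1000; a value that is already aligned, such as 0x1010, is left unchanged.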


* Re: Re: pthreads-win32 2.8.0, stack alignment, and SSE code
  2008-10-05 18:52       ` Sébastien Kunz-Jacques
@ 2008-10-05 19:25         ` Ramiro Polla
  2008-10-05 20:12           ` Sébastien Kunz-Jacques
  0 siblings, 1 reply; 11+ messages in thread
From: Ramiro Polla @ 2008-10-05 19:25 UTC (permalink / raw)
  To: pthreads-win32

Sébastien Kunz-Jacques wrote:
> Ramiro Polla wrote:
[...]
>> Imagine if someone wants to use that ATLAS library but instead of 
>> starting a new thread it wants to call directly the function that 
>> needs SSE (no I haven't checked if it is possible in this case but it 
>> could happen theoretically). And imagine that someone is using MSVC++ 
>> to call that function. MSVC++ only aligns to 4-byte (and again it is 
>> valid). That function would also crash, independent of your patch.
>>
>> So in your specific case I think it is the ATLAS functions that should 
>> be aligned (= it would also help to use the library with other 
>> compilers).
[...]
> Actually I have tried calling ATLAS from MSVC, and it appears to work. 
> I suspect that the ATLAS interface functions already realign the stack, 
> but I didn't check this (I am going to ask the ATLAS maintainer about 
> it). The problem that made ATLAS crash without the above fix is that some 
> internal ATLAS functions get started through pthreads, and these 
> definitely do not realign the stack.

Then I suspect it is only these ones that should need force_align.

[...]
>> Your patch can also be seen as a way to always sufficiently align the
>> stack so that any thread started by pthreads-win32 is ok for SSE
>> instructions (the same way glibc does I think). In that case I don't
>> have a strong opinion about it. The overhead really is negligible.
>> Starting the thread takes much longer.
[...]
> Regarding your last comment, do you imply that the stack realignment is 
> slow? From the disassemblies I saw, it stores %esp in another register, 
> aligns %esp (andl $-16, %esp), and restores it in the function 
> epilogue. The main performance penalty therefore comes from tying up one 
> register, and this is a reason to do the alignment in a function 
> like threadStart instead of the called function, if the latter does some 
> register-intensive task.

I didn't express myself very well then. I meant to say: "The overhead 
really is negligible. Starting the thread takes much longer, so the 
overhead in aligning the stack gets hidden away in the delay to start 
the thread".

Ramiro Polla


* Re: pthreads-win32 2.8.0, stack alignment, and SSE code
  2008-10-05 19:25         ` Ramiro Polla
@ 2008-10-05 20:12           ` Sébastien Kunz-Jacques
  2008-10-05 22:42             ` Ramiro Polla
  0 siblings, 1 reply; 11+ messages in thread
From: Sébastien Kunz-Jacques @ 2008-10-05 20:12 UTC (permalink / raw)
  To: pthreads-win32

Ramiro Polla wrote:
> I didn't express myself very well then. I meant to say: "The overhead 
> really is negligible. Starting the thread takes much longer, so the 
> overhead in aligning the stack gets hidden away in the delay to start 
> the thread".

In that case, always aligning the stack in threadStart would save a lot 
of people some headaches, I think. While googling about these issues 
I found mentions of the FFmpeg case you talked about; it is thanks to 
them that I found the correct attribute to align the stack. Some MPlayer 
codecs seem to have trouble with these multithreaded alignment issues too.

If for some reason it is not desirable to patch the lib, would it be 
possible to have some easy to see disclaimer added about this problem 
somewhere?


* Re: Re: pthreads-win32 2.8.0, stack alignment, and SSE code
  2008-10-05 20:12           ` Sébastien Kunz-Jacques
@ 2008-10-05 22:42             ` Ramiro Polla
  2008-10-09 13:14               ` Ross Johnson
  0 siblings, 1 reply; 11+ messages in thread
From: Ramiro Polla @ 2008-10-05 22:42 UTC (permalink / raw)
  To: pthreads-win32

Sébastien Kunz-Jacques wrote:
> If for some reason it is not desirable to patch the lib, would it be 
> possible to have some easy to see disclaimer added about this problem 
> somewhere?

Oh, that's not my call =). It's up to Ross Johnson to decide. I simply 
had lots of free time today and decided to share my ideas. You can 
disregard anything I said... (although I think they might help).

The patch is not necessary because 4-byte alignment is enough for x86 
and Win32, but it certainly might help some people to avoid a headache 
like you mentioned.

Ramiro Polla


* Re: pthreads-win32 2.8.0, stack alignment, and SSE code
  2008-10-05 22:42             ` Ramiro Polla
@ 2008-10-09 13:14               ` Ross Johnson
  2008-10-09 19:51                 ` Sébastien Kunz-Jacques
  2008-10-23  5:57                 ` Sébastien Kunz-Jacques
  0 siblings, 2 replies; 11+ messages in thread
From: Ross Johnson @ 2008-10-09 13:14 UTC (permalink / raw)
  To: pthreads-win32

I've just read this whole thread for the first time. I haven't come 
across this issue of alignment on Intel processors before so I thought 
I'd better at least Google around the subject before replying. 
Unfortunately I've got to run now and won't be reading mail for another 
5 days or so.

I would very likely include the patch as a build option, so I'm 
wondering if you've tried building the library with the -mstackrealign 
gcc flag that does the same thing as force_align_arg_pointer (I haven't 
tried either of these but read about it).

Ross

Ramiro Polla wrote:
> Sébastien Kunz-Jacques wrote:
>> If for some reason it is not desirable to patch the lib, would it be 
>> possible to have some easy to see disclaimer added about this problem 
>> somewhere?
>
> Oh, that's not my call =). It's up to Ross Johnson to decide. I simply 
> had lots of free time today and decided to share my ideas. You can 
> disregard anything I said... (although I think they might help).
>
> The patch is not necessary because 4-byte alignment is enough for x86 
> and Win32, but it certainly might help some people to avoid a headache 
> like you mentioned.
>
> Ramiro Polla



* Re: pthreads-win32 2.8.0, stack alignment, and SSE code
  2008-10-09 13:14               ` Ross Johnson
@ 2008-10-09 19:51                 ` Sébastien Kunz-Jacques
  2008-10-23  5:57                 ` Sébastien Kunz-Jacques
  1 sibling, 0 replies; 11+ messages in thread
From: Sébastien Kunz-Jacques @ 2008-10-09 19:51 UTC (permalink / raw)
  To: pthreads-win32

Ross Johnson wrote:
> I've just read this whole thread for the first time. I haven't come 
> across this issue of alignment on Intel processors before so I thought 
> I'd better at least Google around the subject before replying. 
> Unfortunately I've got to run now and won't be reading mail for 
> another 5 days or so.
>
> I would very likely include the patch as a build option, so I'm 
> wondering if you've tried building the library with the -mstackrealign 
> gcc flag that does the same thing as force_align_arg_pointer (I 
> haven't tried either of these but read about it).
>
> Ross
>
>
I haven't tried it, but will do it shortly. Since this realigns the 
stack in all functions, it may impact performance. I would think this 
impact is small to negligible however.

SKJ


* Re: pthreads-win32 2.8.0, stack alignment, and SSE code
  2008-10-09 13:14               ` Ross Johnson
  2008-10-09 19:51                 ` Sébastien Kunz-Jacques
@ 2008-10-23  5:57                 ` Sébastien Kunz-Jacques
  1 sibling, 0 replies; 11+ messages in thread
From: Sébastien Kunz-Jacques @ 2008-10-23  5:57 UTC (permalink / raw)
  Cc: pthreads-win32

Regarding my initial problem of making ATLAS work with pthreads-win32, a 
solution was found which solves all ATLAS alignment problems without 
having to patch pthreads: add the option 

-mpreferred-stack-boundary=2

to the gcc ATLAS flags (see 
http://sourceforge.net/tracker/index.php?func=detail&aid=2170667&group_id=23725&atid=379483 
). This should work with any client library of pthreads. More generally, 
it makes gcc more compliant with a platform such as win32 whose ABI only 
guarantees 4-byte stack alignment.
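
As a reading aid for the option value: the argument of -mpreferred-stack-boundary is an exponent, so 2 means 2^2 = 4 bytes, while GCC's documented default of 4 means 16 bytes. A tiny sketch of that mapping follows; the helper name stack_boundary_bytes is hypothetical and exists only for this example.

```c
/* -mpreferred-stack-boundary=N asks GCC to keep the stack aligned to
 * 2^N bytes. This helper just evaluates that power of two. */
unsigned stack_boundary_bytes(unsigned n)
{
    return 1u << n;
}
```

With n = 2 this gives 4, matching the 4-byte guarantee of the win32 ABI; with n = 4 it gives 16, the alignment that aligned SSE operands need.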

I still think aligning the stack of newly created threads on a 16-byte 
boundary would be a useful option for pthreads-win32.

Ross Johnson wrote:
> I've just read this whole thread for the first time. I haven't come 
> across this issue of alignment on Intel processors before so I thought 
> I'd better at least Google around the subject before replying. 
> Unfortunately I've got to run now and won't be reading mail for 
> another 5 days or so.
>
> I would very likely include the patch as a build option, so I'm 
> wondering if you've tried building the library with the -mstackrealign 
> gcc flag that does the same thing as force_align_arg_pointer (I 
> haven't tried either of these but read about it).
>
> Ross


