* [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
@ 2014-09-24 8:11 David Wohlferd
2014-09-24 8:48 ` Richard Biener
0 siblings, 1 reply; 17+ messages in thread
From: David Wohlferd @ 2014-09-24 8:11 UTC (permalink / raw)
To: gcc, Yury Gribov, Hans-Peter Nilsson
Hans-Peter Nilsson: I should have listened to you back when you raised
concerns about this. My apologies for ever doubting you.
In summary:
- The "trick" in the docs for using an arbitrarily sized struct to force
register flushes for inline asm does not work.
- Placing the inline asm in a separate routine can sometimes mask the
problem with the trick not working.
- The sample that has been in the docs forever performs an unhelpful,
unexpected, and probably unwanted stack allocation + memcpy.
Details:
Here is the text from the docs:
-----------
One trick to avoid [using the "memory" clobber] is available if the size
of the memory being accessed is known at compile time. For example, if
accessing ten bytes of a string, use a memory input like:
"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
-----------
When I did the re-write of gcc's inline asm docs, I left the description
for this (essentially) untouched. I just took it on faith that "magic
happens" and the right code gets generated. But reading a recent post
raised questions for me, so I tried it. And what I found was that not
only does this not work, it actually just makes a mess.
I started with some code that I knew required some memory clobbering:
#include <stdio.h>
int main(int argc, char* argv[])
{
struct
{
int a;
int b;
} c;
c.a = 1;
c.b = 2;
int Count = sizeof(c);
void *Dest;
__asm__ __volatile__ ("rep; stosb"
: "=D" (Dest), "+c" (Count)
: "0" (&c), "a" (0)
//: "memory"
);
printf("%u %u\n", c.a, c.b);
}
As written, this x64 code (compiled with -O2) will print out "1 2", even
though someone might (incorrectly) expect the asm to overwrite the
struct with zeros. Adding the memory clobber allows this code to work
as expected (printing "0 0").
Now that I have code I can use to see if registers are getting flushed,
I removed the memory clobber, and tried just 'clobbering' the struct:
#include <stdio.h>
int main(int argc, char* argv[])
{
struct
{
int a;
int b;
} c;
c.a = 1;
c.b = 2;
int Count = sizeof(c);
void *Dest;
__asm__ __volatile__ ("rep; stosb"
: "=D" (Dest), "+c" (Count)
: "0" (&c), "a" (0),
"m" ( ({ struct foo { char x[8]; } *p = (struct foo *)&c ;
*p; }) )
);
printf("%u %u\n", c.a, c.b);
}
I'm using a named struct (foo) to avoid some compiler messages, but
other than that, I believe this is the same as what's in the docs. And
it doesn't work. I still get "1 2".
At this point I realized that code I've seen using this trick usually
has the asm in its own routine. When I try this, it still fails.
Unless I start cranking up the size of x from 8 to ~250. At ~250,
suddenly it starts working. Apparently this is because at this point,
gcc decides not to inline the routine anymore, and flushes the registers
before calling the non-inline code.
And why does changing the size of the structure we are pointing to
result in increases in the size of the routine? Reading the -S output,
the "*p" at the end of this constraint generates a call to memcpy the
250 characters onto the stack, which it passes to the asm as %4, which
is never used. Argh!
Conclusion:
What I expected when using that sample code from the docs was that any
registers that contain values from the struct would get flushed to
memory. This was intended to be a 'cheaper' alternative to doing a
full-on "memory" clobber. What I got instead was an unexpected/unneeded
stack allocation and memcpy, and STILL didn't get the values flushed.
Yeah, not exactly the 'cheaper' I was hoping for.
Is the example in the docs just written incorrectly? Did this get
broken somewhere along the line? Or am I just using it wrong?
I'm using gcc version 4.9.0 (x86_64-win32-seh-rev2, Built by MinGW-W64
project). Remember to compile these x64 samples with -O2.
dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-09-24 8:11 [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work David Wohlferd
@ 2014-09-24 8:48 ` Richard Biener
2014-09-25 20:48 ` Yury Gribov
0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2014-09-24 8:48 UTC (permalink / raw)
To: David Wohlferd; +Cc: gcc, Yury Gribov, Hans-Peter Nilsson
On Wed, Sep 24, 2014 at 9:43 AM, David Wohlferd <dw@limegreensocks.com> wrote:
> Hans-Peter Nilsson: I should have listened to you back when you raised
> concerns about this. My apologies for ever doubting you.
>
> In summary:
>
> - The "trick" in the docs for using an arbitrarily sized struct to force
> register flushes for inline asm does not work.
> - Placing the inline asm in a separate routine can sometimes mask the
> problem with the trick not working.
> - The sample that has been in the docs forever performs an unhelpful,
> unexpected, and probably unwanted stack allocation + memcpy.
>
> Details:
>
> Here is the text from the docs:
>
> -----------
> One trick to avoid [using the "memory" clobber] is available if the size of
> the memory being accessed is known at compile time. For example, if
> accessing ten bytes of a string, use a memory input like:
>
> "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
Well - this can't work because you essentially are using a _value_
here (looking at the GIMPLE - I'm not sure if a statement expression
evaluates to an lvalue.
It should work if you simply do this without a stmt expression:
"m" (*(struct { char x[10]; } *)ptr)
because that's clearly an lvalue (and the GIMPLE correctly says so):
<bb 2>:
c.a = 1;
c.b = 2;
__asm__ __volatile__("rep; stosb" : "=D" Dest_4, "=c" Count_5 : "0"
&c, "a" 0, "m" MEM[(struct foo *)&c], "1" 8);
printf ("%u %u\n", 1, 2);
note that we still constant propagated 1 and 2 for the reason that
the asm didn't get any VDEF. That's because you do not have any
memory output! So while it keeps 'c' live it doesn't consider it
modified by the asm. You'd still need to clobber the memory,
but "m" clobbers are not supported, only "memory".
Thus fixed asm:
__asm__ __volatile__ ("rep; stosb"
: "=D" (Dest), "+c" (Count)
: "0" (&c), "a" (0),
"m" (*( struct foo { char x[8]; } *)&c)
: "memory"
);
where I'm not 100% sure if the "m" input is now pointless (that is,
if a "memory" clobber also constitutes a use of all memory).
Richard.
> -----------
>
> When I did the re-write of gcc's inline asm docs, I left the description for
> this (essentially) untouched. I just took it on faith that "magic happens"
> and the right code gets generated. But reading a recent post raised
> questions for me, so I tried it. And what I found was that not only does
> this not work, it actually just makes a mess.
>
> I started with some code that I knew required some memory clobbering:
>
> #include <stdio.h>
>
> int main(int argc, char* argv[])
> {
> struct
> {
> int a;
> int b;
> } c;
>
> c.a = 1;
> c.b = 2;
>
> int Count = sizeof(c);
> void *Dest;
>
> __asm__ __volatile__ ("rep; stosb"
> : "=D" (Dest), "+c" (Count)
> : "0" (&c), "a" (0)
> //: "memory"
> );
>
> printf("%u %u\n", c.a, c.b);
> }
>
> As written, this x64 code (compiled with -O2) will print out "1 2", even
> though someone might (incorrectly) expect the asm to overwrite the struct
> with zeros. Adding the memory clobber allows this code to work as expected
> (printing "0 0").
>
> Now that I have code I can use to see if registers are getting flushed, I
> removed the memory clobber, and tried just 'clobbering' the struct:
>
> #include <stdio.h>
>
> int main(int argc, char* argv[])
> {
> struct
> {
> int a;
> int b;
> } c;
>
> c.a = 1;
> c.b = 2;
>
> int Count = sizeof(c);
> void *Dest;
>
> __asm__ __volatile__ ("rep; stosb"
> : "=D" (Dest), "+c" (Count)
> : "0" (&c), "a" (0),
> "m" ( ({ struct foo { char x[8]; } *p = (struct foo *)&c ; *p; })
> )
> );
>
> printf("%u %u\n", c.a, c.b);
> }
>
> I'm using a named struct (foo) to avoid some compiler messages, but other
> than that, I believe this is the same as what's in the docs. And it doesn't
> work. I still get "1 2".
>
> At this point I realized that code I've seen using this trick usually has
> the asm in its own routine. When I try this, it still fails. Unless I
> start cranking up the size of x from 8 to ~250. At ~250, suddenly it starts
> working. Apparently this is because at this point, gcc decides not to
> inline the routine anymore, and flushes the registers before calling the
> non-inline code.
>
> And why does changing the size of the structure we are pointing to result in
> increases in the size of the routine? Reading the -S output, the "*p" at
> the end of this constraint generates a call to memcpy the 250 characters
> onto the stack, which it passes to the asm as %4, which is never used.
> Argh!
>
> Conclusion:
>
> What I expected when using that sample code from the docs was that any
> registers that contain values from the struct would get flushed to memory.
> This was intended to be a 'cheaper' alternative to doing a full-on "memory"
> clobber. What I got instead was an unexpected/unneeded stack allocation and
> memcpy, and STILL didn't get the values flushed. Yeah, not exactly the
> 'cheaper' I was hoping for.
>
> Is the example in the docs just written incorrectly? Did this get broken
> somewhere along the line? Or am I just using it wrong?
>
> I'm using gcc version 4.9.0 (x86_64-win32-seh-rev2, Built by MinGW-W64
> project). Remember to compile these x64 samples with -O2.
>
> dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-09-24 8:48 ` Richard Biener
@ 2014-09-25 20:48 ` Yury Gribov
2014-09-26 10:50 ` David Wohlferd
0 siblings, 1 reply; 17+ messages in thread
From: Yury Gribov @ 2014-09-25 20:48 UTC (permalink / raw)
To: Richard Biener, David Wohlferd; +Cc: gcc, Hans-Peter Nilsson
On 09/24/2014 12:31 PM, Richard Biener wrote:
> On Wed, Sep 24, 2014 at 9:43 AM, David Wohlferd <dw@limegreensocks.com> wrote:
>> Hans-Peter Nilsson: I should have listened to you back when you raised
>> concerns about this. My apologies for ever doubting you.
>>
>> In summary:
>>
>> - The "trick" in the docs for using an arbitrarily sized struct to force
>> register flushes for inline asm does not work.
>> - Placing the inline asm in a separate routine can sometimes mask the
>> problem with the trick not working.
>> - The sample that has been in the docs forever performs an unhelpful,
>> unexpected, and probably unwanted stack allocation + memcpy.
>>
>> Details:
>>
>> Here is the text from the docs:
>>
>> -----------
>> One trick to avoid [using the "memory" clobber] is available if the size of
>> the memory being accessed is known at compile time. For example, if
>> accessing ten bytes of a string, use a memory input like:
>>
>> "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
>
> Well - this can't work because you essentially are using a _value_
> here (looking at the GIMPLE - I'm not sure if a statement expression
> evaluates to an lvalue.
>
> It should work if you simply do this without a stmt expression:
>
> "m" (*(struct { char x[10]; } *)ptr)
>
> because that's clearly an lvalue (and the GIMPLE correctly says so):
>
> <bb 2>:
> c.a = 1;
> c.b = 2;
> __asm__ __volatile__("rep; stosb" : "=D" Dest_4, "=c" Count_5 : "0"
> &c, "a" 0, "m" MEM[(struct foo *)&c], "1" 8);
> printf ("%u %u\n", 1, 2);
>
> note that we still constant propagated 1 and 2 for the reason that
> the asm didn't get any VDEF. That's because you do not have any
> memory output! So while it keeps 'c' live it doesn't consider it
> modified by the asm. You'd still need to clobber the memory,
> but "m" clobbers are not supported, only "memory".
>
> Thus fixed asm:
>
>
> __asm__ __volatile__ ("rep; stosb"
> : "=D" (Dest), "+c" (Count)
> : "0" (&c), "a" (0),
> "m" (*( struct foo { char x[8]; } *)&c)
> : "memory"
> );
>
> where I'm not 100% sure if the "m" input is now pointless (that is,
> if a "memory" clobber also constitutes a use of all memory).
Or maybe even
__asm__ __volatile__ ("rep; stosb"
: "=D" (Dest), "+c" (Count), "+m" (*(struct foo { char x[8]; } *)&c)
: "0" (&c), "a" (0)
);
to avoid the big-hammer memory clobber?
-Y
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-09-25 20:48 ` Yury Gribov
@ 2014-09-26 10:50 ` David Wohlferd
2014-09-26 14:47 ` Richard Biener
0 siblings, 1 reply; 17+ messages in thread
From: David Wohlferd @ 2014-09-26 10:50 UTC (permalink / raw)
To: Yury Gribov, richard.guenther, gcc,
hp@bitrange.com >> Hans-Peter Nilsson, Gerald Pfeifer
On 9/25/2014 12:36 AM, Yury Gribov wrote:
> On 09/24/2014 12:31 PM, Richard Biener wrote:
>> On Wed, Sep 24, 2014 at 9:43 AM, David Wohlferd
>> <dw@limegreensocks.com> wrote:
>>> Hans-Peter Nilsson: I should have listened to you back when you raised
>>> concerns about this. My apologies for ever doubting you.
>>>
>>> In summary:
>>>
>>> - The "trick" in the docs for using an arbitrarily sized struct to
>>> force
>>> register flushes for inline asm does not work.
>>> - Placing the inline asm in a separate routine can sometimes mask the
>>> problem with the trick not working.
>>> - The sample that has been in the docs forever performs an unhelpful,
>>> unexpected, and probably unwanted stack allocation + memcpy.
>>>
>>> Details:
>>>
>>> Here is the text from the docs:
>>>
>>> -----------
>>> One trick to avoid [using the "memory" clobber] is available if the
>>> size of
>>> the memory being accessed is known at compile time. For example, if
>>> accessing ten bytes of a string, use a memory input like:
>>>
>>> "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
>>
>> Well - this can't work because you essentially are using a _value_
>> here (looking at the GIMPLE - I'm not sure if a statement expression
>> evaluates to an lvalue.
>>
>> It should work if you simply do this without a stmt expression:
>>
>> "m" (*(struct { char x[10]; } *)ptr)
>>
>> because that's clearly an lvalue (and the GIMPLE correctly says so):
>>
>> <bb 2>:
>> c.a = 1;
>> c.b = 2;
>> __asm__ __volatile__("rep; stosb" : "=D" Dest_4, "=c" Count_5 : "0"
>> &c, "a" 0, "m" MEM[(struct foo *)&c], "1" 8);
>> printf ("%u %u\n", 1, 2);
>>
>> note that we still constant propagated 1 and 2 for the reason that
>> the asm didn't get any VDEF. That's because you do not have any
>> memory output! So while it keeps 'c' live it doesn't consider it
>> modified by the asm. You'd still need to clobber the memory,
>> but "m" clobbers are not supported, only "memory".
>>
>> Thus fixed asm:
>>
>>
>> __asm__ __volatile__ ("rep; stosb"
>> : "=D" (Dest), "+c" (Count)
>> : "0" (&c), "a" (0),
>> "m" (*( struct foo { char x[8]; } *)&c)
>> : "memory"
>> );
>>
>> where I'm not 100% sure if the "m" input is now pointless (that is,
>> if a "memory" clobber also constitutes a use of all memory).
>
> Or maybe even
> __asm__ __volatile__ ("rep; stosb"
> : "=D" (Dest), "+c" (Count), "+m" (*(struct foo { char x[8]; }
> *)&c)
> : "0" (&c), "a" (0)
> );
> to avoid the big-hammer memory clobber?
>
> -Y
>
Thank you both for the responses. At this point I've started composing
some replacement text for the docs (below), cuz clearly what's there is
both inadequate and wrong. All comments are welcome.
While the code in Richard's response will always produce the correct
results, the intent here is to use memory constraints to *avoid* the
performance penalties of the memory clobber. The existing docs say this
should work, and I've seen a number of places using it (linux kernel,
glibc, etc). If this does work, we should update the docs to say how
it's done. If this doesn't work, we should say that too.
Looking at Yury's response, that code does work (in c, not c++). At
least it does sometimes. The problem is that sometimes gcc can "lose
track" of what a pointer points to. And when it does, gcc can get
confused about what to flush. Here's a simple example that shows this
(4.9.0, x64, -O2, 'c'):
#include <stdio.h>
inline void *memset( void * Dest, int c, size_t Count) {
void *dummy;
__asm__ (
"rep stosb"
: "=D" (dummy), "+c" (Count), "=m" (*( struct foo { char x[8]; }
*)Dest)
: "0" (Dest), "a" (c)
: "cc"//, "memory"
);
return Dest;
}
int main() {
struct {
int a;
} c;
void *v;
asm volatile("#" : "=r" (v) : "0" (&c) );
c.a = 0x30303030;
//memset(&c, 0, sizeof(c));
memset(v, 0, sizeof(c));
printf("%x\n", c.a);
}
This code will work if you pass &c to memset. But it will fail if you
use v. Oh, the wonders of aliasing.
And this is why I'm having a problem doc'ing this. I love the potential
benefits of using this. But if you are writing general-purpose
routines, how can you hope to know whether the code calling you will
pass you a pointer that will work with this trick? This kind of thing
can introduce *horribly* hard to track down problems. Note that the
memory clobber always works, but potentially with a performance penalty.
<sigh>
I could simply skip describing the whole problem with aliasing, but that
just hides the problem. I hate when docs do that.
That said, here's what I've written so far. Any improvements or
corrections are very welcome.
==========================
Flushing registers to memory has performance implications and may be an
issue
for time-sensitive code. One trick to avoid flushing more registers than is
absolutely required is to use a memory constraint.
For example this i386 code works for both c and c++:
@example
inline void *memset(void *Dest, int c, size_t Count)
@{
struct _reallybigstruct @{ char x[SSIZE_MAX]; @};
void *dummy;
asm (
"rep stosb"
: "=D" (dummy), "+c" (Count), "=m" (*(struct _reallybigstruct *)Dest)
: "0" (Dest), "a" (c)
: "cc"
);
return Dest;
@}
@end example
Similar code can be used if the memory is being read (by changing the memory
constraint to be an input) or updated (by changing the constraint to use
'+' instead of '=').
A few noteworthy things about clobbering using memory constraints:
@itemize @minus
@item
Note that @code{_reallybigstruct} uses a size of @code{SSIZE_MAX}. This
does
not mean that when casting @code{Dest} to @code{(struct _reallybigstruct
*)},
@code{Dest} must be @code{SSIZE_MAX} bytes long or that @code{SSIZE_MAX}
bytes will be used. Nor does it affect registers associated with items
that
may follow @code{Dest} in memory. This only affects the symbol @code{Dest}.
@item
It may be that using memory constraints can remove the need for the asm
@code{volatile} qualifier. This can, under certain circumstances, result in
more efficient code. Consider the memset sample above. Without the memory
constraint, the only outputs are @code{dummy} and @code{Count}. However,
both those symbols go out of scope immediately after the asm statement.
As a
result, gcc optimization could reasonably assume that the asm block isn't
actually needed for anything. Even adding the "memory" clobber doesn't
change this. Normally this is resolved by adding the asm @code{volatile}
qualifier. However, by adding the memory constraint, gcc also considers
whether updating the @code{Dest} block will affect the subsequent code.
Only in the event that the @code{Dest} block is not needed will gcc
consider
removing this asm code.
@end itemize
@strong{Warning}: Older versions of this documentation show an example
for using
memory constraints like this:
@example
"m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )
@end example
This code not only doesn't compile in c++, it has potentially undesirable
side effects that affect performance, as well as producing incorrect
results.
==========================
dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-09-26 10:50 ` David Wohlferd
@ 2014-09-26 14:47 ` Richard Biener
2014-10-03 6:41 ` David Wohlferd
0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2014-09-26 14:47 UTC (permalink / raw)
To: David Wohlferd
Cc: Yury Gribov, gcc, hp@bitrange.com >> Hans-Peter Nilsson,
Gerald Pfeifer
On Fri, Sep 26, 2014 at 12:37 PM, David Wohlferd <dw@limegreensocks.com> wrote:
>
> On 9/25/2014 12:36 AM, Yury Gribov wrote:
>>
>> On 09/24/2014 12:31 PM, Richard Biener wrote:
>>>
>>> On Wed, Sep 24, 2014 at 9:43 AM, David Wohlferd <dw@limegreensocks.com>
>>> wrote:
>>>>
>>>> Hans-Peter Nilsson: I should have listened to you back when you raised
>>>> concerns about this. My apologies for ever doubting you.
>>>>
>>>> In summary:
>>>>
>>>> - The "trick" in the docs for using an arbitrarily sized struct to force
>>>> register flushes for inline asm does not work.
>>>> - Placing the inline asm in a separate routine can sometimes mask the
>>>> problem with the trick not working.
>>>> - The sample that has been in the docs forever performs an unhelpful,
>>>> unexpected, and probably unwanted stack allocation + memcpy.
>>>>
>>>> Details:
>>>>
>>>> Here is the text from the docs:
>>>>
>>>> -----------
>>>> One trick to avoid [using the "memory" clobber] is available if the size
>>>> of
>>>> the memory being accessed is known at compile time. For example, if
>>>> accessing ten bytes of a string, use a memory input like:
>>>>
>>>> "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
>>>
>>>
>>> Well - this can't work because you essentially are using a _value_
>>> here (looking at the GIMPLE - I'm not sure if a statement expression
>>> evaluates to an lvalue.
>>>
>>> It should work if you simply do this without a stmt expression:
>>>
>>> "m" (*(struct { char x[10]; } *)ptr)
>>>
>>> because that's clearly an lvalue (and the GIMPLE correctly says so):
>>>
>>> <bb 2>:
>>> c.a = 1;
>>> c.b = 2;
>>> __asm__ __volatile__("rep; stosb" : "=D" Dest_4, "=c" Count_5 : "0"
>>> &c, "a" 0, "m" MEM[(struct foo *)&c], "1" 8);
>>> printf ("%u %u\n", 1, 2);
>>>
>>> note that we still constant propagated 1 and 2 for the reason that
>>> the asm didn't get any VDEF. That's because you do not have any
>>> memory output! So while it keeps 'c' live it doesn't consider it
>>> modified by the asm. You'd still need to clobber the memory,
>>> but "m" clobbers are not supported, only "memory".
>>>
>>> Thus fixed asm:
>>>
>>>
>>> __asm__ __volatile__ ("rep; stosb"
>>> : "=D" (Dest), "+c" (Count)
>>> : "0" (&c), "a" (0),
>>> "m" (*( struct foo { char x[8]; } *)&c)
>>> : "memory"
>>> );
>>>
>>> where I'm not 100% sure if the "m" input is now pointless (that is,
>>> if a "memory" clobber also constitutes a use of all memory).
>>
>>
>> Or maybe even
>> __asm__ __volatile__ ("rep; stosb"
>> : "=D" (Dest), "+c" (Count), "+m" (*(struct foo { char x[8]; }
>> *)&c)
>> : "0" (&c), "a" (0)
>> );
>> to avoid the big-hammer memory clobber?
>>
>> -Y
>>
>
> Thank you both for the responses. At this point I've started composing some
> replacement text for the docs (below), cuz clearly what's there is both
> inadequate and wrong. All comments are welcome.
>
> While the code in Richard's response will always produce the correct
> results, the intent here is to use memory constraints to *avoid* the
> performance penalties of the memory clobber. The existing docs say this
> should work, and I've seen a number of places using it (linux kernel, glibc,
> etc). If this does work, we should update the docs to say how it's done.
> If this doesn't work, we should say that too.
>
> Looking at Yury's response, that code does work (in c, not c++). At least
> it does sometimes. The problem is that sometimes gcc can "lose track" of
> what a pointer points to. And when it does, gcc can get confused about what
> to flush. Here's a simple example that shows this (4.9.0, x64, -O2, 'c'):
>
> #include <stdio.h>
>
> inline void *memset( void * Dest, int c, size_t Count) {
> void *dummy;
>
> __asm__ (
> "rep stosb"
> : "=D" (dummy), "+c" (Count), "=m" (*( struct foo { char x[8]; }
> *)Dest)
> : "0" (Dest), "a" (c)
> : "cc"//, "memory"
> );
>
> return Dest;
> }
>
> int main() {
> struct {
> int a;
> } c;
>
> void *v;
> asm volatile("#" : "=r" (v) : "0" (&c) );
>
> c.a = 0x30303030;
>
> //memset(&c, 0, sizeof(c));
> memset(v, 0, sizeof(c));
> printf("%x\n", c.a);
> }
>
> This code will work if you pass &c to memset. But it will fail if you use
> v. Oh, the wonders of aliasing.
Yeah, in this case because *(struct foo { char x[8]; } *)Dest uses alias-set
'4' but c.a is accessed with alias-set '3'.
You can't expect a struct with a char array to end up using alias-set
zero (which probably was intended). You want
"=m" (*( struct foo { char x[8]; } __attribute__((may_alias)) *)Dest)
which makes it work. Or use *(int *)Dest as you are later accessing
it with int, so you even do not lose TBAA.
Richard.
> And this is why I'm having a problem doc'ing this. I love the potential
> benefits of using this. But if you are writing general-purpose routines,
> how can you hope to know whether the code calling you will pass you a
> pointer that will work with this trick? This kind of thing can introduce
> *horribly* hard to track down problems. Note that the memory clobber always
> works, but potentially with a performance penalty. <sigh>
>
> I could simply skip describing the whole problem with aliasing, but that
> just hides the problem. I hate when docs do that.
>
> That said, here's what I've written so far. Any improvements or corrections
> are very welcome.
>
> ==========================
> Flushing registers to memory has performance implications and may be an
> issue
> for time-sensitive code. One trick to avoid flushing more registers than is
> absolutely required is to use a memory constraint.
>
> For example this i386 code works for both c and c++:
>
> @example
> inline void *memset(void *Dest, int c, size_t Count)
> @{
> struct _reallybigstruct @{ char x[SSIZE_MAX]; @};
> void *dummy;
>
> asm (
> "rep stosb"
> : "=D" (dummy), "+c" (Count), "=m" (*(struct _reallybigstruct *)Dest)
> : "0" (Dest), "a" (c)
> : "cc"
> );
>
> return Dest;
> @}
> @end example
>
> Similar code can be used if the memory is being read (by changing the memory
> constraint to be an input) or updated (by changing the constraint to use
> '+' instead of '=').
>
> A few noteworthy things about clobbering using memory constraints:
>
> @itemize @minus
> @item
> Note that @code{_reallybigstruct} uses a size of @code{SSIZE_MAX}. This does
> not mean that when casting @code{Dest} to @code{(struct _reallybigstruct
> *)},
> @code{Dest} must be @code{SSIZE_MAX} bytes long or that @code{SSIZE_MAX}
> bytes will be used. Nor does it affect registers associated with items that
> may follow @code{Dest} in memory. This only affects the symbol @code{Dest}.
>
> @item
> It may be that using memory constraints can remove the need for the asm
> @code{volatile} qualifier. This can, under certain circumstances, result in
> more efficient code. Consider the memset sample above. Without the memory
> constraint, the only outputs are @code{dummy} and @code{Count}. However,
> both those symbols go out of scope immediately after the asm statement. As a
> result, gcc optimization could reasonably assume that the asm block isn't
> actually needed for anything. Even adding the "memory" clobber doesn't
> change this. Normally this is resolved by adding the asm @code{volatile}
> qualifier. However, by adding the memory constraint, gcc also considers
> whether updating the @code{Dest} block will affect the subsequent code.
> Only in the event that the @code{Dest} block is not needed will gcc consider
> removing this asm code.
>
> @end itemize
>
> @strong{Warning}: Older versions of this documentation show an example for
> using
> memory constraints like this:
>
> @example
> "m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )
> @end example
>
> This code not only doesn't compile in c++, it has potentially undesirable
> side effects that affect performance, as well as producing incorrect
> results.
>
> ==========================
>
> dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-09-26 14:47 ` Richard Biener
@ 2014-10-03 6:41 ` David Wohlferd
2014-10-03 8:07 ` Hans-Peter Nilsson
2014-10-06 11:09 ` Richard Biener
0 siblings, 2 replies; 17+ messages in thread
From: David Wohlferd @ 2014-10-03 6:41 UTC (permalink / raw)
To: richard.guenther; +Cc: y.gribov, gcc, hp, gerald
> You want
>
> "=m" (*( struct foo { char x[8]; } __attribute__((may_alias)) *)Dest)
Thank you. With your help, that worse-than-useless sample in the docs
is getting closer to something people can actually use.
Except for one last serious problem: This trick only works for very
specific (and very small) sizes of x (something else the existing docs
fail to mention).
On x64: Sizes of 1/2/4/8/16 correctly clobber Dest (and only Dest) as
expected. Trying to use any other values (5, 12, 32, etc) seem to force
a full memory clobber. This is the case REGARDLESS of the actual size
of the underlying structure. For example clobbering 16 bytes of a 12
byte struct only clobbers the 12 bytes of Dest (as expected), but
clobbering 12 bytes of a 12 byte struct performs a full memory clobber.
This presents a problem for general purpose routines like memset, where
the actual size of the block cannot be hardcoded into the struct.
glibc (which uses this trick to avoid using the memory clobber) uses a
size of 0xfffffff. Being larger than 16, this always causes a full
memory clobber, not the "memory constraint" clobber I assume they were
expecting. OTOH, since they don't use may_alias, this is may actually
be a good thing...
While I really like the idea of using memory constraints to avoid all
out memory clobbers, 16 bytes is a pretty small maximum memory block,
and x32 only supports a max of 8. Unless there's some way to use larger
sizes (say SSIZE_MAX), this feature hardly seems worth documenting.
dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-10-03 6:41 ` David Wohlferd
@ 2014-10-03 8:07 ` Hans-Peter Nilsson
2014-10-06 11:09 ` Richard Biener
1 sibling, 0 replies; 17+ messages in thread
From: Hans-Peter Nilsson @ 2014-10-03 8:07 UTC (permalink / raw)
To: David Wohlferd; +Cc: gcc
On Thu, 2 Oct 2014, David Wohlferd wrote:
> > You want
> >
> > "=m" (*( struct foo { char x[8]; } __attribute__((may_alias)) *)Dest)
>
> Thank you. With your help, that worse-than-useless sample in the docs
> is getting closer to something people can actually use.
Thank *you* for your investigation.
> While I really like the idea of using memory constraints to avoid all
> out memory clobbers, 16 bytes is a pretty small maximum memory block,
> and x32 only supports a max of 8. Unless there's some way to use larger
> sizes (say SSIZE_MAX), this feature hardly seems worth documenting.
Exactly. Now, with it documented and all, we're (should be) on
the hook to actually maintain that...feature, so we don't break
it now that the tweaked version works. The best practice is a
test-case in the test-suite. With all the required tweaks I
presume we don't have one. It seems grepping a gimple dump
works, with a scan-dump-based test (numerous examples; framework
is in place). Could you please consider adding one?
(Or a patch to remove the feature from the docs since it hasn't
had the intended effect for some long time - my not-so-secret
actual point...)
brgds, H-P
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-10-03 6:41 ` David Wohlferd
2014-10-03 8:07 ` Hans-Peter Nilsson
@ 2014-10-06 11:09 ` Richard Biener
2014-11-13 12:14 ` David Wohlferd
1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2014-10-06 11:09 UTC (permalink / raw)
To: David Wohlferd; +Cc: y.gribov, gcc, Hans-Peter Nilsson, Gerald Pfeifer
On Fri, Oct 3, 2014 at 8:41 AM, David Wohlferd <dw@limegreensocks.com> wrote:
>
>> You want
>>
>> "=m" (*( struct foo { char x[8]; } __attribute__((may_alias)) *)Dest)
>
>
> Thank you. With your help, that worse-than-useless sample in the docs is
> getting closer to something people can actually use.
>
> Except for one last serious problem: This trick only works for very
> specific (and very small) sizes of x (something else the existing docs fail
> to mention).
>
> On x64: Sizes of 1/2/4/8/16 correctly clobber Dest (and only Dest) as
> expected. Trying to use any other values (5, 12, 32, etc) seem to force a
> full memory clobber. This is the case REGARDLESS of the actual size of the
> underlying structure. For example clobbering 16 bytes of a 12 byte struct
> only clobbers the 12 bytes of Dest (as expected), but clobbering 12 bytes of
> a 12 byte struct performs a full memory clobber. This presents a problem
> for general purpose routines like memset, where the actual size of the block
> cannot be hardcoded into the struct.
>
> glibc (which uses this trick to avoid using the memory clobber) uses a size
> of 0xfffffff. Being larger than 16, this always causes a full memory
> clobber, not the "memory constraint" clobber I assume they were expecting.
> OTOH, since they don't use may_alias, this is may actually be a good
> thing...
>
> While I really like the idea of using memory constraints to avoid all out
> memory clobbers, 16 bytes is a pretty small maximum memory block, and x32
> only supports a max of 8. Unless there's some way to use larger sizes (say
> SSIZE_MAX), this feature hardly seems worth documenting.
I wonder how you figured out that a 12 byte clobber performs a full
memory clobber? I'm quite sure it does not.
Richard.
> dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-10-06 11:09 ` Richard Biener
@ 2014-11-13 12:14 ` David Wohlferd
2014-11-13 13:08 ` Richard Biener
2014-11-13 14:02 ` Hans-Peter Nilsson
0 siblings, 2 replies; 17+ messages in thread
From: David Wohlferd @ 2014-11-13 12:14 UTC (permalink / raw)
To: Richard Biener, Hans-Peter Nilsson; +Cc: y.gribov, gcc, Gerald Pfeifer
Sorry for the (very) delayed response. I'm still looking for feedback
here so I can fix the docs.
To refresh: The topic of conversation was the (extremely) wrong
explanation that has been in the docs since forever about how to use
memory constraints with inline asm to avoid the performance hit of a
full memory clobber. Trying to understand how this really works has led
to some surprising results.
Me:
>> While I really like the idea of using memory constraints to avoid
all out
>> memory clobbers, 16 bytes is a pretty small maximum memory block,
and x86
>> only supports a max of 8. Unless there's some way to use larger
sizes (say
>> SSIZE_MAX), this feature hardly seems worth documenting.
Richard:
> I wonder how you figured out that a 12 byte clobber performs a full
> memory clobber?
Here's the code (compiled with gcc version 4.9.0 x86_64-win32-seh-rev2,
using -m64 -O2 -fdump-final-insns):
--------------------
#include <stdio.h>
#define MYSIZE 3
inline void
__stosb(unsigned char *Dest, unsigned char Data, size_t Count)
{
struct _reallybigstruct { char x[MYSIZE]; }
*p = (struct _reallybigstruct *)Dest;
__asm__ __volatile__ ("rep stos{b|b}"
: "+D" (Dest), "+c" (Count), "=m" (*p)
: [Data] "a" (Data)
//: "memory"
);
}
int main()
{
unsigned char buff[100];
buff[5] = 'A';
__stosb(buff, 'B', sizeof(buff));
printf("%c\n", buff[5]);
}
--------------------
In summary:
1) Create a 100 byte buffer, and set buff[5] to 'A'.
2) Call __stosb, which uses inline asm to overwrite all of buff with
'B'.
3) Use a memory constraint in __stosb to flush buff. The size of
the memory constraint is controlled by a #define.
With this, I have a simple way to test various sizes of memory
constraints to see if the buffer gets flushed. If it *is* flushing the
buffer, printing buff[5] after __stosb will print 'B'. If it is *not*
flushing, it will print 'A'.
Results:
- Since buff[5] is the 6th byte in the buffer, using memory
constraint sizes of 1, 2 & 4 (not surprisingly) all print 'A', showing
that no flush was done.
- Sizes of 8 and 16 print 'B', showing that the flush was done. This
is also the expected result, since I am now flushing enough of buff to
include buff[5].
- The surprise comes from using a size of 3 or 5. These also print
'B'. WTF? If 4 doesn't flush, why does 3?
I believe the answer comes from reading the RTL. The difference between
sizes of 3 and 16 comes here:
(set (mem/c:TI (plus:DI (reg/f:DI 7 sp)
(const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0
S16 A128])
(asm_operands/v:TI ("rep stos{b|b}") ("=m") 2 [
(set (mem/c:BLK (plus:DI (reg/f:DI 7 sp)
(const_int 32 [0x20])) [ MEM[(struct _reallybigstruct
*)&buff]+0 S3 A128])
(asm_operands/v:BLK ("rep stos{b|b}") ("=m") 2 [
While I don't actually speak RTL, TI clearly refers to TIMode.
Apparently when using a size that exactly matches a machine mode, asm
memory references (on i386) can flush the exact number of bytes. But
for other sizes, gcc seems to falls back to BLK mode, which doesn't.
I don't know the exact meaning of BLK on a "set" or "asm_operands." Does
it cause a full clobber? Or just a complete clobber of buff? Attempting
to answer that question leads us to the second bit of code:
--------------------
#include <stdio.h>
#define MYSIZE 8
inline void
__stosb(unsigned char *Dest, unsigned char Data, size_t Count)
{
struct _reallybigstruct { char x[MYSIZE]; }
*p = (struct _reallybigstruct *)Dest;
__asm__ __volatile__ ("rep stos{b|b}"
: "+D" (Dest), "+c" (Count), "=m" (*p)
: [Data] "a" (Data)
//: "memory"
);
}
int main()
{
unsigned char buff[100], buff2[100];
buff[5] = 'A';
buff2[5] = 'M';
asm("#" : : "r" (buff2));
__stosb(buff, 'B', sizeof(buff));
printf("%c %c\n", buff[5], buff2[5]);
}
--------------------
Here I've added a buff2, and I set buff2[5] to 'M' (aka ascii 77), which
I also print. I still perform the memory constraint against buff, then
I check to see if it is affecting buff2.
I start by compiling this with a size of 8 and look at the -S output.
If this is NOT doing a full clobber, gcc should be able to just print
buff2[5] by moving 77 into the appropriate register before calling
printf. And indeed, that's what we see.
/APP
# 17 "mem2.cpp" 1
rep stosb
# 0 "" 2
/NO_APP
movzbl 37(%rsp), %edx
movl $77, %r8d
leaq .LC0(%rip), %rcx
call printf
If using a size of 3 *is* causing a full memory clobber, we would expect
to see the value getting read from memory before calling printf. And
indeed, that's also what we see.
/APP
# 17 "mem2.cpp" 1
rep stosb
# 0 "" 2
/NO_APP
movzbl 37(%rsp), %edx
leaq .LC0(%rip), %rcx
movzbl 149(%rsp), %r8d
I don't know the internals of gcc well enough to understand exactly why
this is happening. But from a user's point of view, it sure looks like
a memory clobber.
As I said before, triggering a full memory clobber for anything over 16
bytes (and most sizes under 16 bytes) makes this feature all but
useless. So if that's really what's happening, we need to decide what
to do next:
1) Can this be "fixed?"
2) Do we want to doc the current behavior?
3) Or do we just remove this section?
I think it could be a nice performance win for inline asm if it could be
made to work right, but I have no idea what might be involved in that.
Failing that, I guess if it doesn't work and isn't going to work, I'd
recommend removing the text for this feature.
Since all 3 suggestions require a doc change, I'll just say that I'm
prepared to start work on the doc patch as soon as someone lets me know
what the plan is.
Richard? Hans-Peter? Your thoughts?
Thanks,
dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-11-13 12:14 ` David Wohlferd
@ 2014-11-13 13:08 ` Richard Biener
2014-11-14 10:05 ` Segher Boessenkool
2014-11-13 14:02 ` Hans-Peter Nilsson
1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2014-11-13 13:08 UTC (permalink / raw)
To: David Wohlferd; +Cc: Hans-Peter Nilsson, y.gribov, gcc, Gerald Pfeifer
On Thu, Nov 13, 2014 at 1:03 PM, David Wohlferd <dw@limegreensocks.com> wrote:
> Sorry for the (very) delayed response. I'm still looking for feedback here
> so I can fix the docs.
>
> To refresh: The topic of conversation was the (extremely) wrong explanation
> that has been in the docs since forever about how to use memory constraints
> with inline asm to avoid the performance hit of a full memory clobber.
> Trying to understand how this really works has led to some surprising
> results.
>
> Me:
>>> While I really like the idea of using memory constraints to avoid all out
>>> memory clobbers, 16 bytes is a pretty small maximum memory block, and x86
>>> only supports a max of 8. Unless there's some way to use larger sizes
>>> (say
>>> SSIZE_MAX), this feature hardly seems worth documenting.
>
> Richard:
>> I wonder how you figured out that a 12 byte clobber performs a full
>> memory clobber?
>
> Here's the code (compiled with gcc version 4.9.0 x86_64-win32-seh-rev2,
> using -m64 -O2 -fdump-final-insns):
>
> --------------------
> #include <stdio.h>
>
> #define MYSIZE 3
>
> inline void
> __stosb(unsigned char *Dest, unsigned char Data, size_t Count)
> {
> struct _reallybigstruct { char x[MYSIZE]; }
> *p = (struct _reallybigstruct *)Dest;
>
> __asm__ __volatile__ ("rep stos{b|b}"
> : "+D" (Dest), "+c" (Count), "=m" (*p)
> : [Data] "a" (Data)
> //: "memory"
> );
> }
>
> int main()
> {
> unsigned char buff[100];
> buff[5] = 'A';
>
> __stosb(buff, 'B', sizeof(buff));
> printf("%c\n", buff[5]);
> }
> --------------------
>
> In summary:
>
> 1) Create a 100 byte buffer, and set buff[5] to 'A'.
> 2) Call __stosb, which uses inline asm to overwrite all of buff with 'B'.
> 3) Use a memory constraint in __stosb to flush buff. The size of the
> memory constraint is controlled by a #define.
>
> With this, I have a simple way to test various sizes of memory constraints
> to see if the buffer gets flushed. If it *is* flushing the buffer, printing
> buff[5] after __stosb will print 'B'. If it is *not* flushing, it will
> print 'A'.
>
> Results:
> - Since buff[5] is the 6th byte in the buffer, using memory constraint
> sizes of 1, 2 & 4 (not surprisingly) all print 'A', showing that no flush
> was done.
> - Sizes of 8 and 16 print 'B', showing that the flush was done. This is
> also the expected result, since I am now flushing enough of buff to include
> buff[5].
> - The surprise comes from using a size of 3 or 5. These also print 'B'.
> WTF? If 4 doesn't flush, why does 3?
>
> I believe the answer comes from reading the RTL. The difference between
> sizes of 3 and 16 comes here:
>
> (set (mem/c:TI (plus:DI (reg/f:DI 7 sp)
> (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S16
> A128])
> (asm_operands/v:TI ("rep stos{b|b}") ("=m") 2 [
>
> (set (mem/c:BLK (plus:DI (reg/f:DI 7 sp)
> (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S3
> A128])
> (asm_operands/v:BLK ("rep stos{b|b}") ("=m") 2 [
>
> While I don't actually speak RTL, TI clearly refers to TIMode. Apparently
> when using a size that exactly matches a machine mode, asm memory references
> (on i386) can flush the exact number of bytes. But for other sizes, gcc
> seems to falls back to BLK mode, which doesn't.
>
> I don't know the exact meaning of BLK on a "set" or "asm_operands." Does it
> cause a full clobber? Or just a complete clobber of buff? Attempting to
> answer that question leads us to the second bit of code:
>
> --------------------
> #include <stdio.h>
>
> #define MYSIZE 8
>
> inline void
> __stosb(unsigned char *Dest, unsigned char Data, size_t Count)
> {
> struct _reallybigstruct { char x[MYSIZE]; }
> *p = (struct _reallybigstruct *)Dest;
>
> __asm__ __volatile__ ("rep stos{b|b}"
> : "+D" (Dest), "+c" (Count), "=m" (*p)
> : [Data] "a" (Data)
> //: "memory"
> );
> }
> int main()
> {
> unsigned char buff[100], buff2[100];
> buff[5] = 'A';
> buff2[5] = 'M';
> asm("#" : : "r" (buff2));
>
> __stosb(buff, 'B', sizeof(buff));
> printf("%c %c\n", buff[5], buff2[5]);
> }
> --------------------
>
> Here I've added a buff2, and I set buff2[5] to 'M' (aka ascii 77), which I
> also print. I still perform the memory constraint against buff, then I
> check to see if it is affecting buff2.
>
> I start by compiling this with a size of 8 and look at the -S output. If
> this is NOT doing a full clobber, gcc should be able to just print buff2[5]
> by moving 77 into the appropriate register before calling printf. And
> indeed, that's what we see.
>
> /APP
> # 17 "mem2.cpp" 1
> rep stosb
> # 0 "" 2
> /NO_APP
> movzbl 37(%rsp), %edx
> movl $77, %r8d
> leaq .LC0(%rip), %rcx
> call printf
>
> If using a size of 3 *is* causing a full memory clobber, we would expect to
> see the value getting read from memory before calling printf. And indeed,
> that's also what we see.
>
> /APP
> # 17 "mem2.cpp" 1
> rep stosb
> # 0 "" 2
> /NO_APP
> movzbl 37(%rsp), %edx
> leaq .LC0(%rip), %rcx
> movzbl 149(%rsp), %r8d
>
> I don't know the internals of gcc well enough to understand exactly why this
> is happening. But from a user's point of view, it sure looks like a memory
> clobber.
>
> As I said before, triggering a full memory clobber for anything over 16
> bytes (and most sizes under 16 bytes) makes this feature all but useless.
> So if that's really what's happening, we need to decide what to do next:
>
> 1) Can this be "fixed?"
> 2) Do we want to doc the current behavior?
> 3) Or do we just remove this section?
>
> I think it could be a nice performance win for inline asm if it could be
> made to work right, but I have no idea what might be involved in that.
> Failing that, I guess if it doesn't work and isn't going to work, I'd
> recommend removing the text for this feature.
>
> Since all 3 suggestions require a doc change, I'll just say that I'm
> prepared to start work on the doc patch as soon as someone lets me know what
> the plan is.
>
> Richard? Hans-Peter? Your thoughts?
Just from a very very quick look you miss:
> (set (mem/c:TI (plus:DI (reg/f:DI 7 sp)
> (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S16
> A128])
> (asm_operands/v:TI ("rep stos{b|b}") ("=m") 2 [
>
> (set (mem/c:BLK (plus:DI (reg/f:DI 7 sp)
> (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S3
> A128])
> (asm_operands/v:BLK ("rep stos{b|b}") ("=m") 2 [
The memory attributes - the first one has 'S16' (size 16 bytes), the
2nd has 'S3' (size 3 bytes). So the information is clearly there.
It might be that RTL alias analysis / CSE give up too early here
(we don't optimize across asm() on the GIMPLE level at all ... heh).
I didn't look where it gives up (even though appearantly it does).
Richard.
> Thanks,
> dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-11-13 12:14 ` David Wohlferd
2014-11-13 13:08 ` Richard Biener
@ 2014-11-13 14:02 ` Hans-Peter Nilsson
2014-11-13 14:15 ` Richard Biener
1 sibling, 1 reply; 17+ messages in thread
From: Hans-Peter Nilsson @ 2014-11-13 14:02 UTC (permalink / raw)
To: David Wohlferd; +Cc: Richard Biener, y.gribov, gcc, Gerald Pfeifer
On Thu, 13 Nov 2014, David Wohlferd wrote:
> Sorry for the (very) delayed response. I'm still looking for feedback here so
> I can fix the docs.
Thank you for your diligence.
> As I said before, triggering a full memory clobber for anything over 16 bytes
> (and most sizes under 16 bytes) makes this feature all but useless. So if
> that's really what's happening, we need to decide what to do next:
>
> 1) Can this be "fixed?"
> 2) Do we want to doc the current behavior?
> 3) Or do we just remove this section?
>
> I think it could be a nice performance win for inline asm if it could be made
> to work right, but I have no idea what might be involved in that. Failing
> that, I guess if it doesn't work and isn't going to work, I'd recommend
> removing the text for this feature.
>
> Since all 3 suggestions require a doc change, I'll just say that I'm prepared
> to start work on the doc patch as soon as someone lets me know what the plan
> is.
>
> Richard? Hans-Peter? Your thoughts?
I've forgot if someone mentioned whether we have a test-case in
our test-suite for this feature. If we don't, then 3; removal.
If we do, I guess it's flawed or at least not agreeing with the
documentation? Then it *might* be worth the effort fixing that
and additional test-coverage (depending on the person stepping
up...) but 3 is IMHO still an arguably sane option.
brgds, H-P
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-11-13 14:02 ` Hans-Peter Nilsson
@ 2014-11-13 14:15 ` Richard Biener
2014-11-14 1:18 ` David Wohlferd
0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2014-11-13 14:15 UTC (permalink / raw)
To: Hans-Peter Nilsson; +Cc: David Wohlferd, y.gribov, gcc, Gerald Pfeifer
On Thu, Nov 13, 2014 at 2:53 PM, Hans-Peter Nilsson <hp@bitrange.com> wrote:
> On Thu, 13 Nov 2014, David Wohlferd wrote:
>> Sorry for the (very) delayed response. I'm still looking for feedback here so
>> I can fix the docs.
>
> Thank you for your diligence.
>
>> As I said before, triggering a full memory clobber for anything over 16 bytes
>> (and most sizes under 16 bytes) makes this feature all but useless. So if
>> that's really what's happening, we need to decide what to do next:
>>
>> 1) Can this be "fixed?"
>> 2) Do we want to doc the current behavior?
>> 3) Or do we just remove this section?
>>
>> I think it could be a nice performance win for inline asm if it could be made
>> to work right, but I have no idea what might be involved in that. Failing
>> that, I guess if it doesn't work and isn't going to work, I'd recommend
>> removing the text for this feature.
>>
>> Since all 3 suggestions require a doc change, I'll just say that I'm prepared
>> to start work on the doc patch as soon as someone lets me know what the plan
>> is.
>>
>> Richard? Hans-Peter? Your thoughts?
>
> I've forgot if someone mentioned whether we have a test-case in
> our test-suite for this feature. If we don't, then 3; removal.
> If we do, I guess it's flawed or at least not agreeing with the
> documentation? Then it *might* be worth the effort fixing that
> and additional test-coverage (depending on the person stepping
> up...) but 3 is IMHO still an arguably sane option.
Well, as what is missing is just an optimization I'd say we should
try to fix it. And surely the docs should not promise that optimization
will happen - it should just mention that doing this might allow
optimization to happen.
Richard.
> brgds, H-P
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-11-13 14:15 ` Richard Biener
@ 2014-11-14 1:18 ` David Wohlferd
2014-11-14 9:56 ` dw
2014-11-14 10:15 ` Richard Biener
0 siblings, 2 replies; 17+ messages in thread
From: David Wohlferd @ 2014-11-14 1:18 UTC (permalink / raw)
To: Richard Biener, Hans-Peter Nilsson; +Cc: y.gribov, gcc, Gerald Pfeifer
On 11/13/2014 6:02 AM, Richard Biener wrote:
> On Thu, Nov 13, 2014 at 2:53 PM, Hans-Peter Nilsson <hp@bitrange.com> wrote:
>> On Thu, 13 Nov 2014, David Wohlferd wrote:
>>> Sorry for the (very) delayed response. I'm still looking for feedback here so
>>> I can fix the docs.
>> Thank you for your diligence.
>>
>>> As I said before, triggering a full memory clobber for anything over 16 bytes
>>> (and most sizes under 16 bytes) makes this feature all but useless. So if
>>> that's really what's happening, we need to decide what to do next:
>>>
>>> 1) Can this be "fixed?"
>>> 2) Do we want to doc the current behavior?
>>> 3) Or do we just remove this section?
>>>
>>> I think it could be a nice performance win for inline asm if it could be made
>>> to work right, but I have no idea what might be involved in that. Failing
>>> that, I guess if it doesn't work and isn't going to work, I'd recommend
>>> removing the text for this feature.
>>>
>>> Since all 3 suggestions require a doc change, I'll just say that I'm prepared
>>> to start work on the doc patch as soon as someone lets me know what the plan
>>> is.
>>>
>>> Richard? Hans-Peter? Your thoughts?
>> I've forgot if someone mentioned whether we have a test-case in
>> our test-suite for this feature.
I'm looking thru gcc/testsuite/*.c to see if I can spot anything. It's
not easy since there is a lot of asm and the people who write these are
apparently allergic to using comments to describe what they are testing.
>> If we don't, then 3; removal.
>> If we do, I guess it's flawed or at least not agreeing with the
>> documentation? Then it *might* be worth the effort fixing that
>> and additional test-coverage (depending on the person stepping
>> up...) but 3 is IMHO still an arguably sane option.
> Well, as what is missing is just an optimization I'd say we should
> try to fix it.
While I'd love to be the one to fix this, the fact of the matter is that
most of gcc is a black box to me. Even if you told me roughly where to
start, I'd have no idea of the downstream impacts of anything I changed.
So while I understand that it looks like I'm just finding work for other
people, fixing something like this is simply beyond me. That said, I'm
certainly prepared to outline what I see as the interesting test cases
and to do some testing if someone else is willing to step up and do this
optimization.
> And surely the docs should not promise that optimization
> will happen - it should just mention that doing this might allow
> optimization to happen.
I can agree with this. I am quite confident there will be occasions
where gcc has no option but to fall back to doing a full clobber to
ensure correct function (a possibility which the current docs also fails
to mention). So yes, the docs should be limited in what it promises here.
Which brings us to the question: what do we do now? The 15th is fast
approaching. Can something like this get done before then? Can it be
checked in for 5.0 after the 15th? Or does it need to wait for 6.0?
If it does need to wait for 6.0, what do we want to do with the docs in
the meantime? Given how wrong they are currently, I'd hate to ship yet
another release with that ugly text. But trying to describe the best
way to take advantage of optimizations that haven't been written yet
is... hard.
Since (as I understand it) 5.0 docs *can* be checked in after the 15th,
my recommendations:
- If someone is prepared to step up and do this work for v5.0, then
I'll wait and write the docs when they are done and can describe how it
works.
- If this is going to wait for 6.0, then if someone does (at least)
enough investigative work to be able to describe how this will
eventually work, I'll update the 5.0 docs in a general way talking about
ways gcc *may* be able to optimize. It should be possible to phrase
this so code people write today will work even better tomorrow.
- Worst case is if no one has the time to look at this for the
foreseeable future. In that case, I'm with Hans-Peter. Let's take the
existing text out. Following the existing text makes things *worse*,
and the current implementation is so limited that I'd be surprised if
anyone's code actually uses it successfully. New text can get added
when the new code is.
Hmm. I just had a thought: Is it possible the problem I'm seeing here
is platform-specific? Maybe this works perfectly for non-i386 code?
That would certainly change my recommendations here.
dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-11-14 1:18 ` David Wohlferd
@ 2014-11-14 9:56 ` dw
2014-11-14 10:15 ` Richard Biener
1 sibling, 0 replies; 17+ messages in thread
From: dw @ 2014-11-14 9:56 UTC (permalink / raw)
To: Richard Biener, Hans-Peter Nilsson; +Cc: y.gribov, gcc, Gerald Pfeifer
>>> I've forgot if someone mentioned whether we have a test-case in
>>> our test-suite for this feature.
>
> I'm looking thru gcc/testsuite/*.c to see if I can spot anything. It's
> not easy since there is a lot of asm and the people who write these
> are apparently allergic to using comments to describe what they are
> testing.
So, I found a few tests that were *using* this feature. But they seem
to be checking for an ICE or page fault, rather than checking to see if
the generated code was avoiding the memory clobber.
dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-11-13 13:08 ` Richard Biener
@ 2014-11-14 10:05 ` Segher Boessenkool
0 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2014-11-14 10:05 UTC (permalink / raw)
To: Richard Biener
Cc: David Wohlferd, Hans-Peter Nilsson, y.gribov, gcc, Gerald Pfeifer
On Thu, Nov 13, 2014 at 01:14:41PM +0100, Richard Biener wrote:
> It might be that RTL alias analysis / CSE give up too early here
> (we don't optimize across asm() on the GIMPLE level at all ... heh).
>
> I didn't look where it gives up (even though appearantly it does).
The cse1 pass doesn't carry the store forward in the "3" case, while it
does in the "4" case. So yes it is just a missed optimisation.
Segher
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-11-14 1:18 ` David Wohlferd
2014-11-14 9:56 ` dw
@ 2014-11-14 10:15 ` Richard Biener
2014-11-16 15:51 ` David Wohlferd
1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2014-11-14 10:15 UTC (permalink / raw)
To: David Wohlferd; +Cc: Hans-Peter Nilsson, y.gribov, gcc, Gerald Pfeifer
On Fri, Nov 14, 2014 at 12:51 AM, David Wohlferd <dw@limegreensocks.com> wrote:
>
> On 11/13/2014 6:02 AM, Richard Biener wrote:
>>
>> On Thu, Nov 13, 2014 at 2:53 PM, Hans-Peter Nilsson <hp@bitrange.com>
>> wrote:
>>>
>>> On Thu, 13 Nov 2014, David Wohlferd wrote:
>>>>
>>>> Sorry for the (very) delayed response. I'm still looking for feedback
>>>> here so
>>>> I can fix the docs.
>>>
>>> Thank you for your diligence.
>>>
>>>> As I said before, triggering a full memory clobber for anything over 16
>>>> bytes
>>>> (and most sizes under 16 bytes) makes this feature all but useless. So
>>>> if
>>>> that's really what's happening, we need to decide what to do next:
>>>>
>>>> 1) Can this be "fixed?"
>>>> 2) Do we want to doc the current behavior?
>>>> 3) Or do we just remove this section?
>>>>
>>>> I think it could be a nice performance win for inline asm if it could be
>>>> made
>>>> to work right, but I have no idea what might be involved in that.
>>>> Failing
>>>> that, I guess if it doesn't work and isn't going to work, I'd recommend
>>>> removing the text for this feature.
>>>>
>>>> Since all 3 suggestions require a doc change, I'll just say that I'm
>>>> prepared
>>>> to start work on the doc patch as soon as someone lets me know what the
>>>> plan
>>>> is.
>>>>
>>>> Richard? Hans-Peter? Your thoughts?
>>>
>>> I've forgot if someone mentioned whether we have a test-case in
>>> our test-suite for this feature.
>
>
> I'm looking thru gcc/testsuite/*.c to see if I can spot anything. It's not
> easy since there is a lot of asm and the people who write these are
> apparently allergic to using comments to describe what they are testing.
>
>>> If we don't, then 3; removal.
>>> If we do, I guess it's flawed or at least not agreeing with the
>>> documentation? Then it *might* be worth the effort fixing that
>>> and additional test-coverage (depending on the person stepping
>>> up...) but 3 is IMHO still an arguably sane option.
>>
>> Well, as what is missing is just an optimization I'd say we should
>> try to fix it.
>
>
> While I'd love to be the one to fix this, the fact of the matter is that
> most of gcc is a black box to me. Even if you told me roughly where to
> start, I'd have no idea of the downstream impacts of anything I changed.
>
> So while I understand that it looks like I'm just finding work for other
> people, fixing something like this is simply beyond me. That said, I'm
> certainly prepared to outline what I see as the interesting test cases and
> to do some testing if someone else is willing to step up and do this
> optimization.
>
>> And surely the docs should not promise that optimization
>> will happen - it should just mention that doing this might allow
>> optimization to happen.
>
>
> I can agree with this. I am quite confident there will be occasions where
> gcc has no option but to fall back to doing a full clobber to ensure correct
> function (a possibility which the current docs also fails to mention). So
> yes, the docs should be limited in what it promises here.
>
> Which brings us to the question: what do we do now? The 15th is fast
> approaching. Can something like this get done before then? Can it be
> checked in for 5.0 after the 15th? Or does it need to wait for 6.0?
Can you please file a bug in bugzilla if you haven't already done so?
We can still fix bugs during the next three months ;)
> If it does need to wait for 6.0, what do we want to do with the docs in the
> meantime? Given how wrong they are currently, I'd hate to ship yet another
> release with that ugly text. But trying to describe the best way to take
> advantage of optimizations that haven't been written yet is... hard.
>
> Since (as I understand it) 5.0 docs *can* be checked in after the 15th, my
> recommendations:
>
> - If someone is prepared to step up and do this work for v5.0, then I'll
> wait and write the docs when they are done and can describe how it works.
We can at least improve the docs and fix the bogus examples therein.
> - If this is going to wait for 6.0, then if someone does (at least) enough
> investigative work to be able to describe how this will eventually work,
> I'll update the 5.0 docs in a general way talking about ways gcc *may* be
> able to optimize. It should be possible to phrase this so code people write
> today will work even better tomorrow.
> - Worst case is if no one has the time to look at this for the foreseeable
> future. In that case, I'm with Hans-Peter. Let's take the existing text
> out. Following the existing text makes things *worse*, and the current
> implementation is so limited that I'd be surprised if anyone's code actually
> uses it successfully. New text can get added when the new code is.
>
> Hmm. I just had a thought: Is it possible the problem I'm seeing here is
> platform-specific? Maybe this works perfectly for non-i386 code? That
> would certainly change my recommendations here.
I'd say that's highly unlikely.
Richard.
> dw
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
2014-11-14 10:15 ` Richard Biener
@ 2014-11-16 15:51 ` David Wohlferd
0 siblings, 0 replies; 17+ messages in thread
From: David Wohlferd @ 2014-11-16 15:51 UTC (permalink / raw)
To: Richard Biener; +Cc: Hans-Peter Nilsson, y.gribov, gcc, Gerald Pfeifer
> Can you please file a bug in bugzilla if you haven't already done so?
> We can still fix bugs during the next three months ;)
Well, I opened one. Briefly.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63900
dw
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2014-11-16 3:04 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-24 8:11 [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work David Wohlferd
2014-09-24 8:48 ` Richard Biener
2014-09-25 20:48 ` Yury Gribov
2014-09-26 10:50 ` David Wohlferd
2014-09-26 14:47 ` Richard Biener
2014-10-03 6:41 ` David Wohlferd
2014-10-03 8:07 ` Hans-Peter Nilsson
2014-10-06 11:09 ` Richard Biener
2014-11-13 12:14 ` David Wohlferd
2014-11-13 13:08 ` Richard Biener
2014-11-14 10:05 ` Segher Boessenkool
2014-11-13 14:02 ` Hans-Peter Nilsson
2014-11-13 14:15 ` Richard Biener
2014-11-14 1:18 ` David Wohlferd
2014-11-14 9:56 ` dw
2014-11-14 10:15 ` Richard Biener
2014-11-16 15:51 ` David Wohlferd
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).