Question about __builtin_ia32_mfence and memory barriers

public inbox for gcc-help@gcc.gnu.org

* Question about __builtin_ia32_mfence and memory barriers
From: dw @ 2013-06-04 22:58 UTC
  To: gcc-help

The discussion below assumes 64-bit code on an x86 processor (GCC's i386 target).

My understanding is that the way to do a memory barrier in gcc is:

     asm ("" ::: "memory");

This creates a compiler barrier (what MSVC calls a ReadWriteBarrier), but
no processor fence.  To create a processor fence, you could do something like

     __builtin_ia32_mfence();

This will generate an mfence instruction, but (assembly code inspection 
suggests) no compiler memory barrier.  I thought about just putting one 
after the other:

     asm ("" ::: "memory");
     __builtin_ia32_mfence();

And this leads to my questions:

1) Am I right that __builtin_ia32_mfence() does not generate a memory 
barrier?
2) Is this "two statement thing" guaranteed to be safe?  Could the 
optimizer re-order instructions moving code in between the two? (Yes, I 
realize that the asm statement doesn't actually generate any output.  
But given my understanding of how the compiler processes code, I believe 
the question is still valid).
3) If it is not guaranteed to be safe, what is the use of 
__builtin_ia32_mfence()?  What value is there in preventing the 
*processor* from executing statements out of order if the *compiler* is 
just going to move them around?

I expect this would always work:

     asm ("mfence" ::: "memory");

But I would rather use the builtins if possible.
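
For reference, a minimal sketch of the two kinds of barrier discussed in this message. It is an illustration only: the names COMPILER_BARRIER, full_fence, publish, data_word and ready_flag are made up for this example, and the non-x86 fallback is an assumption rather than anything posted in the thread.

```c
/* Compiler-only barrier: emits no instruction, but GCC will not move
   memory accesses across it or keep memory values cached in registers. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical helper: hardware fence and compiler barrier in one statement,
   so nothing can be scheduled between the two. */
static inline void full_fence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* mfence is an SSE2 instruction; it is always available to 64-bit code. */
    __asm__ __volatile__("mfence" ::: "memory");
#else
    /* Assumed fallback for other targets. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

int data_word;   /* hypothetical shared data */
int ready_flag;  /* hypothetical "data is ready" flag */

/* Write the data, fence, then raise the flag. */
void publish(int value)
{
    data_word = value;
    full_fence();
    ready_flag = 1;
    COMPILER_BARRIER();  /* keep later accesses from being hoisted above the flag store */
}
```

Because the fence instruction and the "memory" clobber live in a single asm statement, the question of whether the optimizer can slip code between two separate statements does not arise for this form.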

dw

* Re: Question about __builtin_ia32_mfence and memory barriers
From: Ian Lance Taylor @ 2013-06-04 23:52 UTC
  To: dw; +Cc: gcc-help

On Tue, Jun 4, 2013 at 3:58 PM, dw <limegreensocks@yahoo.com> wrote:
>
> To create a
> processor fence, you could do something like
>
>     __builtin_ia32_mfence();

A better choice these days is __atomic_thread_fence(__ATOMIC_SEQ_CST)
(or __atomic_signal_fence).

> 1) Am I right that __builtin_ia32_mfence() does not generate a memory
> barrier?

That is correct: it does not prevent the compiler from moving loads
and stores across the call to __builtin_ia32_mfence.

> 2) Is this "two statement thing" guaranteed to be safe?  Could the optimizer
> re-order instructions moving code in between the two? (Yes, I realize that
> the asm statement doesn't actually generate any output.  But given my
> understanding of how the compiler processes code, I believe the question is
> still valid).

It is probably safe, because why would the compiler put anything in
there, but it is not absolutely guaranteed to be safe.

> 3) If it is not guaranteed to be safe, what is the use of
> __builtin_ia32_mfence()?  What value is there in preventing the *processor*
> from executing statements out of order if the *compiler* is just going to
> move them around?

__builtin_ia32_mfence exists to support the Intel-documented
_mm_mfence intrinsic.  I'm not clear on whether _mm_mfence is meant to
be a compiler memory barrier or not.  If it is, then I think GCC has a
bug in the way it is implemented.  Please feel free to file a bug
report at http://gcc.gnu.org/bugzilla/ , especially if you can come up
with a case that fails.

> I expect this would always work:
>
>     asm ("mfence" ::: "memory");
>
> But I would rather use the builtins if possible.

Yes, you should use the builtins: specifically the __atomic builtins,
which work better and are portable across processors.
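
A minimal sketch of the two __atomic fences named above, assuming a simple flag-and-payload handoff. The variable and function names are hypothetical, and which instruction (if any) the compiler emits depends on the target and GCC version.

```c
int seq_payload;  /* hypothetical shared data */
int seq_ready;    /* hypothetical "data is ready" flag */

/* Handoff to another thread: the fence orders the payload store before
   the flag store for both the compiler and the processor. */
void publish_seq_cst(int value)
{
    seq_payload = value;
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    __atomic_store_n(&seq_ready, 1, __ATOMIC_RELAXED);
}

/* Handoff to a signal handler running on the same thread: only the
   compiler needs restraining, so no fence instruction is required. */
void publish_for_handler(int value)
{
    seq_payload = value;
    __atomic_signal_fence(__ATOMIC_SEQ_CST);
    __atomic_store_n(&seq_ready, 1, __ATOMIC_RELAXED);
}
```

The same source compiles on non-x86 targets, which is the portability advantage being pointed out.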

Ian

* Re: Question about __builtin_ia32_mfence and memory barriers
From: dw @ 2013-06-05  2:45 UTC
  To: gcc-help

 > A better choice these days is __atomic_thread_fence(__ATOMIC_SEQ_CST)
 > (or __atomic_signal_fence).

This sounded so promising. Unfortunately, it's not producing the results 
I need.  I can put all these statements in the code, and none of them 
generate -any- fence instruction:

     __atomic_thread_fence(__ATOMIC_RELAXED);
     __atomic_thread_fence(__ATOMIC_CONSUME);
     __atomic_thread_fence(__ATOMIC_ACQUIRE);
     __atomic_thread_fence(__ATOMIC_RELEASE);
     __atomic_thread_fence(__ATOMIC_ACQ_REL);

     __atomic_signal_fence(__ATOMIC_RELAXED);
     __atomic_signal_fence(__ATOMIC_CONSUME);
     __atomic_signal_fence(__ATOMIC_ACQUIRE);
     __atomic_signal_fence(__ATOMIC_RELEASE);
     __atomic_signal_fence(__ATOMIC_ACQ_REL);
     __atomic_signal_fence(__ATOMIC_SEQ_CST);

And while I get an mfence instruction with this:

     __atomic_thread_fence(__ATOMIC_SEQ_CST);

It doesn't produce quite the same instruction ordering as:

   asm volatile ("mfence" ::: "memory");

Which makes me think that whatever __ATOMIC_SEQ_CST means, it's not the 
same as the "memory" clobber.  Also, I'm looking to support SFENCE and 
LFENCE, which these don't appear to support at all.
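
As background (not something stated in the thread): x86 already keeps loads ordered with other loads and stores ordered with other stores, so acquire and release fences only need to restrain the compiler there, and signal fences only ever order against a handler on the same thread. That would explain why only the SEQ_CST thread fence produces an instruction. A minimal sketch of a release/acquire pairing, with hypothetical names throughout:

```c
int msg_payload;  /* hypothetical shared data */
int msg_ready;    /* hypothetical "data is ready" flag */

/* Producer side: the release fence keeps the payload store ahead of
   the flag store. */
void producer(int value)
{
    msg_payload = value;
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&msg_ready, 1, __ATOMIC_RELAXED);
}

/* Consumer side: returns 1 and fills *out if the flag is set, else 0.
   The acquire fence keeps the payload load behind the flag load. */
int consumer(int *out)
{
    if (!__atomic_load_n(&msg_ready, __ATOMIC_RELAXED))
        return 0;
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    *out = msg_payload;
    return 1;
}
```

Even when no instruction is emitted, these fences still constrain what the compiler may reorder, so they are not no-ops.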

 > I'm not clear on whether _mm_mfence is meant to be a compiler memory
 > barrier or not.

Every authoritative reference I have found is maddeningly silent on this 
point.

However, I have tried compiling x64 code with MSVC, and the instruction 
ordering it produces for _mm_mfence is not the same as what it produces 
for _mm_sfence.  In fact, the asm produced when using _mm_sfence bears a 
striking similarity to what you get with just _WriteBarrier (minus the 
sfence instruction, of course), and _mm_mfence looks like _ReadWriteBarrier.

While I'm not prepared to call this conclusive evidence, it is becoming 
suspicious.

And apparently I'm not the only person who thinks there is a problem 
here 
(http://doxygen.reactos.org/dd/dcb/intrin__x86_8h_a0dee6d755a43d9f9d8072d6202b487db.html#a0dee6d755a43d9f9d8072d6202b487db). 
I was concerned about using 2 statements and hoping the compiler didn't 
re-order any code around them.  I'm not convinced that 3 statements 
makes me feel any better.

dw

* Re: Question about __builtin_ia32_mfence and memory barriers
From: Ian Lance Taylor @ 2013-06-05  4:30 UTC
  To: dw; +Cc: gcc-help

On Tue, Jun 4, 2013 at 7:45 PM, dw <limegreensocks@yahoo.com> wrote:

> And while I get an mfence instruction with this:
>
>     __atomic_thread_fence(__ATOMIC_SEQ_CST);
>
> It doesn't produce quite the same instruction ordering as:
>
>   asm volatile ("mfence" ::: "memory");
>
> Which makes me think that whatever __ATOMIC_SEQ_CST means, it's not the same
> as the "memory" clobber.

It's not the same as the "memory" clobber, but it should have the
effect of providing both an mfence instruction and a compiler memory
barrier.

> Also, I'm looking to support SFENCE and LFENCE,
> which these don't appear to support at all.

That is true.  I think the only supported way to get those is the
Intel intrinsics _mm_sfence and _mm_lfence.
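
A minimal sketch of wrapping those intrinsics, assuming SSE2 is enabled (the default for 64-bit x86 code). The wrapper names are hypothetical, and the non-SSE2 fallbacks are rough approximations added so the file still compiles elsewhere; they are not equivalents of SFENCE or LFENCE.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_lfence and _mm_mfence; also provides _mm_sfence */
#define HAVE_X86_FENCES 1
#else
#define HAVE_X86_FENCES 0
#endif

/* Hypothetical wrapper around SFENCE. */
void store_fence(void)
{
#if HAVE_X86_FENCES
    _mm_sfence();
#else
    __atomic_thread_fence(__ATOMIC_RELEASE);  /* assumed approximation */
#endif
}

/* Hypothetical wrapper around LFENCE. */
void load_fence(void)
{
#if HAVE_X86_FENCES
    _mm_lfence();
#else
    __atomic_thread_fence(__ATOMIC_ACQUIRE);  /* assumed approximation */
#endif
}

/* Example use: read, fence, write, fence. Returns the value copied. */
int copy_with_fences(const int *src, int *dst)
{
    int v = *src;
    load_fence();
    *dst = v;
    store_fence();
    return v;
}
```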

Ian

* Re: Question about __builtin_ia32_mfence and memory barriers
From: dw @ 2013-06-12  8:15 UTC
  To: Ian Lance Taylor; +Cc: gcc-help

>> 1) Am I right that __builtin_ia32_mfence() does not generate a memory
>> barrier?

> That is correct: it does not prevent the compiler from moving loads
> and stores across the call to __builtin_ia32_mfence.

Are you sure?  Based on your comment, I was fully expecting to be able to produce a failure case suitable for bugzilla.

In fact, I *can* generate failure cases if I comment the __builtin_ia32_mfence() call out of _mm_mfence and replace it with something else (like asm("mfence")).  But as soon as I put the __builtin_ia32_mfence call back in, my "failure scenario" clears right up.

In short, it looks like __builtin_ia32_mfence *does* generate a barrier.  But so do other builtins (like __builtin_ia32_pause).  Does that even seem possible?  It would be weird if every builtin (or even every ia32 builtin) implied a barrier.
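
The test case itself was not posted. A hypothetical probe of the same kind might look like the sketch below: compile it with gcc -O2 -S and check whether both stores survive in the assembly. If the fence were not a compiler barrier, the first store would be a dead store that the optimizer is free to delete. The names probe, probe_shared and PROBE_FENCE are made up for this example, and the non-SSE2 fallback is an assumption.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#define PROBE_FENCE() _mm_mfence()
#else
#define PROBE_FENCE() __atomic_thread_fence(__ATOMIC_SEQ_CST)  /* assumed fallback */
#endif

int probe_shared;  /* hypothetical global the optimizer can reason about */

int probe(void)
{
    probe_shared = 1;   /* removable as a dead store unless the fence is a barrier */
    PROBE_FENCE();
    probe_shared = 2;
    return probe_shared;
}
```

Swapping PROBE_FENCE for a plain asm("mfence") with no "memory" clobber gives the comparison case described above.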

dw

* Re: Question about __builtin_ia32_mfence and memory barriers
From: Ian Lance Taylor @ 2013-06-12 19:01 UTC
  To: dw; +Cc: gcc-help

On Wed, Jun 12, 2013 at 1:15 AM, dw <limegreensocks@yahoo.com> wrote:
>>> 1) Am I right that __builtin_ia32_mfence() does not generate a memory
>>> barrier?
>
>
>> That is correct: it does not prevent the compiler from moving loads
>> and stores across the call to __builtin_ia32_mfence.
>
>
> Are you sure?  Based on your comment, I was fully expecting to be able to
> produce a failure case suitable for bugzilla.

No, I'm not sure.

> In fact, I *can* generate failure cases if I comment the
> __builtin_ia32_mfence() call out of _mm_mfence and replace it with something
> else (like asm("mfence")).  But as soon as I put the __builtin_ia32_mfence
> call back in, my "failure scenario" clears right up.
>
> In short, it looks like __builtin_ia32_mfence *does* generate a barrier.
> But so do other builtins (like __builtin_ia32_pause).  Does that even seem
> possible?  It would be weird if every builtin (or even every ia32 builtin)
> implied a barrier.

As far as I know, __builtin_ia32_mfence does not generate a barrier.
However, what it does do is appear to be a function call to the main
optimization stages of the compiler.  This is not true of an asm
statement, nor of an inlined function.  The __builtin_ia32_mfence call
is only expanded to an instruction when GCC converts to RTL, after the
main optimizations have been run.  However, it is still possible for
memory loads and stores to move after RTL, specifically when doing
register allocation and spilling register loads and stores to the
stack.

So when I say that as far as I know __builtin_ia32_mfence does not
generate a barrier, what I mean is that as far as I know after it is
expanded to RTL there is no barrier.  But I could be wrong.

Ian

* Re: Question about __builtin_ia32_mfence and memory barriers
From: dw @ 2013-06-13  2:55 UTC
  To: Ian Lance Taylor; +Cc: gcc-help

 > As far as I know, __builtin_ia32_mfence does not generate a barrier.
 > However, what it does do is appear to be a function call to the main
 > optimization stages of the compiler.

Ok, now that makes sense.  I kept thinking that somehow it had to be 
that the compiler was seeing this as a function, but then I kept 
discarding that theory because the inline function didn't change anything.

I'm going to have to ponder the performance implications of this. For 
example, it seems possible that asm("pause") could end up generating 
better code than _mm_pause().

 > However, it is still possible for memory loads and stores to move
 > after RTL

While it may be possible, I am unable to cause it to happen. Without a 
solid example or authoritative docs describing _mm_mfence as performing 
a ReadWriteBarrier (preferably both), I'm hard pressed to think of a 
credible way to file this in bugzilla.

On this mildly unsatisfactory note, I'm going to assume that _mm_?fence 
will work properly and cross my fingers.  If I eventually find this not 
to be true, I'll head straight to bugzilla.

Thanks for the help.

dw

* Re: Question about __builtin_ia32_mfence and memory barriers
From: Chung-Ju Wu @ 2013-06-13  3:01 UTC
  To: dw, Ian Lance Taylor; +Cc: gcc-help

2013/6/13 Ian Lance Taylor <iant@google.com>:
> On Wed, Jun 12, 2013 at 1:15 AM, dw <limegreensocks@yahoo.com> wrote:
[deleted]
>> In fact, I *can* generate failure cases if I comment the
>> __builtin_ia32_mfence() call out of _mm_mfence and replace it with something
>> else (like asm("mfence")).  But as soon as I put the __builtin_ia32_mfence
>> call back in, my "failure scenario" clears right up.
>>
>> In short, it looks like __builtin_ia32_mfence *does* generate a barrier.
>> But so do other builtins (like __builtin_ia32_pause).  Does that even seem
>> possible?  It would be weird if every builtin (or even every ia32 builtin)
>> implied a barrier.
[deleted]
>
> So when I say that as far as I know __builtin_ia32_mfence does not
> generate a barrier, what I mean is that as far as I know after it is
> expanded to RTL there is no barrier.  But I could be wrong.
>
> Ian

I just noticed there is a statement "MEM_VOLATILE_P (operands[0]) = 1"
for mfence pattern in gcc/config/i386/sync.md:

(define_expand "sse2_mfence"
  [(set (match_dup 0)
        (unspec:BLK [(match_dup 0)] UNSPEC_MFENCE))]
  "TARGET_SSE2"
{
  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
  MEM_VOLATILE_P (operands[0]) = 1;
})


And so does the "pause" pattern in gcc/config/i386/i386.md:

(define_expand "pause"
  [(set (match_dup 0)
        (unspec:BLK [(match_dup 0)] UNSPEC_PAUSE))]
  ""
{
  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
  MEM_VOLATILE_P (operands[0]) = 1;
})


According to section 10.5 of the GCC Internals manual,
"Volatile memory references may not be deleted, reordered or combined."
I think that is why __builtin_ia32_mfence and __builtin_ia32_pause *do*
generate a barrier in dw's experiment.


Best regards,
jasonwucj

* Re: Question about __builtin_ia32_mfence and memory barriers
From: Ian Lance Taylor @ 2013-06-13  3:25 UTC
  To: Chung-Ju Wu; +Cc: dw, gcc-help

On Wed, Jun 12, 2013 at 8:01 PM, Chung-Ju Wu <jasonwucj@gmail.com> wrote:
>
> According to section 10.5 of the GCC Internals manual,
> "Volatile memory references may not be deleted, reordered or combined."
> I think that is why __builtin_ia32_mfence and __builtin_ia32_pause *do*
> generate a barrier in dw's experiment.

Good point.  Thanks, that makes sense.

Ian

* Re: Question about __builtin_ia32_mfence and memory barriers
From: dw @ 2013-06-13  3:44 UTC
  To: Chung-Ju Wu; +Cc: Ian Lance Taylor, gcc-help

> I just noticed there is a statement "MEM_VOLATILE_P (operands[0]) = 1"
> for mfence pattern in gcc/config/i386/sync.md:

> I think that is why __builtin_ia32_mfence and __builtin_ia32_pause *do*
> generate a barrier in dw's experiment.

Aha!

This is not only exactly what I wanted to know, it's exactly the answer 
I was hoping for.

Thank you for your help.

dw
