* Question about __builtin_ia32_mfence and memory barriers
From: dw @ 2013-06-04 22:58 UTC (permalink / raw)
To: gcc-help
The discussion below assumes 64-bit code on an x86 processor (GCC's i386 target).
My understanding is that the way to do a compiler-level memory barrier in gcc is:
asm ("" ::: "memory");
This creates a compiler barrier (what MSVC calls _ReadWriteBarrier), but no
processor fence. To create a processor fence, you could do something like
__builtin_ia32_mfence();
This will generate an mfence instruction, but (assembly code inspection
suggests) no compiler barrier. I thought about just putting one after the
other:
asm ("" ::: "memory");
__builtin_ia32_mfence();
And this leads to my questions:
1) Am I right that __builtin_ia32_mfence() does not generate a memory
barrier?
1) Is this "two statement thing" guaranteed to be safe? Could the
optimizer re-order instructions moving code in between the two? (Yes, I
realize that the asm statement doesn't actually generate any output.
But given my understanding of how the compiler processes code, I believe
the question is still valid).
2) If it is not guaranteed to be safe, what is the use of
__builtin_ia32_mfence()? What value is there in preventing the
*processor* from executing statements out of order if the *compiler* is
just going to move them around?
I expect this would always work:
asm ("mfence" ::: "memory");
But I would rather use the builtins if possible.
dw
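The forms discussed in this message can be put side by side. This is a minimal sketch, not code from the thread: the macro names COMPILER_BARRIER and FULL_FENCE, the variables payload and ready, and the functions publish() and wait_ready() are made up for illustration, and the non-x86 fallback is an assumption added so the snippet builds on other targets.

```c
/* Compiler-only barrier: emits no instruction, but GCC may not keep
   memory values cached in registers or move memory accesses across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

#if defined(__x86_64__) || defined(__i386__)
/* mfence plus the "memory" clobber: a processor fence and a compiler
   barrier in one statement, so nothing can be scheduled in between. */
#define FULL_FENCE() __asm__ __volatile__("mfence" ::: "memory")
#else
/* Assumed fallback for non-x86 targets. */
#define FULL_FENCE() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

int payload;
int ready;

/* Hypothetical writer: the payload store is emitted before the fence,
   the flag store after it. */
void publish(int value)
{
    payload = value;
    FULL_FENCE();
    ready = 1;
}

/* Hypothetical reader: the barrier forces 'ready' to be reloaded from
   memory on every iteration instead of being hoisted out of the loop. */
int wait_ready(void)
{
    while (!ready)
        COMPILER_BARRIER();
    return payload;
}
```

Writing the fence as a single asm statement sidesteps the question of whether anything can be placed between two separate statements.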
* Re: Question about __builtin_ia32_mfence and memory barriers
From: Ian Lance Taylor @ 2013-06-04 23:52 UTC (permalink / raw)
To: dw; +Cc: gcc-help
On Tue, Jun 4, 2013 at 3:58 PM, dw <limegreensocks@yahoo.com> wrote:
>
> To create a
> processor fence, you could do something like
>
> __builtin_ia32_mfence();
A better choice these days is __atomic_thread_fence(__ATOMIC_SEQ_CST)
(or __atomic_signal_fence).
> 1) Am I right that __builtin_ia32_mfence() does not generate a memory
> barrier?
That is correct: it does not prevent the compiler from moving loads
and stores across the call to __builtin_ia32_mfence.
> 1) Is this "two statement thing" guaranteed to be safe? Could the optimizer
> re-order instructions moving code in between the two? (Yes, I realize that
> the asm statement doesn't actually generate any output. But given my
> understanding of how the compiler processes code, I believe the question is
> still valid).
It is probably safe, because why would the compiler put anything in
there, but it is not absolutely guaranteed to be safe.
> 2) If it is not guaranteed to be safe, what is the use of
> __builtin_ia32_mfence()? What value is there in preventing the *processor*
> from executing statements out of order if the *compiler* is just going to
> move them around?
__builtin_ia32_mfence exists to support the Intel-documented
_mm_mfence intrinsic. I'm not clear on whether _mm_mfence is meant to
be a compiler memory barrier or not. If it is, then I think GCC has a
bug in the way it is implemented. Please feel free to file a bug
report at http://gcc.gnu.org/bugzilla/ , especially if you can come up
with a case that fails.
> I expect this would always work:
>
> asm ("mfence" ::: "memory");
>
> But I would rather use the builtins if possible.
Yes, you should use the builtins: specifically the __atomic builtins,
which work better and are portable across processors.
Ian
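A minimal sketch of the __atomic suggestion, assuming GCC 4.7 or later. The names full_fence(), compiler_fence(), flag_a, flag_b and try_enter() are hypothetical and do not come from the thread. The example uses a store followed by a load of a different location, which is the one ordering that x86 does not preserve without a fence instruction.

```c
/* Portable stand-in for a compiler barrier followed by mfence:
   on x86, GCC emits a fence instruction for this and also treats
   it as a compiler barrier. */
static inline void full_fence(void)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

/* Compiler-only barrier: no instruction is emitted. Suitable between
   a thread and a signal handler running on that same thread. */
static inline void compiler_fence(void)
{
    __atomic_signal_fence(__ATOMIC_SEQ_CST);
}

int flag_a;
int flag_b;

/* Hypothetical Dekker-style entry check: announce intent, fence, then
   look at the other side's flag. Returns 1 if the other side has not
   announced itself yet. */
int try_enter(int *mine, int *theirs)
{
    __atomic_store_n(mine, 1, __ATOMIC_RELAXED);
    full_fence();
    return __atomic_load_n(theirs, __ATOMIC_RELAXED) == 0;
}
```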
* Re: Question about __builtin_ia32_mfence and memory barriers
From: dw @ 2013-06-05 2:45 UTC (permalink / raw)
To: gcc-help
> A better choice these days is __atomic_thread_fence(__ATOMIC_SEQ_CST)
> (or __atomic_signal_fence).
This sounded so promising. Unfortunately, it's not producing the results
I need. I can put all these statements in the code, and none of them
generate -any- fence instruction:
__atomic_thread_fence(__ATOMIC_RELAXED);
__atomic_thread_fence(__ATOMIC_CONSUME);
__atomic_thread_fence(__ATOMIC_ACQUIRE);
__atomic_thread_fence(__ATOMIC_RELEASE);
__atomic_thread_fence(__ATOMIC_ACQ_REL);
__atomic_signal_fence(__ATOMIC_RELAXED);
__atomic_signal_fence(__ATOMIC_CONSUME);
__atomic_signal_fence(__ATOMIC_ACQUIRE);
__atomic_signal_fence(__ATOMIC_RELEASE);
__atomic_signal_fence(__ATOMIC_ACQ_REL);
__atomic_signal_fence(__ATOMIC_SEQ_CST);
And while I get an mfence instruction with this:
__atomic_thread_fence(__ATOMIC_SEQ_CST);
It doesn't produce quite the same instruction ordering as:
asm volatile ("mfence" ::: "memory");
Which makes me think that whatever __ATOMIC_SEQ_CST means, it's not the
same as the "memory" clobber. Also, I'm looking to support SFENCE and
LFENCE, which these don't appear to support at all.
> I'm not clear on whether _mm_mfence is meant to be a compiler memory
> barrier or not.
Every authoritative reference I have found is maddeningly silent on this
point.
However, I have tried compiling x64 code with MSVC, and the instruction
ordering it produces for _mm_mfence is not the same as what it produces
for _mm_sfence. In fact, the asm produced when using _mm_sfence bears a
striking similarity to what you get with just _WriteBarrier (minus the
sfence instruction, of course), and _mm_mfence looks like _ReadWriteBarrier.
While I'm not prepared to call this conclusive evidence, it is becoming
suspicious.
And apparently I'm not the only person who thinks there is a problem
here
(http://doxygen.reactos.org/dd/dcb/intrin__x86_8h_a0dee6d755a43d9f9d8072d6202b487db.html#a0dee6d755a43d9f9d8072d6202b487db).
I was concerned about using 2 statements and hoping the compiler didn't
re-order any code around them. I'm not convinced that 3 statements
makes me feel any better.
dw
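The absence of fence instructions for the weaker orderings is expected on x86: the hardware already keeps loads ordered with other loads and stores ordered with other stores, so only the sequentially consistent thread fence needs an instruction, and the others act purely as compiler barriers. A minimal sketch of a release/acquire pairing, with hypothetical names (msg, flag, send_msg(), recv_msg()) that are not from the thread:

```c
int msg;
int flag;

/* Writer: the release fence keeps the msg store ahead of the flag
   store. On x86 it emits no instruction, but GCC will not move the
   msg store below it. */
void send_msg(int value)
{
    msg = value;
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&flag, 1, __ATOMIC_RELAXED);
}

/* Reader: spin until the flag is seen, then use an acquire fence so
   the msg load is not moved above the flag load. Also instruction-free
   on x86. */
int recv_msg(void)
{
    while (!__atomic_load_n(&flag, __ATOMIC_RELAXED))
        ;
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
    return msg;
}
```

On a weakly ordered target the same source would produce real barrier instructions, which is the portability argument for the __atomic builtins.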
* Re: Question about __builtin_ia32_mfence and memory barriers
From: Ian Lance Taylor @ 2013-06-05 4:30 UTC (permalink / raw)
To: dw; +Cc: gcc-help
On Tue, Jun 4, 2013 at 7:45 PM, dw <limegreensocks@yahoo.com> wrote:
> And while I get an mfence instruction with this:
>
> __atomic_thread_fence(__ATOMIC_SEQ_CST);
>
> It doesn't produce quite the same instruction ordering as:
>
> asm volatile ("mfence" ::: "memory");
>
> Which makes me think that whatever __ATOMIC_SEQ_CST means, it's not the same
> as the "memory" clobber.
It's not the same as the "memory" clobber, but it should have the
effect of providing both an mfence instruction and a compiler memory
barrier.
> Also, I'm looking to support SFENCE and LFENCE,
> which these don't appear to support at all.
That is true. I think the only supported way to get those is the
Intel intrinsics _mm_sfence and _mm_lfence.
Ian
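A minimal sketch of the Intel intrinsics route, assuming an x86 target with SSE2 enabled (the default for x86-64). The names stream_buf, fill_streaming() and load_fence() are hypothetical, and the non-SSE2 fallback is an assumption added only so the snippet builds elsewhere. Non-temporal stores are the usual reason to reach for sfence, so the example pairs the two.

```c
#if defined(__SSE2__)
#include <xmmintrin.h>   /* _mm_sfence */
#include <emmintrin.h>   /* _mm_lfence, _mm_stream_si32 */
#endif

int stream_buf[4];

/* Hypothetical helper: fill a buffer with non-temporal stores, then
   issue sfence so they are ordered before any later stores. */
void fill_streaming(int value)
{
    int i;
#if defined(__SSE2__)
    for (i = 0; i < 4; i++)
        _mm_stream_si32(&stream_buf[i], value);
    _mm_sfence();
#else
    /* Assumed fallback for targets without SSE2. */
    for (i = 0; i < 4; i++)
        stream_buf[i] = value;
    __atomic_thread_fence(__ATOMIC_RELEASE);
#endif
}

/* Hypothetical wrapper around the load fence. */
void load_fence(void)
{
#if defined(__SSE2__)
    _mm_lfence();
#else
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
}
```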
* Re: Question about __builtin_ia32_mfence and memory barriers
From: dw @ 2013-06-12 8:15 UTC (permalink / raw)
To: Ian Lance Taylor; +Cc: gcc-help
>> 1) Am I right that __builtin_ia32_mfence() does not generate a memory
>> barrier?
> That is correct: it does not prevent the compiler from moving loads
> and stores across the call to __builtin_ia32_mfence.
Are you sure? Based on your comment, I was fully expecting to be able to produce a failure case suitable for bugzilla.
In fact, I *can* generate failure cases if I comment the __builtin_ia32_mfence() call out of _mm_mfence and replace it with something else (like asm("mfence")). But as soon as I put the __builtin_ia32_mfence call back in, my "failure scenario" clears right up.
In short, it looks like __builtin_ia32_mfence *does* generate a barrier. But so do other builtins (like __builtin_ia32_pause). Does that even seem possible? It would be weird if every builtin (or even every ia32 builtin) implied a barrier.
dw
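A failure case of the kind described might look roughly like the following. This is an assumed reconstruction, not dw's actual test: the variable g, the two macros and both functions are hypothetical, and the non-x86 branch exists only so the snippet builds on other targets.

```c
int g;

#if defined(__x86_64__) || defined(__i386__)
/* No "memory" clobber: a CPU fence only, as far as the compiler knows. */
#define FENCE_NO_CLOBBER()   __asm__ __volatile__("mfence")
/* With the clobber: CPU fence and compiler barrier. */
#define FENCE_WITH_CLOBBER() __asm__ __volatile__("mfence" ::: "memory")
#else
/* Assumed stand-ins for non-x86 targets. */
#define FENCE_NO_CLOBBER()   ((void)0)
#define FENCE_WITH_CLOBBER() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* GCC is free to drop the first store here: nothing it knows about
   reads g before the second store overwrites it. */
void two_stores_no_clobber(void)
{
    g = 1;
    FENCE_NO_CLOBBER();
    g = 2;
}

/* Here both stores have to be emitted, one on each side of the fence. */
void two_stores_with_clobber(void)
{
    g = 1;
    FENCE_WITH_CLOBBER();
    g = 2;
}
```

The difference only shows up in the generated assembly (for example with gcc -O2 -S), since both functions leave g equal to 2. Swapping either macro for __builtin_ia32_mfence() and comparing the output is one way to repeat the experiment.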
* Re: Question about __builtin_ia32_mfence and memory barriers
From: Ian Lance Taylor @ 2013-06-12 19:01 UTC (permalink / raw)
To: dw; +Cc: gcc-help
On Wed, Jun 12, 2013 at 1:15 AM, dw <limegreensocks@yahoo.com> wrote:
>>> 1) Am I right that __builtin_ia32_mfence() does not generate a memory
>>> barrier?
>
>
>> That is correct: it does not prevent the compiler from moving loads
>> and stores across the call to __builtin_ia32_mfence.
>
>
> Are you sure? Based on your comment, I was fully expecting to be able to
> produce a failure case suitable for bugzilla.
No, I'm not sure.
> In fact, I *can* generate failure cases if I comment the
> __builtin_ia32_mfence() call out of _mm_mfence and replace it with something
> else (like asm("mfence")). But as soon as I put the __builtin_ia32_mfence
> call back in, my "failure scenario" clears right up.
>
> In short, it looks like __builtin_ia32_mfence *does* generate a barrier.
> But so do other builtins (like __builtin_ia32_pause). Does that even seem
> possible? It would be weird if every builtin (or even every ia32 builtin)
> implied a barrier.
As far as I know, __builtin_ia32_mfence does not generate a barrier.
However, what it does do is appear to be a function call to the main
optimization stages of the compiler. This is not true of an asm
statement, nor of an inlined function. The __builtin_ia32_mfence call
is only expanded to an instruction when GCC converts to RTL, after the
main optimizations have been run. However, it is still possible for
memory loads and stores to move after RTL, specifically when doing
register allocation and spilling register loads and stores to the
stack.
So when I say that as far as I know __builtin_ia32_mfence does not
generate a barrier, what I mean is that as far as I know after it is
expanded to RTL there is no barrier. But I could be wrong.
Ian
* Re: Question about __builtin_ia32_mfence and memory barriers
From: dw @ 2013-06-13 2:55 UTC (permalink / raw)
To: Ian Lance Taylor; +Cc: gcc-help
> As far as I know, __builtin_ia32_mfence does not generate a barrier.
> However, what it does do is appear to be a function call to the main
> optimization stages of the compiler.
Ok, now that makes sense. I kept thinking that somehow it had to be
that the compiler was seeing this as a function, but then I kept
discarding that theory because the inline function didn't change anything.
I'm going to have to ponder the performance implications of this. For
example, it seems possible that asm("pause") could end up generating
better code than _mm_pause().
> However, it is still possible for memory loads and stores to move
> after RTL
While it may be possible, I am unable to cause it to happen. Without a
solid example or authoritative docs describing _mm_mfence as performing
a ReadWriteBarrier (preferably both), I'm hard pressed to think of a
credible way to file this in bugzilla.
On this mildly unsatisfactory note, I'm going to assume that _mm_?fence
will work properly and cross my fingers. If I eventually find this not
to be true, I'll head straight to bugzilla.
Thanks for the help.
dw
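For context on the pause case, a spin-wait loop is the typical place where the choice between the builtin and a bare asm("pause") would matter. A minimal sketch, with hypothetical names (CPU_RELAX, spin_flag, spin_until_set()) and an assumed no-op fallback for non-x86 targets:

```c
#if defined(__x86_64__) || defined(__i386__)
/* The builtin behind _mm_pause(). */
#define CPU_RELAX() __builtin_ia32_pause()
#else
/* Assumed fallback: no spin hint on other targets. */
#define CPU_RELAX() ((void)0)
#endif

int spin_flag;

/* Hypothetical helper: spin until *flag becomes nonzero and return
   how many times the loop body ran. The atomic load already forces
   the flag to be re-read on each iteration, so the loop does not
   depend on the pause acting as a compiler barrier. */
unsigned spin_until_set(int *flag)
{
    unsigned spins = 0;

    while (!__atomic_load_n(flag, __ATOMIC_ACQUIRE)) {
        CPU_RELAX();
        spins++;
    }
    return spins;
}
```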
* Re: Question about __builtin_ia32_mfence and memory barriers
From: Chung-Ju Wu @ 2013-06-13 3:01 UTC (permalink / raw)
To: dw, Ian Lance Taylor; +Cc: gcc-help
2013/6/13 Ian Lance Taylor <iant@google.com>:
> On Wed, Jun 12, 2013 at 1:15 AM, dw <limegreensocks@yahoo.com> wrote:
[deleted]
>> In fact, I *can* generate failure cases if I comment the
>> __builtin_ia32_mfence() call out of _mm_mfence and replace it with something
>> else (like asm("mfence")). But as soon as I put the __builtin_ia32_mfence
>> call back in, my "failure scenario" clears right up.
>>
>> In short, it looks like __builtin_ia32_mfence *does* generate a barrier.
>> But so do other builtins (like __builtin_ia32_pause). Does that even seem
>> possible? It would be weird if every builtin (or even every ia32 builtin)
>> implied a barrier.
[deleted]
>
> So when I say that as far as I know __builtin_ia32_mfence does not
> generate a barrier, what I mean is that as far as I know after it is
> expanded to RTL there is no barrier. But I could be wrong.
>
> Ian
I just noticed there is a statement "MEM_VOLATILE_P (operands[0]) = 1"
in the mfence pattern in gcc/config/i386/sync.md:
(define_expand "sse2_mfence"
  [(set (match_dup 0)
        (unspec:BLK [(match_dup 0)] UNSPEC_MFENCE))]
  "TARGET_SSE2"
{
  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
  MEM_VOLATILE_P (operands[0]) = 1;
})
So does the "pause" pattern in gcc/config/i386/i386.md:
(define_expand "pause"
  [(set (match_dup 0)
        (unspec:BLK [(match_dup 0)] UNSPEC_PAUSE))]
  ""
{
  operands[0] = gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (Pmode));
  MEM_VOLATILE_P (operands[0]) = 1;
})
According to section 10.5 of the GCC Internals manual,
"Volatile memory references may not be deleted, reordered or combined."
I think that is why __builtin_ia32_mfence and __builtin_ia32_pause *do*
act as a barrier in dw's experiment.
Best regards,
jasonwucj
* Re: Question about __builtin_ia32_mfence and memory barriers
From: Ian Lance Taylor @ 2013-06-13 3:25 UTC (permalink / raw)
To: Chung-Ju Wu; +Cc: dw, gcc-help
On Wed, Jun 12, 2013 at 8:01 PM, Chung-Ju Wu <jasonwucj@gmail.com> wrote:
>
> According to GCC Internal 10.5, the description says that
> "Volatile memory references may not be deleted, reordered or combined."
> I think that is why __builtin_ia32_mfence and __builtin_ia32_pause *do*
> generate a barrier in dw's experiment.
Good point. Thanks, that makes sense.
Ian
* Re: Question about __builtin_ia32_mfence and memory barriers
From: dw @ 2013-06-13 3:44 UTC (permalink / raw)
To: Chung-Ju Wu; +Cc: Ian Lance Taylor, gcc-help
> I just noticed there is a statement "MEM_VOLATILE_P (operands[0]) = 1"
> for mfence pattern in gcc/config/i386/sync.md:
> I think that is why __builtin_ia32_mfence and __builtin_ia32_pause *do*
> generate a barrier in dw's experiment.
Aha!
This is not only exactly what I wanted to know, it's exactly the answer
I was hoping for.
Thank you for your help.
dw