* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
@ 2013-01-13 19:59 Uros Bizjak
2013-01-13 20:36 ` Andi Kleen
0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2013-01-13 19:59 UTC (permalink / raw)
To: gcc-patches; +Cc: Andi Kleen
[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]
Hello!
> __atomic_clear and __atomic_store_n didn't have code to generate
> the TSX HLE RELEASE prefix. Add this plus test cases.
+(define_insn "atomic_store_hle_release<mode>"
+ [(set (match_operand:ATOMIC 0 "memory_operand")
+ (unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
+ (match_operand:SI 2 "const_int_operand")]
+ UNSPEC_MOVA_RELEASE))]
+ ""
+ "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
This pattern doesn't have any constraints! Also, mov insn can store
immediates directly.
+ if (model & IX86_HLE_RELEASE)
+ {
+ emit_insn (gen_atomic_store_hle_release<mode> (operands[0],
operands[1],
+ operands[2]));
+ DONE;
+ }
+
/* For seq-cst stores, when we lack MFENCE, use XCHG. */
if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))
What about __ATOMIC_SEQ_CST; should
__atomic_clear (p, __ATOMIC_SEQ_CST | __ATOMIC_HLE_RELEASE);
emit a mfence at the end; in case of for your test:
xrelease movb $0, (%rdi)
mfence
ret
?
+
+void
+hle_clear (int *p, int v)
hle_clear (char *p)
This argument should correspond to a bool, please see documentation.
I have also attached the patch that implements sync.md fixes.
Uros.
[-- Attachment #2: r.diff.txt --]
[-- Type: text/plain, Size: 1129 bytes --]
Index: config/i386/sync.md
===================================================================
--- config/i386/sync.md (revision 195137)
+++ config/i386/sync.md (working copy)
@@ -224,8 +224,12 @@
DONE;
}
- /* Otherwise use a normal store. */
- emit_move_insn (operands[0], operands[1]);
+ /* Otherwise use a store... */
+ if (INTVAL (operands[2]) & IX86_HLE_RELEASE)
+ emit_insn (gen_atomic_store<mode>_1 (operands[0], operands[1],
+ operands[2]));
+ else
+ emit_move_insn (operands[0], operands[1]);
}
/* ... followed by an MFENCE, if required. */
if (model == MEMMODEL_SEQ_CST)
@@ -233,6 +237,14 @@
DONE;
})
+(define_insn "atomic_store<mode>_1"
+ [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
+ (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
+ (match_operand:SI 2 "const_int_operand")]
+ UNSPEC_MOVA))]
+ ""
+ "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
+
(define_insn_and_split "atomic_storedi_fpu"
[(set (match_operand:DI 0 "memory_operand" "=m,m,m")
(unspec:DI [(match_operand:DI 1 "register_operand" "x,m,?r")]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-13 19:59 [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n Uros Bizjak
@ 2013-01-13 20:36 ` Andi Kleen
2013-01-13 20:59 ` Uros Bizjak
0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2013-01-13 20:36 UTC (permalink / raw)
To: Uros Bizjak; +Cc: gcc-patches, Andi Kleen
On Sun, Jan 13, 2013 at 08:59:15PM +0100, Uros Bizjak wrote:
> Hello!
>
> > __atomic_clear and __atomic_store_n didn't have code to generate
> > the TSX HLE RELEASE prefix. Add this plus test cases.
>
> +(define_insn "atomic_store_hle_release<mode>"
> + [(set (match_operand:ATOMIC 0 "memory_operand")
> + (unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
> + (match_operand:SI 2 "const_int_operand")]
> + UNSPEC_MOVA_RELEASE))]
> + ""
> + "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
>
> This pattern doesn't have any constraints! Also, mov insn can store
> immediates directly.
Can you suggest a better pattern?
>
> + if (model & IX86_HLE_RELEASE)
> + {
> + emit_insn (gen_atomic_store_hle_release<mode> (operands[0],
> operands[1],
> + operands[2]));
> + DONE;
> + }
> +
> /* For seq-cst stores, when we lack MFENCE, use XCHG. */
> if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))
>
> What about __ATOMIC_SEQ_CST; should
>
> __atomic_clear (p, __ATOMIC_SEQ_CST | __ATOMIC_HLE_RELEASE);
>
> emit a mfence at the end; in case of for your test:
Originally I thought not, but now on reconsideration it's needed for
older CPUs that don't know about the XRELEASE. And it may be even needed
with TSX for the non transactional fallback execution. I'll fix the patch.
> +
> +void
> +hle_clear (int *p, int v)
>
> hle_clear (char *p)
>
> This argument should correspond to a bool, please see documentation.
Not sure I understand? Which documentation? This is just a random test case
> DONE;
> })
>
> +(define_insn "atomic_store<mode>_1"
> + [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
> + (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
> + (match_operand:SI 2 "const_int_operand")]
> + UNSPEC_MOVA))]
> + ""
> + "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
Is that the updated pattern you wanted? It looks similar to mine.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-13 20:36 ` Andi Kleen
@ 2013-01-13 20:59 ` Uros Bizjak
2013-01-13 22:13 ` Andi Kleen
0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2013-01-13 20:59 UTC (permalink / raw)
To: Andi Kleen; +Cc: gcc-patches
On Sun, Jan 13, 2013 at 9:36 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> > __atomic_clear and __atomic_store_n didn't have code to generate
>> > the TSX HLE RELEASE prefix. Add this plus test cases.
>>
>> +(define_insn "atomic_store_hle_release<mode>"
>> + [(set (match_operand:ATOMIC 0 "memory_operand")
>> + (unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
>> + (match_operand:SI 2 "const_int_operand")]
>> + UNSPEC_MOVA_RELEASE))]
>> + ""
>> + "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
>>
>> This pattern doesn't have any constraints! Also, mov insn can store
>> immediates directly.
>
> Can you suggest a better pattern?
It is implemented in the patch, attached to my previous message.
>>
>> + if (model & IX86_HLE_RELEASE)
>> + {
>> + emit_insn (gen_atomic_store_hle_release<mode> (operands[0],
>> operands[1],
>> + operands[2]));
>> + DONE;
>> + }
>> +
>> /* For seq-cst stores, when we lack MFENCE, use XCHG. */
>> if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))
>>
>> What about __ATOMIC_SEQ_CST; should
>>
>> __atomic_clear (p, __ATOMIC_SEQ_CST | __ATOMIC_HLE_RELEASE);
>>
>> emit a mfence at the end; in case of for your test:
>
> Originally I thought not, but now on reconsideration it's needed for
> older CPUs that don't know about the XRELEASE. And it may be even needed
> with TSX for the non transactional fallback execution. I'll fix the patch.
Also fixed in my patch. It emits mfence at the end.
>> +
>> +void
>> +hle_clear (int *p, int v)
>>
>> hle_clear (char *p)
>>
>> This argument should correspond to a bool, please see documentation.
>
>
> Not sure I understand? Which documentation? This is just a random test case
Ah, I was referring to the gcc documentation about __atomic_clear.
>> +(define_insn "atomic_store<mode>_1"
>> + [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
>> + (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
>> + (match_operand:SI 2 "const_int_operand")]
>> + UNSPEC_MOVA))]
>> + ""
>> + "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
>
> Is that the updated pattern you wanted? It looks similar to mine.
Yes the attached patch actually implements all proposed fixes.
Uros.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-13 20:59 ` Uros Bizjak
@ 2013-01-13 22:13 ` Andi Kleen
2013-01-13 22:23 ` Uros Bizjak
0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2013-01-13 22:13 UTC (permalink / raw)
To: Uros Bizjak; +Cc: Andi Kleen, gcc-patches
> >> +(define_insn "atomic_store<mode>_1"
> >> + [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
> >> + (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
> >> + (match_operand:SI 2 "const_int_operand")]
> >> + UNSPEC_MOVA))]
> >> + ""
> >> + "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
> >
> > Is that the updated pattern you wanted? It looks similar to mine.
>
> Yes the attached patch actually implements all proposed fixes.
Ok great. Can you just commit it then? It looks good to me.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-13 22:13 ` Andi Kleen
@ 2013-01-13 22:23 ` Uros Bizjak
2013-01-13 22:29 ` Andi Kleen
0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2013-01-13 22:23 UTC (permalink / raw)
To: Andi Kleen; +Cc: gcc-patches
On Sun, Jan 13, 2013 at 11:12 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> +(define_insn "atomic_store<mode>_1"
>> >> + [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
>> >> + (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
>> >> + (match_operand:SI 2 "const_int_operand")]
>> >> + UNSPEC_MOVA))]
>> >> + ""
>> >> + "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
>> >
>> > Is that the updated pattern you wanted? It looks similar to mine.
>>
>> Yes the attached patch actually implements all proposed fixes.
>
> Ok great. Can you just commit it then? It looks good to me.
No problem, but what about this part:
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2b615a1..c283869 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5556,6 +5556,8 @@ expand_builtin_atomic_clear (tree exp)
return const0_rtx;
}
+ /* need target hook there to check for not hle acquire */
+
if (HAVE_atomic_clear)
{
emit_insn (gen_atomic_clear (mem, model));
Middle-end support should be implemented before target support is
committed. So, please figure out how to emit correct error on
unsupported models and get middle-end patch reviewed first. We do get
"Error: instruction `mov' after `xacquire' not allowed" assembler
error with "xacquire movb $0,mem" asm, though.
Uros.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-13 22:23 ` Uros Bizjak
@ 2013-01-13 22:29 ` Andi Kleen
2013-01-14 16:48 ` Uros Bizjak
0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2013-01-13 22:29 UTC (permalink / raw)
To: Uros Bizjak; +Cc: Andi Kleen, gcc-patches
On Sun, Jan 13, 2013 at 11:23:24PM +0100, Uros Bizjak wrote:
> On Sun, Jan 13, 2013 at 11:12 PM, Andi Kleen <andi@firstfloor.org> wrote:
> >> >> +(define_insn "atomic_store<mode>_1"
> >> >> + [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
> >> >> + (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
> >> >> + (match_operand:SI 2 "const_int_operand")]
> >> >> + UNSPEC_MOVA))]
> >> >> + ""
> >> >> + "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
> >> >
> >> > Is that the updated pattern you wanted? It looks similar to mine.
> >>
> >> Yes the attached patch actually implements all proposed fixes.
> >
> > Ok great. Can you just commit it then? It looks good to me.
>
> No problem, but what about this part:
Right now it just means its silently ignored, no wrong code generated.
If people are ok with a new target hook I can add one.
There are some more bugs in this area, like PR55947
Giving a warning is imho less important than supporting this at all.
So I would prefer to not delay this patch.
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 2b615a1..c283869 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -5556,6 +5556,8 @@ expand_builtin_atomic_clear (tree exp)
> return const0_rtx;
> }
>
> + /* need target hook there to check for not hle acquire */
> +
> if (HAVE_atomic_clear)
> {
> emit_insn (gen_atomic_clear (mem, model));
>
> Middle-end support should be implemented before target support is
> committed. So, please figure out how to emit correct error on
> unsupported models and get middle-end patch reviewed first. We do get
> "Error: instruction `mov' after `xacquire' not allowed" assembler
> error with "xacquire movb $0,mem" asm, though.
The sync.md code is only called for the acquire bit.
The only case where it may happen I guess if someone sets both.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-13 22:29 ` Andi Kleen
@ 2013-01-14 16:48 ` Uros Bizjak
2013-01-14 18:06 ` Andi Kleen
0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2013-01-14 16:48 UTC (permalink / raw)
To: Andi Kleen; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2618 bytes --]
On Sun, Jan 13, 2013 at 11:29 PM, Andi Kleen <andi@firstfloor.org> wrote:
> On Sun, Jan 13, 2013 at 11:23:24PM +0100, Uros Bizjak wrote:
>> On Sun, Jan 13, 2013 at 11:12 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> >> +(define_insn "atomic_store<mode>_1"
>> >> >> + [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
>> >> >> + (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
>> >> >> + (match_operand:SI 2 "const_int_operand")]
>> >> >> + UNSPEC_MOVA))]
>> >> >> + ""
>> >> >> + "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
>> >> >
>> >> > Is that the updated pattern you wanted? It looks similar to mine.
>> >>
>> >> Yes the attached patch actually implements all proposed fixes.
>> >
>> > Ok great. Can you just commit it then? It looks good to me.
>>
>> No problem, but what about this part:
>
> Right now it just means its silently ignored, no wrong code generated.
> If people are ok with a new target hook I can add one.
> There are some more bugs in this area, like PR55947
>
> Giving a warning is imho less important than supporting this at all.
> So I would prefer to not delay this patch.
>
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index 2b615a1..c283869 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -5556,6 +5556,8 @@ expand_builtin_atomic_clear (tree exp)
>> return const0_rtx;
>> }
>>
>> + /* need target hook there to check for not hle acquire */
>> +
>> if (HAVE_atomic_clear)
>> {
>> emit_insn (gen_atomic_clear (mem, model));
>>
>> Middle-end support should be implemented before target support is
>> committed. So, please figure out how to emit correct error on
>> unsupported models and get middle-end patch reviewed first. We do get
>> "Error: instruction `mov' after `xacquire' not allowed" assembler
>> error with "xacquire movb $0,mem" asm, though.
>
> The sync.md code is only called for the acquire bit.
>
> The only case where it may happen I guess if someone sets both.
This cannot happen, we reject code that sets both __HLE* flags.
2012-01-14 Uros Bizjak <ubizjak@gmail.com>
Andi Kleen <ak@linux.intel.com>
PR target/55948
* config/i386/sync.md (atomic_store<mode>_1): New pattern.
(atomic_store<mode>): Call atomic_store<mode>_1 for IX86_HLE_RELEASE
memmodel flag.
testsuite/ChangeLog
2012-01-14 Andi Kleen <ak@linux.intel.com>
PR target/55948
* gcc.target/i386/hle-clear-rel.c: New file
* gcc.target/i386/hle-store-rel.c: New file.
I have committed attached patch to mainline SVN, after re-tested it on
x86_64-pc-linux-gnu.
Uros.
[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 2153 bytes --]
Index: config/i386/sync.md
===================================================================
--- config/i386/sync.md (revision 195152)
+++ config/i386/sync.md (working copy)
@@ -224,8 +224,12 @@
DONE;
}
- /* Otherwise use a normal store. */
- emit_move_insn (operands[0], operands[1]);
+ /* Otherwise use a store. */
+ if (INTVAL (operands[2]) & IX86_HLE_RELEASE)
+ emit_insn (gen_atomic_store<mode>_1 (operands[0], operands[1],
+ operands[2]));
+ else
+ emit_move_insn (operands[0], operands[1]);
}
/* ... followed by an MFENCE, if required. */
if (model == MEMMODEL_SEQ_CST)
@@ -233,6 +237,14 @@
DONE;
})
+(define_insn "atomic_store<mode>_1"
+ [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
+ (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
+ (match_operand:SI 2 "const_int_operand")]
+ UNSPEC_MOVA))]
+ ""
+ "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
+
(define_insn_and_split "atomic_storedi_fpu"
[(set (match_operand:DI 0 "memory_operand" "=m,m,m")
(unspec:DI [(match_operand:DI 1 "register_operand" "x,m,?r")]
Index: testsuite/gcc.target/i386/hle-clear-rel.c
===================================================================
--- testsuite/gcc.target/i386/hle-clear-rel.c (revision 0)
+++ testsuite/gcc.target/i386/hle-clear-rel.c (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mhle" } */
+/* { dg-final { scan-assembler "\[ \n\t\]+\(xrelease\|\.byte\[ \t\]+0xf3\)\[ \t\n\]+mov" } } */
+
+void
+hle_clear (char *p, int v)
+{
+ __atomic_clear (p, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
+}
Index: testsuite/gcc.target/i386/hle-store-rel.c
===================================================================
--- testsuite/gcc.target/i386/hle-store-rel.c (revision 0)
+++ testsuite/gcc.target/i386/hle-store-rel.c (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mhle" } */
+/* { dg-final { scan-assembler "\[ \n\t\]+\(xrelease\|\.byte\[ \t\]+0xf3\)\[ \t\n\]+mov" } } */
+
+void
+hle_store (int *p, int v)
+{
+ __atomic_store_n (p, v, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
+}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-14 16:48 ` Uros Bizjak
@ 2013-01-14 18:06 ` Andi Kleen
2013-01-14 18:41 ` Uros Bizjak
0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2013-01-14 18:06 UTC (permalink / raw)
To: Uros Bizjak; +Cc: Andi Kleen, gcc-patches
> This cannot happen, we reject code that sets both __HLE* flags.
Good thanks.
BTW I found more HLE bugs, it looks like some of the fetch_op_*
patterns do not match always and fall back to cmpxchg, which
does not generate HLE code correctly. Not fully sure what's
wrong, can you spot any obvious problems? You changed the
(define_insn "atomic_<logic><mode>"
pattern last.
The only one that should really fallback to cmpxchg is nand,
all the others can be generated directly.
This can be seen by commenting in the #if 0 case in the libstdc++
HLE patch test case I sent yesterday.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966
-Andi
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-14 18:06 ` Andi Kleen
@ 2013-01-14 18:41 ` Uros Bizjak
2013-01-14 19:02 ` Andi Kleen
0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2013-01-14 18:41 UTC (permalink / raw)
To: Andi Kleen; +Cc: gcc-patches
On Mon, Jan 14, 2013 at 7:06 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> This cannot happen, we reject code that sets both __HLE* flags.
>
> BTW I found more HLE bugs, it looks like some of the fetch_op_*
> patterns do not match always and fall back to cmpxchg, which
> does not generate HLE code correctly. Not fully sure what's
> wrong, can you spot any obvious problems? You changed the
>
> (define_insn "atomic_<logic><mode>"
>
> pattern last.
I don't think this is a target problem, these insns work as expected
and are covered by extensive testsuite in gcc.target/i386/hle-*.c.
Uros.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-14 18:41 ` Uros Bizjak
@ 2013-01-14 19:02 ` Andi Kleen
2013-01-14 19:21 ` Uros Bizjak
2013-01-14 19:25 ` Uros Bizjak
0 siblings, 2 replies; 13+ messages in thread
From: Andi Kleen @ 2013-01-14 19:02 UTC (permalink / raw)
To: Uros Bizjak; +Cc: Andi Kleen, gcc-patches
On Mon, Jan 14, 2013 at 07:40:56PM +0100, Uros Bizjak wrote:
> On Mon, Jan 14, 2013 at 7:06 PM, Andi Kleen <andi@firstfloor.org> wrote:
> >> This cannot happen, we reject code that sets both __HLE* flags.
> >
> > BTW I found more HLE bugs, it looks like some of the fetch_op_*
> > patterns do not match always and fall back to cmpxchg, which
> > does not generate HLE code correctly. Not fully sure what's
> > wrong, can you spot any obvious problems? You changed the
> >
> > (define_insn "atomic_<logic><mode>"
> >
> > pattern last.
>
> I don't think this is a target problem, these insns work as expected
> and are covered by extensive testsuite in gcc.target/i386/hle-*.c.
Well the C++ test cases I wrote didn't work. It may be related to
how complex the program is. Simple calls as in the original
test suite seem to work.
e.g. instead of xacquire lock and ... it ended up with a cmpxchg loop
(which I think is a fallback path). The cmpxchg loop didn't include
a HLE prefix (and simply adding one is not enoigh, would need more
changes for successfull elision)
Before HLE the cmpxchg code was correct, just somewhat inefficient.
Even with HLE it is technically correct, just it'll never elide.
I think I would like to fix and,or,xor and disallow HLE for nand.
Here's a test case. Needs the libstdc++ HLE patch posted.
#include <atomic>
#define ACQ memory_order_acquire | __memory_order_hle_acquire
#define REL memory_order_release | __memory_order_hle_release
int main()
{
using namespace std;
atomic_ulong au = ATOMIC_VAR_INIT(0);
if (!au.fetch_and(1, ACQ))
au.fetch_and(-1, REL);
unsigned lock = 0;
__atomic_fetch_and(&lock, 1, __ATOMIC_HLE_ACQUIRE|__ATOMIC_ACQUIRE);
return 0;
}
The first fetch_and generates: (wrong)
.L2:
movq %rax, %rcx
movq %rax, %rdx
andl $1, %ecx
lock; cmpxchgq %rcx, -24(%rsp)
jne .L2
the second __atomic_fetch_and generates (correct):
lock;
.byte 0xf2
andl $1, -28(%rsp)
.LBE14:
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-14 19:02 ` Andi Kleen
@ 2013-01-14 19:21 ` Uros Bizjak
2013-01-14 19:25 ` Uros Bizjak
1 sibling, 0 replies; 13+ messages in thread
From: Uros Bizjak @ 2013-01-14 19:21 UTC (permalink / raw)
To: Andi Kleen; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1396 bytes --]
On Mon, Jan 14, 2013 at 8:01 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> This cannot happen, we reject code that sets both __HLE* flags.
>> >
>> > BTW I found more HLE bugs, it looks like some of the fetch_op_*
>> > patterns do not match always and fall back to cmpxchg, which
>> > does not generate HLE code correctly. Not fully sure what's
>> > wrong, can you spot any obvious problems? You changed the
>> >
>> > (define_insn "atomic_<logic><mode>"
>> >
>> > pattern last.
>>
>> I don't think this is a target problem, these insns work as expected
>> and are covered by extensive testsuite in gcc.target/i386/hle-*.c.
>
> Well the C++ test cases I wrote didn't work. It may be related to
> how complex the program is. Simple calls as in the original
> test suite seem to work.
>
> e.g. instead of xacquire lock and ... it ended up with a cmpxchg loop
> (which I think is a fallback path). The cmpxchg loop didn't include
> a HLE prefix (and simply adding one is not enoigh, would need more
> changes for successfull elision)
>
> Before HLE the cmpxchg code was correct, just somewhat inefficient.
> Even with HLE it is technically correct, just it'll never elide.
I'd start with attached (mechanical) patch that just blindly adds
masks where memory model is checked. Please note that ATOMIC_HLE
modifies high bits of the model, so these checks fail in presence of
HLE modifiers.
Uros.
[-- Attachment #2: m.diff.txt --]
[-- Type: text/plain, Size: 2767 bytes --]
Index: emit-rtl.c
===================================================================
--- emit-rtl.c (revision 195152)
+++ emit-rtl.c (working copy)
@@ -6014,7 +6014,7 @@ insn_file (const_rtx insn)
bool
need_atomic_barrier_p (enum memmodel model, bool pre)
{
- switch (model)
+ switch (model & MEMMODEL_MASK)
{
case MEMMODEL_RELAXED:
case MEMMODEL_CONSUME:
Index: optabs.c
===================================================================
--- optabs.c (revision 195152)
+++ optabs.c (working copy)
@@ -7008,9 +7008,9 @@ maybe_emit_sync_lock_test_and_set (rtx target, rtx
exists, and the memory model is stronger than acquire, add a release
barrier before the instruction. */
- if (model == MEMMODEL_SEQ_CST
- || model == MEMMODEL_RELEASE
- || model == MEMMODEL_ACQ_REL)
+ if ((model & MEMMODEL_MASK) == MEMMODEL_SEQ_CST
+ || (model & MEMMODEL_MASK) == MEMMODEL_RELEASE
+ || (model & MEMMODEL_MASK) == MEMMODEL_ACQ_REL)
expand_mem_thread_fence (model);
if (icode != CODE_FOR_nothing)
@@ -7388,7 +7388,7 @@ expand_mem_thread_fence (enum memmodel model)
{
if (HAVE_mem_thread_fence)
emit_insn (gen_mem_thread_fence (GEN_INT (model)));
- else if (model != MEMMODEL_RELAXED)
+ else if ((model & MEMMODEL_MASK) != MEMMODEL_RELAXED)
{
if (HAVE_memory_barrier)
emit_insn (gen_memory_barrier ());
@@ -7412,7 +7412,7 @@ expand_mem_signal_fence (enum memmodel model)
{
if (HAVE_mem_signal_fence)
emit_insn (gen_mem_signal_fence (GEN_INT (model)));
- else if (model != MEMMODEL_RELAXED)
+ else if ((model & MEMMODEL_MASK) != MEMMODEL_RELAXED)
{
/* By default targets are coherent between a thread and the signal
handler running on the same thread. Thus this really becomes a
@@ -7467,7 +7467,7 @@ expand_atomic_load (rtx target, rtx mem, enum memm
target = gen_reg_rtx (mode);
/* For SEQ_CST, emit a barrier before the load. */
- if (model == MEMMODEL_SEQ_CST)
+ if ((model & MEMMODEL_MASK) == MEMMODEL_SEQ_CST)
expand_mem_thread_fence (model);
emit_move_insn (target, mem);
@@ -7513,7 +7513,7 @@ expand_atomic_store (rtx mem, rtx val, enum memmod
if (maybe_expand_insn (icode, 2, ops))
{
/* lock_release is only a release barrier. */
- if (model == MEMMODEL_SEQ_CST)
+ if ((model & MEMMODEL_MASK) == MEMMODEL_SEQ_CST)
expand_mem_thread_fence (model);
return const0_rtx;
}
@@ -7540,7 +7540,7 @@ expand_atomic_store (rtx mem, rtx val, enum memmod
emit_move_insn (mem, val);
/* For SEQ_CST, also emit a barrier after the store. */
- if (model == MEMMODEL_SEQ_CST)
+ if ((model & MEMMODEL_MASK) == MEMMODEL_SEQ_CST)
expand_mem_thread_fence (model);
return const0_rtx;
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-14 19:02 ` Andi Kleen
2013-01-14 19:21 ` Uros Bizjak
@ 2013-01-14 19:25 ` Uros Bizjak
1 sibling, 0 replies; 13+ messages in thread
From: Uros Bizjak @ 2013-01-14 19:25 UTC (permalink / raw)
To: Andi Kleen; +Cc: gcc-patches
On Mon, Jan 14, 2013 at 8:01 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Well the C++ test cases I wrote didn't work. It may be related to
> how complex the program is. Simple calls as in the original
> test suite seem to work.
>
> e.g. instead of xacquire lock and ... it ended up with a cmpxchg loop
> (which I think is a fallback path). The cmpxchg loop didn't include
> a HLE prefix (and simply adding one is not enoigh, would need more
> changes for successfull elision)
>
> Before HLE the cmpxchg code was correct, just somewhat inefficient.
> Even with HLE it is technically correct, just it'll never elide.
>
> I think I would like to fix and,or,xor and disallow HLE for nand.
>
> Here's a test case. Needs the libstdc++ HLE patch posted.
Can you please attach _preprocessed_ (i.e. add -save-temps to compile
flags) source to a PR?
Uros.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] Document HLE / RTM intrinsics
@ 2013-01-12 15:29 Andi Kleen
2013-01-12 15:29 ` [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n Andi Kleen
0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2013-01-12 15:29 UTC (permalink / raw)
To: gcc-patches; +Cc: Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
The TSX HLE/RTM intrinsics were missing documentation. Add this to the
manual.
Ok for release / trunk?
2013-01-11 Andi Kleen <ak@linux.intel.com>
* doc/extend.texi: Document __ATOMIC_HLE_ACQUIRE,
__ATOMIC_HLE_RELEASE. Document __builtin_ia32 TSX intrincs.
Document _x* TSX intrinsics.
---
gcc/doc/extend.texi | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 115 insertions(+)
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cc20ed2..fb0d4bc 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -81,6 +81,7 @@ extensions, accepted by GCC in C90 mode and in C++.
* Offsetof:: Special syntax for implementing @code{offsetof}.
* __sync Builtins:: Legacy built-in functions for atomic memory access.
* __atomic Builtins:: Atomic built-in functions with memory model.
+* x86 specific memory model extensions for transactional memory:: x86 memory models.
* Object Size Checking:: Built-in functions for limited buffer overflow
checking.
* Other Builtins:: Other built-in functions.
@@ -7466,6 +7467,37 @@ alignment. A value of 0 indicates typical alignment should be used. The
compiler may also ignore this parameter.
@end deftypefn
+@node x86 specific memory model extensions for transactional memory
+@section x86 specific memory model extensions for transactional memory
+
+The i386 architecture supports additional memory ordering flags
+to mark lock critical sections for hardware lock elision.
+These must be specified in addition to an existing memory model to
+atomic intrinsics.
+
+@table @code
+@item __ATOMIC_HLE_ACQUIRE
+Start lock elision on a lock variable.
+Memory model must be @code{__ATOMIC_ACQUIRE} or stronger.
+@item __ATOMIC_HLE_RELEASE
+End lock elision on a lock variable.
+Memory model must be @code{__ATOMIC_RELEASE} or stronger.
+@end table
+
+When a lock acquire fails it's required for good performance to abort
+the transaction quickly. This can be done with a @code{_mm_pause}
+
+@smallexample
+#include <immintrin.h> // For _mm_pause
+
+/* Acquire lock with lock elision */
+while (__atomic_exchange_n(&lockvar, 1, __ATOMIC_ACQUIRE|__ATOMIC_HLE_ACQUIRE))
+ _mm_pause(); /* Abort failed transaction */
+...
+/* Free lock with lock elision */
+__atomic_clear(&lockvar, __ATOMIC_RELEASE|__ATOMIC_HLE_RELEASE);
+@end smallexample
+
@node Object Size Checking
@section Object Size Checking Built-in Functions
@findex __builtin_object_size
@@ -8737,6 +8769,7 @@ instructions, but allow the compiler to schedule those calls.
* Blackfin Built-in Functions::
* FR-V Built-in Functions::
* X86 Built-in Functions::
+* X86 transactional memory intrinsics::
* MIPS DSP Built-in Functions::
* MIPS Paired-Single Support::
* MIPS Loongson Built-in Functions::
@@ -10917,6 +10950,88 @@ v2sf __builtin_ia32_pswapdsf (v2sf)
v2si __builtin_ia32_pswapdsi (v2si)
@end smallexample
+The following built-in functions are available when @option{-mrtm} is used
+They are used for restricted transactional memory. These are the internal
+low level functions. Normally the functions in
+@ref{X86 transactional memory intrinsics} should be used instead.
+
+@smallexample
+int __builtin_ia32_xbegin ()
+void __builtin_ia32_xend ()
+void __builtin_ia32_xabort (status)
+int __builtin_ia32_xtest ()
+@end smallexample
+
+@node X86 transactional memory intrinsics
+@subsection X86 transaction memory intrinsics
+
+Hardware transactional memory intrinsics for i386. These allow to use
+memory transactions with RTM (Restricted Transactional Memory).
+For using HLE (Hardware Lock Elision) see @ref{x86 specific memory model extensions for transactional memory} instead.
+This support is enabled with the @option{-mrtm} option.
+
+A memory transaction commits all changes to memory in an atomic way,
+as visible to other threads. If the transaction fails it is rolled back
+and all side effects discarded.
+
+Generally there is no guarantee that a memory transaction ever suceeds
+and suitable fallback code always needs to be supplied.
+
+@deftypefn {RTM Function} {unsigned} _xbegin ()
+Start a RTM (Restricted Transactional Memory) transaction.
+Returns _XBEGIN_STARTED when the transaction
+started successfully (not this is not 0, so the constant has to be
+explicitely tested). When the transaction aborts all side effects
+are undone and an abort code is returned. There is no guarantee
+any transaction ever succeeds, so there always needs to be a valid
+tested fallback path.
+@end deftypefn
+
+@smallexample
+#include <immintrin.h>
+
+if ((status = _xbegin ()) == _XBEGIN_STARTED) @{
+ ... transaction code...
+ _xend ();
+@} else @{
+ ... non transactional fallback path...
+@}
+@end smallexample
+
+Valid abort status bits (when the value is not @code{_XBEGIN_STARTED}) are:
+
+@table @code
+@item _XABORT_EXPLICIT
+Transaction explicitely aborted with @code{_xabort}. The parameter passed
+to @code{_xabort} is available with @code{_XABORT_CODE(status)}
+@item _XABORT_RETRY
+Transaction retry is possible.
+@item _XABORT_CONFLICT
+Transaction abort due to a memory conflict with another thread
+@item _XABORT_CAPACITY
+Transaction abort due to the transaction using too much memory
+@item _XABORT_DEBUG
+Transaction abort due to a debug trap
+@item _XABORT_NESTED
+Transaction abort in a inner nested transaction
+@end table
+
+@deftypefn {RTM Function} {void} _xend ()
+Commit the current transaction. When no transaction is active this will
+fault. All memory side effects of the transactions will become visible
+to other threads in an atomic matter.
+@end deftypefn
+
+@deftypefn {RTM Function} {int} _xtest ()
+Return a value not zero when a transaction is currently active, otherwise 0.
+@end deftypefn
+
+@deftypefn {RTM Function} {void} _xabort (status)
+Abort the current transaction. When no transaction is active this is a no-op.
+status must be a 8bit constant, that is included in the status code returned
+by @code{_xbegin}
+@end deftypefn
+
@node MIPS DSP Built-in Functions
@subsection MIPS DSP Built-in Functions
--
1.7.10.4
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
2013-01-12 15:29 [PATCH 1/2] Document HLE / RTM intrinsics Andi Kleen
@ 2013-01-12 15:29 ` Andi Kleen
0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2013-01-12 15:29 UTC (permalink / raw)
To: gcc-patches; +Cc: Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
__atomic_clear and __atomic_store_n didn't have code to generate
the TSX HLE RELEASE prefix. Add this plus test cases.
Right now it would need another target hook to check for someone
passing __ATOMIC_HLE_ACQUIRE to store/clear. I just ignore this
for now.
Passes bootstrap/test on x86_64.
Ok for release branch / trunk ?
gcc/:
2013-01-11 Andi Kleen <ak@linux.intel.com>
PR target/55948
* builtins.c (expand_builtin_atomic_clear): Add comment.
* config/i386/sync.md (UNSPEC_MOVA_RELEASE): Add.
(atomic_store_hle_release<mode>): Add
(atomic_store<mode>): Check for HLE RELEASE.
gcc/testsuite/:
2013-01-11 Andi Kleen <ak@linux.intel.com>
PR target/55948
* gcc.target/i386/hle-clear-rel.c: New file
* testsuite/gcc.target/i386/hle-store-rel.c: New file.
---
gcc/builtins.c | 2 ++
gcc/config/i386/sync.md | 16 ++++++++++++++++
gcc/testsuite/gcc.target/i386/hle-clear-rel.c | 9 +++++++++
gcc/testsuite/gcc.target/i386/hle-store-rel.c | 9 +++++++++
4 files changed, 36 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/i386/hle-clear-rel.c
create mode 100644 gcc/testsuite/gcc.target/i386/hle-store-rel.c
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2b615a1..c283869 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5556,6 +5556,8 @@ expand_builtin_atomic_clear (tree exp)
return const0_rtx;
}
+ /* need target hook there to check for not hle acquire */
+
if (HAVE_atomic_clear)
{
emit_insn (gen_atomic_clear (mem, model));
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 8d22a5e..9eae57f 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -23,6 +23,7 @@
UNSPEC_SFENCE
UNSPEC_MFENCE
UNSPEC_MOVA ; For __atomic support
+ UNSPEC_MOVA_RELEASE
UNSPEC_LDA
UNSPEC_STA
])
@@ -194,6 +195,14 @@
DONE;
})
+(define_insn "atomic_store_hle_release<mode>"
+ [(set (match_operand:ATOMIC 0 "memory_operand")
+ (unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
+ (match_operand:SI 2 "const_int_operand")]
+ UNSPEC_MOVA_RELEASE))]
+ ""
+ "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
+
(define_expand "atomic_store<mode>"
[(set (match_operand:ATOMIC 0 "memory_operand")
(unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
@@ -214,6 +223,13 @@
}
else
{
+ if (model & IX86_HLE_RELEASE)
+ {
+ emit_insn (gen_atomic_store_hle_release<mode> (operands[0], operands[1],
+ operands[2]));
+ DONE;
+ }
+
/* For seq-cst stores, when we lack MFENCE, use XCHG. */
if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))
{
diff --git a/gcc/testsuite/gcc.target/i386/hle-clear-rel.c b/gcc/testsuite/gcc.target/i386/hle-clear-rel.c
new file mode 100644
index 0000000..913f6d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/hle-clear-rel.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mhle" } */
+/* { dg-final { scan-assembler "\[ \n\t\]+\(xrelease\|\.byte\[ \t\]+0xf3\)\[ \t\n\]+mov" } } */
+
+void
+hle_clear (int *p, int v)
+{
+ __atomic_clear (p, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
+}
diff --git a/gcc/testsuite/gcc.target/i386/hle-store-rel.c b/gcc/testsuite/gcc.target/i386/hle-store-rel.c
new file mode 100644
index 0000000..7295d33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/hle-store-rel.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mhle" } */
+/* { dg-final { scan-assembler "\[ \n\t\]+\(xrelease\|\.byte\[ \t\]+0xf3\)\[ \t\n\]+mov" } } */
+
+void
+hle_store (int *p, int v)
+{
+ __atomic_store_n (p, v, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
+}
--
1.7.10.4
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2013-01-14 19:25 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-13 19:59 [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n Uros Bizjak
2013-01-13 20:36 ` Andi Kleen
2013-01-13 20:59 ` Uros Bizjak
2013-01-13 22:13 ` Andi Kleen
2013-01-13 22:23 ` Uros Bizjak
2013-01-13 22:29 ` Andi Kleen
2013-01-14 16:48 ` Uros Bizjak
2013-01-14 18:06 ` Andi Kleen
2013-01-14 18:41 ` Uros Bizjak
2013-01-14 19:02 ` Andi Kleen
2013-01-14 19:21 ` Uros Bizjak
2013-01-14 19:25 ` Uros Bizjak
-- strict thread matches above, loose matches on Subject: below --
2013-01-12 15:29 [PATCH 1/2] Document HLE / RTM intrinsics Andi Kleen
2013-01-12 15:29 ` [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n Andi Kleen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).