public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
@ 2013-01-13 19:59 Uros Bizjak
  2013-01-13 20:36 ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2013-01-13 19:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andi Kleen

[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]

Hello!

> __atomic_clear and __atomic_store_n didn't have code to generate
> the TSX HLE RELEASE prefix. Add this plus test cases.

+(define_insn "atomic_store_hle_release<mode>"
+  [(set (match_operand:ATOMIC 0 "memory_operand")
+	(unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
+			(match_operand:SI 2 "const_int_operand")]
+		       UNSPEC_MOVA_RELEASE))]
+  ""
+  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")

This pattern doesn't have any constraints! Also, mov insn can store
immediates directly.

+      if (model & IX86_HLE_RELEASE)
+        {
+      	  emit_insn (gen_atomic_store_hle_release<mode> (operands[0],
operands[1],
+	  	  				         operands[2]));
+	  DONE;
+        }					
+
       /* For seq-cst stores, when we lack MFENCE, use XCHG.  */
       if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))

What about __ATOMIC_SEQ_CST; should

  __atomic_clear (p, __ATOMIC_SEQ_CST | __ATOMIC_HLE_RELEASE);

emit a mfence at the end; in case of for your test:

        xrelease movb   $0, (%rdi)
        mfence
        ret
?

+
+void
+hle_clear (int *p, int v)

hle_clear (char *p)

This argument should correspond to a bool, please see documentation.

I have also attached the patch that implements sync.md fixes.

Uros.

[-- Attachment #2: r.diff.txt --]
[-- Type: text/plain, Size: 1129 bytes --]

Index: config/i386/sync.md
===================================================================
--- config/i386/sync.md	(revision 195137)
+++ config/i386/sync.md	(working copy)
@@ -224,8 +224,12 @@
 	  DONE;
 	}
 
-      /* Otherwise use a normal store.  */
-      emit_move_insn (operands[0], operands[1]);
+      /* Otherwise use a store...  */
+      if (INTVAL (operands[2]) & IX86_HLE_RELEASE)
+	emit_insn (gen_atomic_store<mode>_1 (operands[0], operands[1],
+					     operands[2]));
+      else
+	emit_move_insn (operands[0], operands[1]);
     }
   /* ... followed by an MFENCE, if required.  */
   if (model == MEMMODEL_SEQ_CST)
@@ -233,6 +237,14 @@
   DONE;
 })
 
+(define_insn "atomic_store<mode>_1"
+  [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
+	(unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
+			(match_operand:SI 2 "const_int_operand")]
+		       UNSPEC_MOVA))]
+  ""
+  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
+
 (define_insn_and_split "atomic_storedi_fpu"
   [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
 	(unspec:DI [(match_operand:DI 1 "register_operand" "x,m,?r")]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-13 19:59 [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n Uros Bizjak
@ 2013-01-13 20:36 ` Andi Kleen
  2013-01-13 20:59   ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2013-01-13 20:36 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Andi Kleen

On Sun, Jan 13, 2013 at 08:59:15PM +0100, Uros Bizjak wrote:
> Hello!
> 
> > __atomic_clear and __atomic_store_n didn't have code to generate
> > the TSX HLE RELEASE prefix. Add this plus test cases.
> 
> +(define_insn "atomic_store_hle_release<mode>"
> +  [(set (match_operand:ATOMIC 0 "memory_operand")
> +	(unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
> +			(match_operand:SI 2 "const_int_operand")]
> +		       UNSPEC_MOVA_RELEASE))]
> +  ""
> +  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
> 
> This pattern doesn't have any constraints! Also, mov insn can store
> immediates directly.

Can you suggest a better pattern?

> 
> +      if (model & IX86_HLE_RELEASE)
> +        {
> +      	  emit_insn (gen_atomic_store_hle_release<mode> (operands[0],
> operands[1],
> +	  	  				         operands[2]));
> +	  DONE;
> +        }					
> +
>        /* For seq-cst stores, when we lack MFENCE, use XCHG.  */
>        if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))
> 
> What about __ATOMIC_SEQ_CST; should
> 
>   __atomic_clear (p, __ATOMIC_SEQ_CST | __ATOMIC_HLE_RELEASE);
> 
> emit a mfence at the end; in case of for your test:

Originally I thought not, but now on reconsideration it's needed for
older CPUs that don't know about the XRELEASE. And it may be even needed
with TSX for the non transactional fallback execution. I'll fix the patch.

> +
> +void
> +hle_clear (int *p, int v)
> 
> hle_clear (char *p)
> 
> This argument should correspond to a bool, please see documentation.


Not sure I understand? Which documentation? This is just a random test case

>    DONE;
>  })
>  
> +(define_insn "atomic_store<mode>_1"
> +  [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
> +	(unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
> +			(match_operand:SI 2 "const_int_operand")]
> +		       UNSPEC_MOVA))]
> +  ""
> +  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")

Is that the updated pattern you wanted? It looks similar to mine.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-13 20:36 ` Andi Kleen
@ 2013-01-13 20:59   ` Uros Bizjak
  2013-01-13 22:13     ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2013-01-13 20:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches

On Sun, Jan 13, 2013 at 9:36 PM, Andi Kleen <andi@firstfloor.org> wrote:

>> > __atomic_clear and __atomic_store_n didn't have code to generate
>> > the TSX HLE RELEASE prefix. Add this plus test cases.
>>
>> +(define_insn "atomic_store_hle_release<mode>"
>> +  [(set (match_operand:ATOMIC 0 "memory_operand")
>> +     (unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
>> +                     (match_operand:SI 2 "const_int_operand")]
>> +                    UNSPEC_MOVA_RELEASE))]
>> +  ""
>> +  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
>>
>> This pattern doesn't have any constraints! Also, mov insn can store
>> immediates directly.
>
> Can you suggest a better pattern?

It is implemented in the patch, attached to my previous message.

>>
>> +      if (model & IX86_HLE_RELEASE)
>> +        {
>> +               emit_insn (gen_atomic_store_hle_release<mode> (operands[0],
>> operands[1],
>> +                                                      operands[2]));
>> +       DONE;
>> +        }
>> +
>>        /* For seq-cst stores, when we lack MFENCE, use XCHG.  */
>>        if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))
>>
>> What about __ATOMIC_SEQ_CST; should
>>
>>   __atomic_clear (p, __ATOMIC_SEQ_CST | __ATOMIC_HLE_RELEASE);
>>
>> emit a mfence at the end; in case of for your test:
>
> Originally I thought not, but now on reconsideration it's needed for
> older CPUs that don't know about the XRELEASE. And it may be even needed
> with TSX for the non transactional fallback execution. I'll fix the patch.

Also fixed in my patch. It emits mfence at the end.

>> +
>> +void
>> +hle_clear (int *p, int v)
>>
>> hle_clear (char *p)
>>
>> This argument should correspond to a bool, please see documentation.
>
>
> Not sure I understand? Which documentation? This is just a random test case

Ah, I was referring to the gcc documentation about __atomic_clear.

>> +(define_insn "atomic_store<mode>_1"
>> +  [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
>> +     (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
>> +                     (match_operand:SI 2 "const_int_operand")]
>> +                    UNSPEC_MOVA))]
>> +  ""
>> +  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
>
> Is that the updated pattern you wanted? It looks similar to mine.

Yes the attached patch actually implements all proposed fixes.

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-13 20:59   ` Uros Bizjak
@ 2013-01-13 22:13     ` Andi Kleen
  2013-01-13 22:23       ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2013-01-13 22:13 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Andi Kleen, gcc-patches

> >> +(define_insn "atomic_store<mode>_1"
> >> +  [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
> >> +     (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
> >> +                     (match_operand:SI 2 "const_int_operand")]
> >> +                    UNSPEC_MOVA))]
> >> +  ""
> >> +  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
> >
> > Is that the updated pattern you wanted? It looks similar to mine.
> 
> Yes the attached patch actually implements all proposed fixes.

Ok great. Can you just commit it then? It looks good to me.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-13 22:13     ` Andi Kleen
@ 2013-01-13 22:23       ` Uros Bizjak
  2013-01-13 22:29         ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2013-01-13 22:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches

On Sun, Jan 13, 2013 at 11:12 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> +(define_insn "atomic_store<mode>_1"
>> >> +  [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
>> >> +     (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
>> >> +                     (match_operand:SI 2 "const_int_operand")]
>> >> +                    UNSPEC_MOVA))]
>> >> +  ""
>> >> +  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
>> >
>> > Is that the updated pattern you wanted? It looks similar to mine.
>>
>> Yes the attached patch actually implements all proposed fixes.
>
> Ok great. Can you just commit it then? It looks good to me.

No problem, but what about this part:

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2b615a1..c283869 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5556,6 +5556,8 @@ expand_builtin_atomic_clear (tree exp)
       return const0_rtx;
     }

+  /* need target hook there to check for not hle acquire */
+
   if (HAVE_atomic_clear)
     {
       emit_insn (gen_atomic_clear (mem, model));

Middle-end support should be implemented before target support is
committed. So, please figure out how to emit correct error on
unsupported models and get middle-end patch reviewed first. We do get
"Error: instruction `mov' after `xacquire' not allowed" assembler
error with "xacquire movb $0,mem" asm, though.

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-13 22:23       ` Uros Bizjak
@ 2013-01-13 22:29         ` Andi Kleen
  2013-01-14 16:48           ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2013-01-13 22:29 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Andi Kleen, gcc-patches

On Sun, Jan 13, 2013 at 11:23:24PM +0100, Uros Bizjak wrote:
> On Sun, Jan 13, 2013 at 11:12 PM, Andi Kleen <andi@firstfloor.org> wrote:
> >> >> +(define_insn "atomic_store<mode>_1"
> >> >> +  [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
> >> >> +     (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
> >> >> +                     (match_operand:SI 2 "const_int_operand")]
> >> >> +                    UNSPEC_MOVA))]
> >> >> +  ""
> >> >> +  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
> >> >
> >> > Is that the updated pattern you wanted? It looks similar to mine.
> >>
> >> Yes the attached patch actually implements all proposed fixes.
> >
> > Ok great. Can you just commit it then? It looks good to me.
> 
> No problem, but what about this part:

Right now it just means its silently ignored, no wrong code generated.
If people are ok with a new target hook I can add one.
There are some more bugs in this area, like PR55947

Giving a warning is imho less important than supporting this at all.
So I would prefer to not delay this patch.

> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 2b615a1..c283869 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -5556,6 +5556,8 @@ expand_builtin_atomic_clear (tree exp)
>        return const0_rtx;
>      }
> 
> +  /* need target hook there to check for not hle acquire */
> +
>    if (HAVE_atomic_clear)
>      {
>        emit_insn (gen_atomic_clear (mem, model));
> 
> Middle-end support should be implemented before target support is
> committed. So, please figure out how to emit correct error on
> unsupported models and get middle-end patch reviewed first. We do get
> "Error: instruction `mov' after `xacquire' not allowed" assembler
> error with "xacquire movb $0,mem" asm, though.

The sync.md code is only called for the acquire bit.

The only case where it may happen I guess if someone sets both.


-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-13 22:29         ` Andi Kleen
@ 2013-01-14 16:48           ` Uros Bizjak
  2013-01-14 18:06             ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2013-01-14 16:48 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2618 bytes --]

On Sun, Jan 13, 2013 at 11:29 PM, Andi Kleen <andi@firstfloor.org> wrote:
> On Sun, Jan 13, 2013 at 11:23:24PM +0100, Uros Bizjak wrote:
>> On Sun, Jan 13, 2013 at 11:12 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> >> +(define_insn "atomic_store<mode>_1"
>> >> >> +  [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
>> >> >> +     (unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
>> >> >> +                     (match_operand:SI 2 "const_int_operand")]
>> >> >> +                    UNSPEC_MOVA))]
>> >> >> +  ""
>> >> >> +  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
>> >> >
>> >> > Is that the updated pattern you wanted? It looks similar to mine.
>> >>
>> >> Yes the attached patch actually implements all proposed fixes.
>> >
>> > Ok great. Can you just commit it then? It looks good to me.
>>
>> No problem, but what about this part:
>
> Right now it just means its silently ignored, no wrong code generated.
> If people are ok with a new target hook I can add one.
> There are some more bugs in this area, like PR55947
>
> Giving a warning is imho less important than supporting this at all.
> So I would prefer to not delay this patch.
>
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index 2b615a1..c283869 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -5556,6 +5556,8 @@ expand_builtin_atomic_clear (tree exp)
>>        return const0_rtx;
>>      }
>>
>> +  /* need target hook there to check for not hle acquire */
>> +
>>    if (HAVE_atomic_clear)
>>      {
>>        emit_insn (gen_atomic_clear (mem, model));
>>
>> Middle-end support should be implemented before target support is
>> committed. So, please figure out how to emit correct error on
>> unsupported models and get middle-end patch reviewed first. We do get
>> "Error: instruction `mov' after `xacquire' not allowed" assembler
>> error with "xacquire movb $0,mem" asm, though.
>
> The sync.md code is only called for the acquire bit.
>
> The only case where it may happen I guess if someone sets both.

This cannot happen, we reject code that sets both __HLE* flags.

2012-01-14  Uros Bizjak  <ubizjak@gmail.com>
	    Andi Kleen  <ak@linux.intel.com>

	PR target/55948
	* config/i386/sync.md (atomic_store<mode>_1): New pattern.
	(atomic_store<mode>): Call atomic_store<mode>_1 for IX86_HLE_RELEASE
	memmodel flag.

testsuite/ChangeLog

2012-01-14  Andi Kleen  <ak@linux.intel.com>

	PR target/55948
	* gcc.target/i386/hle-clear-rel.c: New file
	* gcc.target/i386/hle-store-rel.c: New file.

I have committed attached patch to mainline SVN, after re-tested it on
x86_64-pc-linux-gnu.

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 2153 bytes --]

Index: config/i386/sync.md
===================================================================
--- config/i386/sync.md	(revision 195152)
+++ config/i386/sync.md	(working copy)
@@ -224,8 +224,12 @@
 	  DONE;
 	}
 
-      /* Otherwise use a normal store.  */
-      emit_move_insn (operands[0], operands[1]);
+      /* Otherwise use a store.  */
+      if (INTVAL (operands[2]) & IX86_HLE_RELEASE)
+	emit_insn (gen_atomic_store<mode>_1 (operands[0], operands[1],
+					     operands[2]));
+      else
+	emit_move_insn (operands[0], operands[1]);
     }
   /* ... followed by an MFENCE, if required.  */
   if (model == MEMMODEL_SEQ_CST)
@@ -233,6 +237,14 @@
   DONE;
 })
 
+(define_insn "atomic_store<mode>_1"
+  [(set (match_operand:ATOMIC 0 "memory_operand" "=m")
+	(unspec:ATOMIC [(match_operand:ATOMIC 1 "<nonmemory_operand>" "<r><i>")
+			(match_operand:SI 2 "const_int_operand")]
+		       UNSPEC_MOVA))]
+  ""
+  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
+
 (define_insn_and_split "atomic_storedi_fpu"
   [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
 	(unspec:DI [(match_operand:DI 1 "register_operand" "x,m,?r")]
Index: testsuite/gcc.target/i386/hle-clear-rel.c
===================================================================
--- testsuite/gcc.target/i386/hle-clear-rel.c	(revision 0)
+++ testsuite/gcc.target/i386/hle-clear-rel.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mhle" } */
+/* { dg-final { scan-assembler "\[ \n\t\]+\(xrelease\|\.byte\[ \t\]+0xf3\)\[ \t\n\]+mov" } } */
+
+void
+hle_clear (char *p, int v)
+{
+  __atomic_clear (p, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
+}
Index: testsuite/gcc.target/i386/hle-store-rel.c
===================================================================
--- testsuite/gcc.target/i386/hle-store-rel.c	(revision 0)
+++ testsuite/gcc.target/i386/hle-store-rel.c	(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mhle" } */
+/* { dg-final { scan-assembler "\[ \n\t\]+\(xrelease\|\.byte\[ \t\]+0xf3\)\[ \t\n\]+mov" } } */
+
+void
+hle_store (int *p, int v)
+{
+  __atomic_store_n (p, v, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-14 16:48           ` Uros Bizjak
@ 2013-01-14 18:06             ` Andi Kleen
  2013-01-14 18:41               ` Uros Bizjak
  0 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2013-01-14 18:06 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Andi Kleen, gcc-patches

> This cannot happen, we reject code that sets both __HLE* flags.

Good thanks.

BTW I found more HLE bugs, it looks like some of the fetch_op_*
patterns do not match always and fall back to cmpxchg, which
does not generate HLE code correctly. Not fully sure what's 
wrong, can you spot any obvious problems? You changed the

(define_insn "atomic_<logic><mode>"

pattern last.

The only one that should really fallback to cmpxchg is nand,
all the others can be generated directly.

This can be seen by commenting in the #if 0 case in the libstdc++
HLE patch test case I sent yesterday. 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55966

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-14 18:06             ` Andi Kleen
@ 2013-01-14 18:41               ` Uros Bizjak
  2013-01-14 19:02                 ` Andi Kleen
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2013-01-14 18:41 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches

On Mon, Jan 14, 2013 at 7:06 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> This cannot happen, we reject code that sets both __HLE* flags.
>
> BTW I found more HLE bugs, it looks like some of the fetch_op_*
> patterns do not match always and fall back to cmpxchg, which
> does not generate HLE code correctly. Not fully sure what's
> wrong, can you spot any obvious problems? You changed the
>
> (define_insn "atomic_<logic><mode>"
>
> pattern last.

I don't think this is a target problem, these insns work as expected
and are covered by extensive testsuite in gcc.target/i386/hle-*.c.

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-14 18:41               ` Uros Bizjak
@ 2013-01-14 19:02                 ` Andi Kleen
  2013-01-14 19:21                   ` Uros Bizjak
  2013-01-14 19:25                   ` Uros Bizjak
  0 siblings, 2 replies; 13+ messages in thread
From: Andi Kleen @ 2013-01-14 19:02 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Andi Kleen, gcc-patches

On Mon, Jan 14, 2013 at 07:40:56PM +0100, Uros Bizjak wrote:
> On Mon, Jan 14, 2013 at 7:06 PM, Andi Kleen <andi@firstfloor.org> wrote:
> >> This cannot happen, we reject code that sets both __HLE* flags.
> >
> > BTW I found more HLE bugs, it looks like some of the fetch_op_*
> > patterns do not match always and fall back to cmpxchg, which
> > does not generate HLE code correctly. Not fully sure what's
> > wrong, can you spot any obvious problems? You changed the
> >
> > (define_insn "atomic_<logic><mode>"
> >
> > pattern last.
> 
> I don't think this is a target problem, these insns work as expected
> and are covered by extensive testsuite in gcc.target/i386/hle-*.c.

Well the C++ test cases I wrote didn't work. It may be related to 
how complex the program is. Simple calls as in the original
test suite seem to work.

e.g.  instead of xacquire lock and ... it ended up with a cmpxchg loop
(which I think is a fallback path). The cmpxchg loop didn't include
a HLE prefix (and simply adding one is not enoigh, would need more
changes for successfull elision)

Before HLE the cmpxchg code was correct, just somewhat inefficient.
Even with HLE it is technically correct, just it'll never elide.

I think I would like to fix and,or,xor and disallow HLE for nand.

Here's a test case. Needs the libstdc++ HLE patch posted.

#include <atomic>

#define ACQ memory_order_acquire | __memory_order_hle_acquire
#define REL memory_order_release | __memory_order_hle_release

int main()
{
  using namespace std;
  atomic_ulong au = ATOMIC_VAR_INIT(0);

  if (!au.fetch_and(1, ACQ))
    au.fetch_and(-1, REL);

  unsigned lock = 0;
  __atomic_fetch_and(&lock, 1, __ATOMIC_HLE_ACQUIRE|__ATOMIC_ACQUIRE);

  return 0;
}

The first fetch_and generates: (wrong)


.L2:
        movq    %rax, %rcx
        movq    %rax, %rdx
        andl    $1, %ecx
        lock; cmpxchgq  %rcx, -24(%rsp)
        jne     .L2


the second __atomic_fetch_and generates (correct):


        lock; 
        .byte   0xf2
        andl    $1, -28(%rsp)
.LBE14:


-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-14 19:02                 ` Andi Kleen
@ 2013-01-14 19:21                   ` Uros Bizjak
  2013-01-14 19:25                   ` Uros Bizjak
  1 sibling, 0 replies; 13+ messages in thread
From: Uros Bizjak @ 2013-01-14 19:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1396 bytes --]

On Mon, Jan 14, 2013 at 8:01 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> This cannot happen, we reject code that sets both __HLE* flags.
>> >
>> > BTW I found more HLE bugs, it looks like some of the fetch_op_*
>> > patterns do not match always and fall back to cmpxchg, which
>> > does not generate HLE code correctly. Not fully sure what's
>> > wrong, can you spot any obvious problems? You changed the
>> >
>> > (define_insn "atomic_<logic><mode>"
>> >
>> > pattern last.
>>
>> I don't think this is a target problem, these insns work as expected
>> and are covered by extensive testsuite in gcc.target/i386/hle-*.c.
>
> Well the C++ test cases I wrote didn't work. It may be related to
> how complex the program is. Simple calls as in the original
> test suite seem to work.
>
> e.g.  instead of xacquire lock and ... it ended up with a cmpxchg loop
> (which I think is a fallback path). The cmpxchg loop didn't include
> a HLE prefix (and simply adding one is not enoigh, would need more
> changes for successfull elision)
>
> Before HLE the cmpxchg code was correct, just somewhat inefficient.
> Even with HLE it is technically correct, just it'll never elide.

I'd start with attached (mechanical) patch that just blindly adds
masks where memory model is checked. Please note that ATOMIC_HLE
modifies high bits of the model, so these checks fail in presence of
HLE modifiers.

Uros.

[-- Attachment #2: m.diff.txt --]
[-- Type: text/plain, Size: 2767 bytes --]

Index: emit-rtl.c
===================================================================
--- emit-rtl.c	(revision 195152)
+++ emit-rtl.c	(working copy)
@@ -6014,7 +6014,7 @@ insn_file (const_rtx insn)
 bool
 need_atomic_barrier_p (enum memmodel model, bool pre)
 {
-  switch (model)
+  switch (model & MEMMODEL_MASK)
     {
     case MEMMODEL_RELAXED:
     case MEMMODEL_CONSUME:
Index: optabs.c
===================================================================
--- optabs.c	(revision 195152)
+++ optabs.c	(working copy)
@@ -7008,9 +7008,9 @@ maybe_emit_sync_lock_test_and_set (rtx target, rtx
      exists, and the memory model is stronger than acquire, add a release 
      barrier before the instruction.  */
 
-  if (model == MEMMODEL_SEQ_CST
-      || model == MEMMODEL_RELEASE
-      || model == MEMMODEL_ACQ_REL)
+  if ((model & MEMMODEL_MASK) == MEMMODEL_SEQ_CST
+      || (model & MEMMODEL_MASK) == MEMMODEL_RELEASE
+      || (model & MEMMODEL_MASK) == MEMMODEL_ACQ_REL)
     expand_mem_thread_fence (model);
 
   if (icode != CODE_FOR_nothing)
@@ -7388,7 +7388,7 @@ expand_mem_thread_fence (enum memmodel model)
 {
   if (HAVE_mem_thread_fence)
     emit_insn (gen_mem_thread_fence (GEN_INT (model)));
-  else if (model != MEMMODEL_RELAXED)
+  else if ((model & MEMMODEL_MASK) != MEMMODEL_RELAXED)
     {
       if (HAVE_memory_barrier)
 	emit_insn (gen_memory_barrier ());
@@ -7412,7 +7412,7 @@ expand_mem_signal_fence (enum memmodel model)
 {
   if (HAVE_mem_signal_fence)
     emit_insn (gen_mem_signal_fence (GEN_INT (model)));
-  else if (model != MEMMODEL_RELAXED)
+  else if ((model & MEMMODEL_MASK) != MEMMODEL_RELAXED)
     {
       /* By default targets are coherent between a thread and the signal
 	 handler running on the same thread.  Thus this really becomes a
@@ -7467,7 +7467,7 @@ expand_atomic_load (rtx target, rtx mem, enum memm
     target = gen_reg_rtx (mode);
 
   /* For SEQ_CST, emit a barrier before the load.  */
-  if (model == MEMMODEL_SEQ_CST)
+  if ((model & MEMMODEL_MASK) == MEMMODEL_SEQ_CST)
     expand_mem_thread_fence (model);
 
   emit_move_insn (target, mem);
@@ -7513,7 +7513,7 @@ expand_atomic_store (rtx mem, rtx val, enum memmod
 	  if (maybe_expand_insn (icode, 2, ops))
 	    {
 	      /* lock_release is only a release barrier.  */
-	      if (model == MEMMODEL_SEQ_CST)
+	      if ((model & MEMMODEL_MASK) == MEMMODEL_SEQ_CST)
 		expand_mem_thread_fence (model);
 	      return const0_rtx;
 	    }
@@ -7540,7 +7540,7 @@ expand_atomic_store (rtx mem, rtx val, enum memmod
   emit_move_insn (mem, val);
 
   /* For SEQ_CST, also emit a barrier after the store.  */
-  if (model == MEMMODEL_SEQ_CST)
+  if ((model & MEMMODEL_MASK) == MEMMODEL_SEQ_CST)
     expand_mem_thread_fence (model);
 
   return const0_rtx;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-14 19:02                 ` Andi Kleen
  2013-01-14 19:21                   ` Uros Bizjak
@ 2013-01-14 19:25                   ` Uros Bizjak
  1 sibling, 0 replies; 13+ messages in thread
From: Uros Bizjak @ 2013-01-14 19:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gcc-patches

On Mon, Jan 14, 2013 at 8:01 PM, Andi Kleen <andi@firstfloor.org> wrote:

> Well the C++ test cases I wrote didn't work. It may be related to
> how complex the program is. Simple calls as in the original
> test suite seem to work.
>
> e.g.  instead of xacquire lock and ... it ended up with a cmpxchg loop
> (which I think is a fallback path). The cmpxchg loop didn't include
> a HLE prefix (and simply adding one is not enoigh, would need more
> changes for successfull elision)
>
> Before HLE the cmpxchg code was correct, just somewhat inefficient.
> Even with HLE it is technically correct, just it'll never elide.
>
> I think I would like to fix and,or,xor and disallow HLE for nand.
>
> Here's a test case. Needs the libstdc++ HLE patch posted.

Can you please attach _preprocessed_ (i.e. add -save-temps to compile
flags) source to a PR?

Uros.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n
  2013-01-12 15:29 [PATCH 1/2] Document HLE / RTM intrinsics Andi Kleen
@ 2013-01-12 15:29 ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2013-01-12 15:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

__atomic_clear and __atomic_store_n didn't have code to generate
the TSX HLE RELEASE prefix. Add this plus test cases.

Right now it would need another target hook to check for someone
passing __ATOMIC_HLE_ACQUIRE to store/clear. I just ignore this
for now.

Passes bootstrap/test on x86_64.

Ok for release branch / trunk ?

gcc/:
2013-01-11  Andi Kleen  <ak@linux.intel.com>

	PR target/55948
	* builtins.c (expand_builtin_atomic_clear): Add comment.
	* config/i386/sync.md (UNSPEC_MOVA_RELEASE): Add.
	(atomic_store_hle_release<mode>): Add
        (atomic_store<mode>): Check for HLE RELEASE.

gcc/testsuite/:
2013-01-11  Andi Kleen  <ak@linux.intel.com>

	PR target/55948
	* gcc.target/i386/hle-clear-rel.c: New file
	* testsuite/gcc.target/i386/hle-store-rel.c: New file.
---
 gcc/builtins.c                                |    2 ++
 gcc/config/i386/sync.md                       |   16 ++++++++++++++++
 gcc/testsuite/gcc.target/i386/hle-clear-rel.c |    9 +++++++++
 gcc/testsuite/gcc.target/i386/hle-store-rel.c |    9 +++++++++
 4 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/hle-clear-rel.c
 create mode 100644 gcc/testsuite/gcc.target/i386/hle-store-rel.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2b615a1..c283869 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5556,6 +5556,8 @@ expand_builtin_atomic_clear (tree exp)
       return const0_rtx;
     }
 
+  /* need target hook there to check for not hle acquire */
+
   if (HAVE_atomic_clear)
     {
       emit_insn (gen_atomic_clear (mem, model));
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 8d22a5e..9eae57f 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -23,6 +23,7 @@
   UNSPEC_SFENCE
   UNSPEC_MFENCE
   UNSPEC_MOVA	; For __atomic support
+  UNSPEC_MOVA_RELEASE
   UNSPEC_LDA
   UNSPEC_STA
 ])
@@ -194,6 +195,14 @@
   DONE;
 })
 
+(define_insn "atomic_store_hle_release<mode>"
+  [(set (match_operand:ATOMIC 0 "memory_operand")
+	(unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
+			(match_operand:SI 2 "const_int_operand")]
+		       UNSPEC_MOVA_RELEASE))]
+  ""
+  "%K2mov{<imodesuffix>}\t{%1, %0|%0, %1}")
+
 (define_expand "atomic_store<mode>"
   [(set (match_operand:ATOMIC 0 "memory_operand")
 	(unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand")
@@ -214,6 +223,13 @@
     }
   else
     {
+      if (model & IX86_HLE_RELEASE)
+        {
+      	  emit_insn (gen_atomic_store_hle_release<mode> (operands[0], operands[1],
+	  	  				         operands[2]));
+	  DONE;
+        }					       
+
       /* For seq-cst stores, when we lack MFENCE, use XCHG.  */
       if (model == MEMMODEL_SEQ_CST && !(TARGET_64BIT || TARGET_SSE2))
 	{
diff --git a/gcc/testsuite/gcc.target/i386/hle-clear-rel.c b/gcc/testsuite/gcc.target/i386/hle-clear-rel.c
new file mode 100644
index 0000000..913f6d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/hle-clear-rel.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mhle" } */
+/* { dg-final { scan-assembler "\[ \n\t\]+\(xrelease\|\.byte\[ \t\]+0xf3\)\[ \t\n\]+mov" } } */
+
+void
+hle_clear (int *p, int v)
+{
+  __atomic_clear (p, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
+}
diff --git a/gcc/testsuite/gcc.target/i386/hle-store-rel.c b/gcc/testsuite/gcc.target/i386/hle-store-rel.c
new file mode 100644
index 0000000..7295d33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/hle-store-rel.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mhle" } */
+/* { dg-final { scan-assembler "\[ \n\t\]+\(xrelease\|\.byte\[ \t\]+0xf3\)\[ \t\n\]+mov" } } */
+
+void
+hle_store (int *p, int v)
+{
+  __atomic_store_n (p, v, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
+}
-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-01-14 19:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-13 19:59 [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n Uros Bizjak
2013-01-13 20:36 ` Andi Kleen
2013-01-13 20:59   ` Uros Bizjak
2013-01-13 22:13     ` Andi Kleen
2013-01-13 22:23       ` Uros Bizjak
2013-01-13 22:29         ` Andi Kleen
2013-01-14 16:48           ` Uros Bizjak
2013-01-14 18:06             ` Andi Kleen
2013-01-14 18:41               ` Uros Bizjak
2013-01-14 19:02                 ` Andi Kleen
2013-01-14 19:21                   ` Uros Bizjak
2013-01-14 19:25                   ` Uros Bizjak
  -- strict thread matches above, loose matches on Subject: below --
2013-01-12 15:29 [PATCH 1/2] Document HLE / RTM intrinsics Andi Kleen
2013-01-12 15:29 ` [PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).