public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier
@ 2017-08-02 17:45 Alexander Monakov
  2017-08-02 17:45 ` [PATCH 3/3] optabs: ensure atomic_load/stores have compiler barriers Alexander Monakov
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Alexander Monakov @ 2017-08-02 17:45 UTC (permalink / raw)
  To: gcc-patches

As recently discussed in the previous thread for PR 80640, some targets have
sufficiently strong hardware memory ordering that implementation of C11 atomic
fences might not need machine barriers.  However, a compiler memory barrier
nevertheless must be present, and at least two targets (x86, s390) got this
wrong in their .md expanders.

Handle it in the middle-end by detecting empty expansion and emitting a
compiler memory barrier.  If the backend produced non-empty RTL, expect that
it's using the same RTX structure as in "memory_barrier" (__sync_synchronize)
expansion, which must be a compiler memory on its own.

A followup refactor is possible in expand_mem_thread_fence to make it start with

  if (is_mm_relaxed (model))
    return;

as it's not useful to pass __ATOMIC_RELAXED model to gen_mem_thread_fence.

	PR target/80640
	* doc/md.texi (mem_thread_fence): Remove mention of mode.  Clarify that
        empty expansion is handled by placing a compiler barrier.
        * optabs.c (expand_mem_thread_fence): Place a compiler barrier if
        targetm.gen_mem_thread_fence produced no instructions.
testsuite/
	* gcc.dg/atomic/pr80640.c: New testcase.

---
 gcc/doc/md.texi                       |  9 +++++++--
 gcc/optabs.c                          |  7 ++++++-
 gcc/testsuite/gcc.dg/atomic/pr80640.c | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 47 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/atomic/pr80640.c

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index ea959208c98..0be161a08dc 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7044,11 +7044,16 @@ If these patterns are not defined, attempts will be made to use
 counterparts.  If none of these are available a compare-and-swap
 loop will be used.
 
-@cindex @code{mem_thread_fence@var{mode}} instruction pattern
-@item @samp{mem_thread_fence@var{mode}}
+@cindex @code{mem_thread_fence} instruction pattern
+@item @samp{mem_thread_fence}
 This pattern emits code required to implement a thread fence with
 memory model semantics.  Operand 0 is the memory model to be used.
 
+Expanding this pattern may produce no instructions if no machine barrier
+is required on the target for given memory model.  In that case, unless
+memory model is @code{__ATOMIC_RELAXED}, a compiler memory barrier is
+emitted automatically.
+
 If this pattern is not specified, all memory models except
 @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
 barrier pattern.
diff --git a/gcc/optabs.c b/gcc/optabs.c
index a9900657a58..d568ca376fe 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6298,7 +6298,12 @@ void
 expand_mem_thread_fence (enum memmodel model)
 {
   if (targetm.have_mem_thread_fence ())
-    emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
+    {
+      rtx_insn *last = get_last_insn ();
+      emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
+      if (last == get_last_insn () && !is_mm_relaxed (model))
+	expand_asm_memory_barrier ();
+    }
   else if (!is_mm_relaxed (model))
     {
       if (targetm.have_memory_barrier ())
diff --git a/gcc/testsuite/gcc.dg/atomic/pr80640.c b/gcc/testsuite/gcc.dg/atomic/pr80640.c
new file mode 100644
index 00000000000..fd17978a482
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/atomic/pr80640.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-pthread" } */
+/* { dg-require-effective-target pthread } */
+
+#include <pthread.h>
+
+static volatile int sem1;
+static volatile int sem2;
+
+static void *f(void *va)
+{
+  void **p = va;
+  if (*p) return *p;
+  sem1 = 1;
+  while (!sem2);
+  __atomic_thread_fence(__ATOMIC_ACQUIRE);
+  // GCC used to RTL-CSE this and the first load, causing 0 to be returned
+  return *p;
+}
+
+int main()
+{
+  void *p = 0;
+  pthread_t thr;
+  if (pthread_create(&thr, 0, f, &p))
+    return 2;
+  while (!sem1);
+  __atomic_thread_fence(__ATOMIC_ACQUIRE);
+  p = &p;
+  __atomic_thread_fence(__ATOMIC_RELEASE);
+  sem2 = 1;
+  pthread_join(thr, &p);
+  return !p;
+}
-- 
2.13.3

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/3] optabs: ensure atomic_load/stores have compiler barriers
  2017-08-02 17:45 [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier Alexander Monakov
@ 2017-08-02 17:45 ` Alexander Monakov
  2017-08-02 17:45 ` [PATCH 2/3] retire mem_signal_fence pattern Alexander Monakov
  2017-08-05  9:03 ` [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier Richard Sandiford
  2 siblings, 0 replies; 14+ messages in thread
From: Alexander Monakov @ 2017-08-02 17:45 UTC (permalink / raw)
  To: gcc-patches

Again, like in patch 1/3, a backend might expand C11 atomic load/store to a
volatile memory access if hardware memory ordering is sufficiently strong that
no machine barrier is required.  Nevertheless, we must ensure that compiler
memory barrier(s) are present before/after the access to prevent wrong code
transformations.  Simply place them as appropriate in optabs.c expansion code.

There's ALIAS_SET_MEMORY_BARRIER placed on MEMs accessed via atomics, but it's
probably not useful, it's very hard to audit all RTL passes and ensure they
respect it.  On the other hand, asm barriers must work or else much of
real-world code will break.

        PR rtl-optimization/57448
        PR target/67458
        PR target/81316
	* optabs.c (expand_atomic_load): Place compiler memory barriers if using
        atomic_load pattern.
        (expand_atomic_store): Likewise.
testsuite/
	* gcc.dg/atomic/pr80640-2.c: New testcase.
	* gcc.dg/atomic/pr81316.c: New testcase.
---
 gcc/optabs.c                            | 20 ++++++++++++++++++--
 gcc/testsuite/gcc.dg/atomic/pr80640-2.c | 32 ++++++++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/atomic/pr81316.c   | 29 +++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/atomic/pr80640-2.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic/pr81316.c

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 6884fd4b8aa..11977d6d53f 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6343,12 +6343,20 @@ expand_atomic_load (rtx target, rtx mem, enum memmodel model)
   if (icode != CODE_FOR_nothing)
     {
       struct expand_operand ops[3];
+      rtx_insn *last = get_last_insn ();
+      if (is_mm_seq_cst (model))
+	expand_asm_memory_barrier ();
 
       create_output_operand (&ops[0], target, mode);
       create_fixed_operand (&ops[1], mem);
       create_integer_operand (&ops[2], model);
       if (maybe_expand_insn (icode, 3, ops))
-	return ops[0].value;
+	{
+	  if (!is_mm_relaxed (model))
+	    expand_asm_memory_barrier ();
+	  return ops[0].value;
+	}
+      delete_insns_since (last);
     }
 
   /* If the size of the object is greater than word size on this target,
@@ -6393,11 +6401,19 @@ expand_atomic_store (rtx mem, rtx val, enum memmodel model, bool use_release)
   icode = direct_optab_handler (atomic_store_optab, mode);
   if (icode != CODE_FOR_nothing)
     {
+      rtx_insn *last = get_last_insn ();
+      if (!is_mm_relaxed (model))
+	expand_asm_memory_barrier ();
       create_fixed_operand (&ops[0], mem);
       create_input_operand (&ops[1], val, mode);
       create_integer_operand (&ops[2], model);
       if (maybe_expand_insn (icode, 3, ops))
-	return const0_rtx;
+	{
+	  if (is_mm_seq_cst (model))
+	    expand_asm_memory_barrier ();
+	  return const0_rtx;
+	}
+      delete_insns_since (last);
     }
 
   /* If using __sync_lock_release is a viable alternative, try it.
diff --git a/gcc/testsuite/gcc.dg/atomic/pr80640-2.c b/gcc/testsuite/gcc.dg/atomic/pr80640-2.c
new file mode 100644
index 00000000000..a735054090d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/atomic/pr80640-2.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-pthread" } */
+/* { dg-require-effective-target pthread } */
+
+#include <pthread.h>
+
+static volatile int sem1;
+static _Atomic  int sem2;
+
+static void *f(void *va)
+{
+  void **p = va;
+  if (*p) return *p;
+  sem1 = 1;
+  while (!__atomic_load_n(&sem2, __ATOMIC_ACQUIRE));
+  // GCC used to RTL-CSE this and the first load, causing 0 to be returned
+  return *p;
+}
+
+int main()
+{
+  void *p = 0;
+  pthread_t thr;
+  if (pthread_create(&thr, 0, f, &p))
+    return 2;
+  while (!sem1);
+  __atomic_thread_fence(__ATOMIC_ACQUIRE);
+  p = &p;
+  __atomic_store_n(&sem2, 1, __ATOMIC_RELEASE);
+  pthread_join(thr, &p);
+  return !p;
+}
diff --git a/gcc/testsuite/gcc.dg/atomic/pr81316.c b/gcc/testsuite/gcc.dg/atomic/pr81316.c
new file mode 100644
index 00000000000..ef10095718e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/atomic/pr81316.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-pthread" } */
+/* { dg-require-effective-target pthread } */
+
+#include <pthread.h>
+#include <stdlib.h>
+
+static _Atomic int sem1;
+
+static void *f(void *va)
+{
+  void **p = va;
+  while (!__atomic_load_n(&sem1, __ATOMIC_ACQUIRE));
+  exit(!*p);
+}
+
+int main(int argc)
+{
+  void *p = 0;
+  pthread_t thr;
+  if (pthread_create(&thr, 0, f, &p))
+    return 2;
+  // GCC used to RTL-DSE this store
+  p = &p;
+  __atomic_store_n(&sem1, 1, __ATOMIC_RELEASE);
+  int r = -1;
+  while (r < 0) asm("":"+r"(r));
+  return r;
+}
-- 
2.13.3

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/3] retire mem_signal_fence pattern
  2017-08-02 17:45 [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier Alexander Monakov
  2017-08-02 17:45 ` [PATCH 3/3] optabs: ensure atomic_load/stores have compiler barriers Alexander Monakov
@ 2017-08-02 17:45 ` Alexander Monakov
  2017-08-28 12:39   ` [PING][PATCH " Alexander Monakov
  2017-08-05  9:03 ` [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier Richard Sandiford
  2 siblings, 1 reply; 14+ messages in thread
From: Alexander Monakov @ 2017-08-02 17:45 UTC (permalink / raw)
  To: gcc-patches

Similar to mem_thread_fence issue from the patch 1/3, RTL representation of
__atomic_signal_fence must be a compiler barrier.  We have just one backend
offering this pattern, and it does not place a compiler barrier.

It does not appear useful to expand signal_fence to some kind of hardware
instruction, its documentation is wrong/outdated, and we are using it
incorrectly anyway.  So just remove the whole thing and simply emit a compiler
memory barrier in the optabs.c handler.

	* config/s390/s390.md (mem_signal_fence): Remove.
        * doc/md.texi (mem_signal_fence): Remove.
        * optabs.c (expand_mem_signal_fence): Remove uses of mem_signal_fence.
        Update comments.
        * target-insns.def (mem_signal_fence): Remove.

---
 gcc/config/s390/s390.md |  9 ---------
 gcc/doc/md.texi         | 13 -------------
 gcc/optabs.c            | 17 +++++------------
 gcc/target-insns.def    |  1 -
 4 files changed, 5 insertions(+), 35 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index d1ac0b8395d..1d63523d3b0 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -10084,15 +10084,6 @@ (define_insn "*basr_tls"
 ; memory barrier patterns.
 ;
 
-(define_expand "mem_signal_fence"
-  [(match_operand:SI 0 "const_int_operand")]		;; model
-  ""
-{
-  /* The s390 memory model is strong enough not to require any
-     barrier in order to synchronize a thread with itself.  */
-  DONE;
-})
-
 (define_expand "mem_thread_fence"
   [(match_operand:SI 0 "const_int_operand")]		;; model
   ""
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 0be161a08dc..73d0e7d83c0 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7058,19 +7058,6 @@ If this pattern is not specified, all memory models except
 @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
 barrier pattern.
 
-@cindex @code{mem_signal_fence@var{mode}} instruction pattern
-@item @samp{mem_signal_fence@var{mode}}
-This pattern emits code required to implement a signal fence with
-memory model semantics.  Operand 0 is the memory model to be used.
-
-This pattern should impact the compiler optimizers the same way that
-mem_signal_fence does, but it does not need to issue any barrier
-instructions.
-
-If this pattern is not specified, all memory models except
-@code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
-barrier pattern.
-
 @cindex @code{get_thread_pointer@var{mode}} instruction pattern
 @cindex @code{set_thread_pointer@var{mode}} instruction pattern
 @item @samp{get_thread_pointer@var{mode}}
diff --git a/gcc/optabs.c b/gcc/optabs.c
index d568ca376fe..6884fd4b8aa 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6315,22 +6315,15 @@ expand_mem_thread_fence (enum memmodel model)
     }
 }
 
-/* This routine will either emit the mem_signal_fence pattern or issue a 
-   sync_synchronize to generate a fence for memory model MEMMODEL.  */
+/* Emit a signal fence with given memory model.  */
 
 void
 expand_mem_signal_fence (enum memmodel model)
 {
-  if (targetm.have_mem_signal_fence ())
-    emit_insn (targetm.gen_mem_signal_fence (GEN_INT (model)));
-  else if (!is_mm_relaxed (model))
-    {
-      /* By default targets are coherent between a thread and the signal
-	 handler running on the same thread.  Thus this really becomes a
-	 compiler barrier, in that stores must not be sunk past
-	 (or raised above) a given point.  */
-      expand_asm_memory_barrier ();
-    }
+  /* No machine barrier is required to implement a signal fence, but
+     a compiler memory barrier must be issued, except for relaxed MM.  */
+  if (!is_mm_relaxed (model))
+    expand_asm_memory_barrier ();
 }
 
 /* This function expands the atomic load operation:
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index fb92f72ac29..4669439c7e1 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -58,7 +58,6 @@ DEF_TARGET_INSN (indirect_jump, (rtx x0))
 DEF_TARGET_INSN (insv, (rtx x0, rtx x1, rtx x2, rtx x3))
 DEF_TARGET_INSN (jump, (rtx x0))
 DEF_TARGET_INSN (load_multiple, (rtx x0, rtx x1, rtx x2))
-DEF_TARGET_INSN (mem_signal_fence, (rtx x0))
 DEF_TARGET_INSN (mem_thread_fence, (rtx x0))
 DEF_TARGET_INSN (memory_barrier, (void))
 DEF_TARGET_INSN (movstr, (rtx x0, rtx x1, rtx x2))
-- 
2.13.3

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier
  2017-08-02 17:45 [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier Alexander Monakov
  2017-08-02 17:45 ` [PATCH 3/3] optabs: ensure atomic_load/stores have compiler barriers Alexander Monakov
  2017-08-02 17:45 ` [PATCH 2/3] retire mem_signal_fence pattern Alexander Monakov
@ 2017-08-05  9:03 ` Richard Sandiford
  2017-08-07  8:42   ` Alexander Monakov
  2 siblings, 1 reply; 14+ messages in thread
From: Richard Sandiford @ 2017-08-05  9:03 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches, law

Alexander Monakov <amonakov@ispras.ru> writes:
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index a9900657a58..d568ca376fe 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -6298,7 +6298,12 @@ void
>  expand_mem_thread_fence (enum memmodel model)
>  {
>    if (targetm.have_mem_thread_fence ())
> -    emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
> +    {
> +      rtx_insn *last = get_last_insn ();
> +      emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
> +      if (last == get_last_insn () && !is_mm_relaxed (model))
> +	expand_asm_memory_barrier ();
> +    }
>    else if (!is_mm_relaxed (model))
>      {
>        if (targetm.have_memory_barrier ())

It would be simpler to test whether targetm.gen_mem_thread_fence
returns NULL.

This feels a bit hacky though.  Checking whether a generator produced no
instructions is usually the test for whether the generator FAILed, which
should normally be treated as though the pattern wasn't defined to start
with.  This is useful if the FAIL condition is too complex to put in the
define_expand/insn C condition.

In those contexts, if a generator wants to succeed and generate no
instructions, it should emit a note instead.  (Yes, it would be good
to have a nicer interface...)

So IMO it's confusing to overload the empty sequence to mean something
else in this context, and it wouldn't work if a target does emit the kind
of "I didn't FAIL" note just mentioned.

FWIW, I agree with Jeff in the previous thread that (compared to this)
it would be more robust to make optabs.c emit the compile barrier
unconditionally and just leave the target to emit a machine barrier.

Your earlier patch to make targets emit at least a compile barrier
seemed good too.  Perhaps that way we can treat an empty sequence as a
FAIL in the normal way, or assert that the pattern doesn't FAIL.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier
  2017-08-05  9:03 ` [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier Richard Sandiford
@ 2017-08-07  8:42   ` Alexander Monakov
  2017-08-14 20:46     ` Jeff Law
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Monakov @ 2017-08-07  8:42 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches, law

On Sat, 5 Aug 2017, Richard Sandiford wrote:
> It would be simpler to test whether targetm.gen_mem_thread_fence
> returns NULL.
> 
> This feels a bit hacky though.  Checking whether a generator produced no
> instructions is usually the test for whether the generator FAILed, which
> should normally be treated as though the pattern wasn't defined to start
> with.  This is useful if the FAIL condition is too complex to put in the
> define_expand/insn C condition.
> 
> In those contexts, if a generator wants to succeed and generate no
> instructions, it should emit a note instead.  (Yes, it would be good
> to have a nicer interface...)
> 
> So IMO it's confusing to overload the empty sequence to mean something
> else in this context, and it wouldn't work if a target does emit the kind
> of "I didn't FAIL" note just mentioned.

Thanks for the elucidation, I appreciate it.

> FWIW, I agree with Jeff in the previous thread that (compared to this)
> it would be more robust to make optabs.c emit the compile barrier
> unconditionally and just leave the target to emit a machine barrier.

OK, I've changed the patch accordingly.  I've also factored out the
is_mm_relaxed check in this iteration to simplify resulting code.

	PR target/80640
	* doc/md.texi (mem_thread_fence): Remove mention of mode.  Rewrite.
        * optabs.c (expand_mem_thread_fence): Emit a compiler barrier when
        using targetm.gen_mem_thread_fence.
testsuite/
	* gcc.dg/atomic/pr80640.c: New testcase.

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index ea959208c98..64a137ecad0 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7044,14 +7044,20 @@ If these patterns are not defined, attempts will be made to use
 counterparts.  If none of these are available a compare-and-swap
 loop will be used.
 
-@cindex @code{mem_thread_fence@var{mode}} instruction pattern
-@item @samp{mem_thread_fence@var{mode}}
+@cindex @code{mem_thread_fence} instruction pattern
+@item @samp{mem_thread_fence}
 This pattern emits code required to implement a thread fence with
 memory model semantics.  Operand 0 is the memory model to be used.
 
-If this pattern is not specified, all memory models except
-@code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
-barrier pattern.
+For the @code{__ATOMIC_RELAXED} model no instructions need to be issued
+and this expansion is not invoked.
+
+The compiler always emits a compiler memory barrier regardless of what
+expanding this pattern produced.
+
+If this pattern is not defined, the compiler falls back to expanding the
+@code{memory_barrier} pattern, then to emitting @code{__sync_synchronize}
+library call, and finally to just placing a compiler memory barrier.
 
 @cindex @code{mem_signal_fence@var{mode}} instruction pattern
 @item @samp{mem_signal_fence@var{mode}}
diff --git a/gcc/optabs.c b/gcc/optabs.c
index a9900657a58..71b74dd5feb 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6297,17 +6297,19 @@ expand_asm_memory_barrier (void)
 void
 expand_mem_thread_fence (enum memmodel model)
 {
+  if (is_mm_relaxed (model))
+    return;
   if (targetm.have_mem_thread_fence ())
-    emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
-  else if (!is_mm_relaxed (model))
     {
-      if (targetm.have_memory_barrier ())
-	emit_insn (targetm.gen_memory_barrier ());
-      else if (synchronize_libfunc != NULL_RTX)
-	emit_library_call (synchronize_libfunc, LCT_NORMAL, VOIDmode, 0);
-      else
-	expand_asm_memory_barrier ();
+      emit_insn (targetm.gen_mem_thread_fence (GEN_INT (model)));
+      expand_asm_memory_barrier ();
     }
+  else if (targetm.have_memory_barrier ())
+    emit_insn (targetm.gen_memory_barrier ());
+  else if (synchronize_libfunc != NULL_RTX)
+    emit_library_call (synchronize_libfunc, LCT_NORMAL, VOIDmode, 0);
+  else
+    expand_asm_memory_barrier ();
 }
 
 /* This routine will either emit the mem_signal_fence pattern or issue a 
diff --git a/gcc/testsuite/gcc.dg/atomic/pr80640.c b/gcc/testsuite/gcc.dg/atomic/pr80640.c
new file mode 100644
index 00000000000..fd17978a482
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/atomic/pr80640.c
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+/* { dg-options "-pthread" } */
+/* { dg-require-effective-target pthread } */
+
+#include <pthread.h>
+
+static volatile int sem1;
+static volatile int sem2;
+
+static void *f(void *va)
+{
+  void **p = va;
+  if (*p) return *p;
+  sem1 = 1;
+  while (!sem2);
+  __atomic_thread_fence(__ATOMIC_ACQUIRE);
+  // GCC used to RTL-CSE this and the first load, causing 0 to be returned
+  return *p;
+}
+
+int main()
+{
+  void *p = 0;
+  pthread_t thr;
+  if (pthread_create(&thr, 0, f, &p))
+    return 2;
+  while (!sem1);
+  __atomic_thread_fence(__ATOMIC_ACQUIRE);
+  p = &p;
+  __atomic_thread_fence(__ATOMIC_RELEASE);
+  sem2 = 1;
+  pthread_join(thr, &p);
+  return !p;
+}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier
  2017-08-07  8:42   ` Alexander Monakov
@ 2017-08-14 20:46     ` Jeff Law
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff Law @ 2017-08-14 20:46 UTC (permalink / raw)
  To: Alexander Monakov, Richard Sandiford; +Cc: gcc-patches

On 08/07/2017 02:42 AM, Alexander Monakov wrote:
> On Sat, 5 Aug 2017, Richard Sandiford wrote:
>> It would be simpler to test whether targetm.gen_mem_thread_fence
>> returns NULL.
>>
>> This feels a bit hacky though.  Checking whether a generator produced no
>> instructions is usually the test for whether the generator FAILed, which
>> should normally be treated as though the pattern wasn't defined to start
>> with.  This is useful if the FAIL condition is too complex to put in the
>> define_expand/insn C condition.
>>
>> In those contexts, if a generator wants to succeed and generate no
>> instructions, it should emit a note instead.  (Yes, it would be good
>> to have a nicer interface...)
>>
>> So IMO it's confusing to overload the empty sequence to mean something
>> else in this context, and it wouldn't work if a target does emit the kind
>> of "I didn't FAIL" note just mentioned.
> 
> Thanks for the elucidation, I appreciate it.
> 
>> FWIW, I agree with Jeff in the previous thread that (compared to this)
>> it would be more robust to make optabs.c emit the compile barrier
>> unconditionally and just leave the target to emit a machine barrier.
> 
> OK, I've changed the patch accordingly.  I've also factored out the
> is_mm_relaxed check in this iteration to simplify resulting code.
> 
> 	PR target/80640
> 	* doc/md.texi (mem_thread_fence): Remove mention of mode.  Rewrite.
>         * optabs.c (expand_mem_thread_fence): Emit a compiler barrier when
>         using targetm.gen_mem_thread_fence.
> testsuite/
> 	* gcc.dg/atomic/pr80640.c: New testcase.
THanks for your patience.  OK for the trunk.

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PING][PATCH 2/3] retire mem_signal_fence pattern
  2017-08-02 17:45 ` [PATCH 2/3] retire mem_signal_fence pattern Alexander Monakov
@ 2017-08-28 12:39   ` Alexander Monakov
       [not found]     ` <d738f0d1-e9d1-bfb2-e413-4d4b78370cdb@redhat.com>
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Monakov @ 2017-08-28 12:39 UTC (permalink / raw)
  To: gcc-patches

Ping (for this and patch 3/3 in the thread).

On Wed, 2 Aug 2017, Alexander Monakov wrote:

> Similar to mem_thread_fence issue from the patch 1/3, RTL representation of
> __atomic_signal_fence must be a compiler barrier.  We have just one backend
> offering this pattern, and it does not place a compiler barrier.
> 
> It does not appear useful to expand signal_fence to some kind of hardware
> instruction, its documentation is wrong/outdated, and we are using it
> incorrectly anyway.  So just remove the whole thing and simply emit a compiler
> memory barrier in the optabs.c handler.
> 
> 	* config/s390/s390.md (mem_signal_fence): Remove.
>         * doc/md.texi (mem_signal_fence): Remove.
>         * optabs.c (expand_mem_signal_fence): Remove uses of mem_signal_fence.
>         Update comments.
>         * target-insns.def (mem_signal_fence): Remove.
> 
> ---
>  gcc/config/s390/s390.md |  9 ---------
>  gcc/doc/md.texi         | 13 -------------
>  gcc/optabs.c            | 17 +++++------------
>  gcc/target-insns.def    |  1 -
>  4 files changed, 5 insertions(+), 35 deletions(-)
> 
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index d1ac0b8395d..1d63523d3b0 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -10084,15 +10084,6 @@ (define_insn "*basr_tls"
>  ; memory barrier patterns.
>  ;
>  
> -(define_expand "mem_signal_fence"
> -  [(match_operand:SI 0 "const_int_operand")]		;; model
> -  ""
> -{
> -  /* The s390 memory model is strong enough not to require any
> -     barrier in order to synchronize a thread with itself.  */
> -  DONE;
> -})
> -
>  (define_expand "mem_thread_fence"
>    [(match_operand:SI 0 "const_int_operand")]		;; model
>    ""
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 0be161a08dc..73d0e7d83c0 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -7058,19 +7058,6 @@ If this pattern is not specified, all memory models except
>  @code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
>  barrier pattern.
>  
> -@cindex @code{mem_signal_fence@var{mode}} instruction pattern
> -@item @samp{mem_signal_fence@var{mode}}
> -This pattern emits code required to implement a signal fence with
> -memory model semantics.  Operand 0 is the memory model to be used.
> -
> -This pattern should impact the compiler optimizers the same way that
> -mem_signal_fence does, but it does not need to issue any barrier
> -instructions.
> -
> -If this pattern is not specified, all memory models except
> -@code{__ATOMIC_RELAXED} will result in issuing a @code{sync_synchronize}
> -barrier pattern.
> -
>  @cindex @code{get_thread_pointer@var{mode}} instruction pattern
>  @cindex @code{set_thread_pointer@var{mode}} instruction pattern
>  @item @samp{get_thread_pointer@var{mode}}
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index d568ca376fe..6884fd4b8aa 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -6315,22 +6315,15 @@ expand_mem_thread_fence (enum memmodel model)
>      }
>  }
>  
> -/* This routine will either emit the mem_signal_fence pattern or issue a 
> -   sync_synchronize to generate a fence for memory model MEMMODEL.  */
> +/* Emit a signal fence with given memory model.  */
>  
>  void
>  expand_mem_signal_fence (enum memmodel model)
>  {
> -  if (targetm.have_mem_signal_fence ())
> -    emit_insn (targetm.gen_mem_signal_fence (GEN_INT (model)));
> -  else if (!is_mm_relaxed (model))
> -    {
> -      /* By default targets are coherent between a thread and the signal
> -	 handler running on the same thread.  Thus this really becomes a
> -	 compiler barrier, in that stores must not be sunk past
> -	 (or raised above) a given point.  */
> -      expand_asm_memory_barrier ();
> -    }
> +  /* No machine barrier is required to implement a signal fence, but
> +     a compiler memory barrier must be issued, except for relaxed MM.  */
> +  if (!is_mm_relaxed (model))
> +    expand_asm_memory_barrier ();
>  }
>  
>  /* This function expands the atomic load operation:
> diff --git a/gcc/target-insns.def b/gcc/target-insns.def
> index fb92f72ac29..4669439c7e1 100644
> --- a/gcc/target-insns.def
> +++ b/gcc/target-insns.def
> @@ -58,7 +58,6 @@ DEF_TARGET_INSN (indirect_jump, (rtx x0))
>  DEF_TARGET_INSN (insv, (rtx x0, rtx x1, rtx x2, rtx x3))
>  DEF_TARGET_INSN (jump, (rtx x0))
>  DEF_TARGET_INSN (load_multiple, (rtx x0, rtx x1, rtx x2))
> -DEF_TARGET_INSN (mem_signal_fence, (rtx x0))
>  DEF_TARGET_INSN (mem_thread_fence, (rtx x0))
>  DEF_TARGET_INSN (memory_barrier, (void))
>  DEF_TARGET_INSN (movstr, (rtx x0, rtx x1, rtx x2))
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PING][PATCH 2/3] retire mem_signal_fence pattern
       [not found]     ` <d738f0d1-e9d1-bfb2-e413-4d4b78370cdb@redhat.com>
@ 2017-09-01  6:26       ` Alexander Monakov
  2017-09-04  4:56         ` Jeff Law
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Monakov @ 2017-09-01  6:26 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Thu, 31 Aug 2017, Jeff Law wrote:
> This is OK.
> 
> What's the point of the delete_insns_since calls in patch #3?

Deleting the first barrier when maybe_expand_insn failed.
Other functions in the file use a similar approach.

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PING][PATCH 2/3] retire mem_signal_fence pattern
  2017-09-01  6:26       ` Alexander Monakov
@ 2017-09-04  4:56         ` Jeff Law
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff Law @ 2017-09-04  4:56 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches

On 09/01/2017 12:26 AM, Alexander Monakov wrote:
> On Thu, 31 Aug 2017, Jeff Law wrote:
>> This is OK.
>>
>> What's the point of the delete_insns_since calls in patch #3?
> 
> Deleting the first barrier when maybe_expand_insn failed.
> Other functions in the file use a similar approach.
Thanks for clarifying.  OK for the trunk.  Sorry about the wait.

Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PING][PATCH 2/3] retire mem_signal_fence pattern
  2017-09-05 11:40   ` Uros Bizjak
@ 2017-09-05 15:47     ` Christophe Lyon
  0 siblings, 0 replies; 14+ messages in thread
From: Christophe Lyon @ 2017-09-05 15:47 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: Alexander Monakov, gcc-patches, Jeff Law, H. J. Lu

Hi,

On 5 September 2017 at 13:40, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Tue, Sep 5, 2017 at 12:28 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
>> On Mon, 4 Sep 2017, Uros Bizjak wrote:
>>> introduced a couple of regressions on x86 (-m32, 32bit)  testsuite:
>>>
>>> New failures:
>>> FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
>>> FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps
>>
>> Sorry.  I suggest that the tests be XFAIL'ed, the peepholes introduced in the
>> fix for PR 71245 removed, and the PR reopened (it's a missed-optimization PR).
>> I can do all of the above if you agree.
>>
>> I think RTL peepholes are a poor way of fixing the original problem, which
>> actually exists on all targets with separate int/fp registers.  For instance,
>> trunk (without my patch) still gets a far simpler testcase wrong (-O2, 64-bit):
>
> Please note that 32bit x86 implements atomic DImode access with
> fild/fistp combination, so the mentioned peephole avoids quite costly
> instruction sequence.
>
> For reference, attached patch implements additional peephole2 patterns
> that also handle sequences with blockages.
>
> Uros.

On arm, we also have a similar regression:
FAIL: gcc.target/arm/stl-cond.c scan-assembler stlne

Before the patch, we generated:
        ldr     r3, [r0]
        cmp     r3, #0
        addne   r3, r0, #4
        movne   r2, #0
        stlne   r2, [r3]
        bx      lr
and now:
        ldr     r3, [r0]
        cmp     r3, #0
        bxeq    lr
        mov     r2, #0
        add     r3, r0, #4
        stl     r2, [r3]
        bx      lr

Christophe

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PING][PATCH 2/3] retire mem_signal_fence pattern
  2017-09-05 10:29 ` Alexander Monakov
  2017-09-05 10:35   ` Uros Bizjak
@ 2017-09-05 11:40   ` Uros Bizjak
  2017-09-05 15:47     ` Christophe Lyon
  1 sibling, 1 reply; 14+ messages in thread
From: Uros Bizjak @ 2017-09-05 11:40 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches, Jeff Law, H. J. Lu

[-- Attachment #1: Type: text/plain, Size: 1068 bytes --]

On Tue, Sep 5, 2017 at 12:28 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
> On Mon, 4 Sep 2017, Uros Bizjak wrote:
>> introduced a couple of regressions on x86 (-m32, 32bit)  testsuite:
>>
>> New failures:
>> FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
>> FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps
>
> Sorry.  I suggest that the tests be XFAIL'ed, the peepholes introduced in the
> fix for PR 71245 removed, and the PR reopened (it's a missed-optimization PR).
> I can do all of the above if you agree.
>
> I think RTL peepholes are a poor way of fixing the original problem, which
> actually exists on all targets with separate int/fp registers.  For instance,
> trunk (without my patch) still gets a far simpler testcase wrong (-O2, 64-bit):

Please note that 32bit x86 implements atomic DImode access with
fild/fistp combination, so the mentioned peephole avoids quite costly
instruction sequence.

For reference, attached patch implements additional peephole2 patterns
that also handle sequences with blockages.

Uros.

[-- Attachment #2: q.diff.txt --]
[-- Type: text/plain, Size: 5983 bytes --]

diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 29b82f86d43a..eceaa73a6799 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -219,29 +219,71 @@
    (set (match_operand:DI 2 "memory_operand")
 	(unspec:DI [(match_dup 0)]
 		   UNSPEC_FIST_ATOMIC))
-   (set (match_operand:DF 3 "fp_register_operand")
+   (set (match_operand:DF 3 "any_fp_register_operand")
 	(match_operand:DF 4 "memory_operand"))]
   "!TARGET_64BIT
    && peep2_reg_dead_p (2, operands[0])
-   && rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))"
+   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
   [(set (match_dup 3) (match_dup 5))]
   "operands[5] = gen_lowpart (DFmode, operands[1]);")
 
 (define_peephole2
+  [(set (match_operand:DF 0 "fp_register_operand")
+	(unspec:DF [(match_operand:DI 1 "memory_operand")]
+		   UNSPEC_FILD_ATOMIC))
+   (set (match_operand:DI 2 "memory_operand")
+	(unspec:DI [(match_dup 0)]
+		   UNSPEC_FIST_ATOMIC))
+   (set (mem:BLK (scratch:SI))
+	(unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
+   (set (match_operand:DF 3 "any_fp_register_operand")
+	(match_operand:DF 4 "memory_operand"))]
+  "!TARGET_64BIT
+   && peep2_reg_dead_p (2, operands[0])
+   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
+  [(const_int 0)]
+{
+  emit_move_insn (operands[3], gen_lowpart (DFmode, operands[1]));
+  emit_insn (gen_memory_blockage ());
+  DONE;
+})
+
+(define_peephole2
   [(set (match_operand:DF 0 "sse_reg_operand")
 	(unspec:DF [(match_operand:DI 1 "memory_operand")]
 		   UNSPEC_LDX_ATOMIC))
    (set (match_operand:DI 2 "memory_operand")
 	(unspec:DI [(match_dup 0)]
 		   UNSPEC_STX_ATOMIC))
-   (set (match_operand:DF 3 "fp_register_operand")
+   (set (match_operand:DF 3 "any_fp_register_operand")
 	(match_operand:DF 4 "memory_operand"))]
   "!TARGET_64BIT
    && peep2_reg_dead_p (2, operands[0])
-   && rtx_equal_p (operands[4], adjust_address_nv (operands[2], DFmode, 0))"
+   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
   [(set (match_dup 3) (match_dup 5))]
   "operands[5] = gen_lowpart (DFmode, operands[1]);")
 
+(define_peephole2
+  [(set (match_operand:DF 0 "sse_reg_operand")
+	(unspec:DF [(match_operand:DI 1 "memory_operand")]
+		   UNSPEC_LDX_ATOMIC))
+   (set (match_operand:DI 2 "memory_operand")
+	(unspec:DI [(match_dup 0)]
+		   UNSPEC_STX_ATOMIC))
+   (set (mem:BLK (scratch:SI))
+	(unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
+   (set (match_operand:DF 3 "any_fp_register_operand")
+	(match_operand:DF 4 "memory_operand"))]
+  "!TARGET_64BIT
+   && peep2_reg_dead_p (2, operands[0])
+   && rtx_equal_p (XEXP (operands[4], 0), XEXP (operands[2], 0))"
+  [(const_int 0)]
+{
+  emit_move_insn (operands[3], gen_lowpart (DFmode, operands[1]));
+  emit_insn (gen_memory_blockage ());
+  DONE;
+})
+
 (define_expand "atomic_store<mode>"
   [(set (match_operand:ATOMIC 0 "memory_operand")
 	(unspec:ATOMIC [(match_operand:ATOMIC 1 "nonimmediate_operand")
@@ -331,7 +373,7 @@
 
 (define_peephole2
   [(set (match_operand:DF 0 "memory_operand")
-	(match_operand:DF 1 "fp_register_operand"))
+	(match_operand:DF 1 "any_fp_register_operand"))
    (set (match_operand:DF 2 "fp_register_operand")
 	(unspec:DF [(match_operand:DI 3 "memory_operand")]
 		   UNSPEC_FILD_ATOMIC))
@@ -340,13 +382,34 @@
 		   UNSPEC_FIST_ATOMIC))]
   "!TARGET_64BIT
    && peep2_reg_dead_p (3, operands[2])
-   && rtx_equal_p (operands[0], adjust_address_nv (operands[3], DFmode, 0))"
+   && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))"
   [(set (match_dup 5) (match_dup 1))]
   "operands[5] = gen_lowpart (DFmode, operands[4]);")
 
 (define_peephole2
   [(set (match_operand:DF 0 "memory_operand")
-	(match_operand:DF 1 "fp_register_operand"))
+	(match_operand:DF 1 "any_fp_register_operand"))
+   (set (mem:BLK (scratch:SI))
+	(unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
+   (set (match_operand:DF 2 "fp_register_operand")
+	(unspec:DF [(match_operand:DI 3 "memory_operand")]
+		   UNSPEC_FILD_ATOMIC))
+   (set (match_operand:DI 4 "memory_operand")
+	(unspec:DI [(match_dup 2)]
+		   UNSPEC_FIST_ATOMIC))]
+  "!TARGET_64BIT
+   && peep2_reg_dead_p (4, operands[2])
+   && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))"
+  [(const_int 0)]
+{
+  emit_insn (gen_memory_blockage ());
+  emit_move_insn (gen_lowpart (DFmode, operands[4]), operands[1]);
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:DF 0 "memory_operand")
+	(match_operand:DF 1 "any_fp_register_operand"))
    (set (match_operand:DF 2 "sse_reg_operand")
 	(unspec:DF [(match_operand:DI 3 "memory_operand")]
 		   UNSPEC_LDX_ATOMIC))
@@ -355,10 +418,31 @@
 		   UNSPEC_STX_ATOMIC))]
   "!TARGET_64BIT
    && peep2_reg_dead_p (3, operands[2])
-   && rtx_equal_p (operands[0], adjust_address_nv (operands[3], DFmode, 0))"
+   && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))"
   [(set (match_dup 5) (match_dup 1))]
   "operands[5] = gen_lowpart (DFmode, operands[4]);")
 
+(define_peephole2
+  [(set (match_operand:DF 0 "memory_operand")
+	(match_operand:DF 1 "any_fp_register_operand"))
+   (set (mem:BLK (scratch:SI))
+	(unspec:BLK [(mem:BLK (scratch:SI))] UNSPEC_MEMORY_BLOCKAGE))
+   (set (match_operand:DF 2 "sse_reg_operand")
+	(unspec:DF [(match_operand:DI 3 "memory_operand")]
+		   UNSPEC_LDX_ATOMIC))
+   (set (match_operand:DI 4 "memory_operand")
+	(unspec:DI [(match_dup 2)]
+		   UNSPEC_STX_ATOMIC))]
+  "!TARGET_64BIT
+   && peep2_reg_dead_p (4, operands[2])
+   && rtx_equal_p (XEXP (operands[0], 0), XEXP (operands[3], 0))"
+  [(const_int 0)]
+{
+  emit_insn (gen_memory_blockage ());
+  emit_move_insn (gen_lowpart (DFmode, operands[4]), operands[1]);
+  DONE;
+})
+
 ;; ??? You'd think that we'd be able to perform this via FLOAT + FIX_TRUNC
 ;; operations.  But the fix_trunc patterns want way more setup than we want
 ;; to provide.  Note that the scratch is DFmode instead of XFmode in order

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PING][PATCH 2/3] retire mem_signal_fence pattern
  2017-09-05 10:29 ` Alexander Monakov
@ 2017-09-05 10:35   ` Uros Bizjak
  2017-09-05 11:40   ` Uros Bizjak
  1 sibling, 0 replies; 14+ messages in thread
From: Uros Bizjak @ 2017-09-05 10:35 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches, Jeff Law, H. J. Lu

On Tue, Sep 5, 2017 at 12:28 PM, Alexander Monakov <amonakov@ispras.ru> wrote:
> On Mon, 4 Sep 2017, Uros Bizjak wrote:
>> introduced a couple of regressions on x86 (-m32, 32bit)  testsuite:
>>
>> New failures:
>> FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
>> FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps
>
> Sorry.  I suggest that the tests be XFAIL'ed, the peepholes introduced in the
> fix for PR 71245 removed, and the PR reopened (it's a missed-optimization PR).
> I can do all of the above if you agree.

No, I have a solution for the regression. A prerequisite is patch at [1].

[1] https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00239.html

Uros.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PING][PATCH 2/3] retire mem_signal_fence pattern
  2017-09-04 19:02 [PING][PATCH 2/3] retire mem_signal_fence pattern Uros Bizjak
@ 2017-09-05 10:29 ` Alexander Monakov
  2017-09-05 10:35   ` Uros Bizjak
  2017-09-05 11:40   ` Uros Bizjak
  0 siblings, 2 replies; 14+ messages in thread
From: Alexander Monakov @ 2017-09-05 10:29 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches, Jeff Law, H. J. Lu

On Mon, 4 Sep 2017, Uros Bizjak wrote:
> introduced a couple of regressions on x86 (-m32, 32bit)  testsuite:
> 
> New failures:
> FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
> FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps

Sorry.  I suggest that the tests be XFAIL'ed, the peepholes introduced in the
fix for PR 71245 removed, and the PR reopened (it's a missed-optimization PR).
I can do all of the above if you agree.

I think RTL peepholes are a poor way of fixing the original problem, which
actually exists on all targets with separate int/fp registers.  For instance,
trunk (without my patch) still gets a far simpler testcase wrong (-O2, 64-bit):

float f(_Atomic float *p)
{
  return *p;
}

f:
        movl    (%rdi), %eax
        movl    %eax, -4(%rsp)
        movss   -4(%rsp), %xmm0
        ret

I believe adding more peepholes to handle this, independently in each affected
backend, is hardly appropriate.

The issue is caused by translation of floating-point atomic loads/stores to a
sequence of integer atomic and a VIEW_CONVERT_EXPR on GIMPLE, which in turn
causes the load to be emitted (wrongly) in an integer mode on RTL.  I don't
know if GIMPLE atomic builtins can work with floating operands (which IMHO
would be a preferable way), but if not, then special-casing a sequence of
integer atomic mem op and a VCE at expand time might be a cleaner way of
achieving the desired optimization.

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PING][PATCH 2/3] retire mem_signal_fence pattern
@ 2017-09-04 19:02 Uros Bizjak
  2017-09-05 10:29 ` Alexander Monakov
  0 siblings, 1 reply; 14+ messages in thread
From: Uros Bizjak @ 2017-09-04 19:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jeff Law, Alexander Monakov, H. J. Lu

> On 09/01/2017 12:26 AM, Alexander Monakov wrote:
>> On Thu, 31 Aug 2017, Jeff Law wrote:
>>> This is OK.
>>>
>>> What's the point of the delete_insns_since calls in patch #3?
>>
>> Deleting the first barrier when maybe_expand_insn failed.
>> Other functions in the file use a similar approach.
> Thanks for clarifying.  OK for the trunk.  Sorry about the wait.

It looks that:

2017-09-04  Alexander Monakov  <amonakov@ispras.ru>

        PR rtl-optimization/57448
        PR target/67458
        PR target/81316
        * optabs.c (expand_atomic_load): Place compiler memory barriers if
        using atomic_load pattern.
        (expand_atomic_store): Likewise.

introduced a couple of regressions on x86 (-m32, 32bit)  testsuite:

New failures:
FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps

Uros.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-09-05 15:47 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-02 17:45 [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier Alexander Monakov
2017-08-02 17:45 ` [PATCH 3/3] optabs: ensure atomic_load/stores have compiler barriers Alexander Monakov
2017-08-02 17:45 ` [PATCH 2/3] retire mem_signal_fence pattern Alexander Monakov
2017-08-28 12:39   ` [PING][PATCH " Alexander Monakov
     [not found]     ` <d738f0d1-e9d1-bfb2-e413-4d4b78370cdb@redhat.com>
2017-09-01  6:26       ` Alexander Monakov
2017-09-04  4:56         ` Jeff Law
2017-08-05  9:03 ` [PATCH 1/3] optabs: ensure mem_thread_fence is a compiler barrier Richard Sandiford
2017-08-07  8:42   ` Alexander Monakov
2017-08-14 20:46     ` Jeff Law
2017-09-04 19:02 [PING][PATCH 2/3] retire mem_signal_fence pattern Uros Bizjak
2017-09-05 10:29 ` Alexander Monakov
2017-09-05 10:35   ` Uros Bizjak
2017-09-05 11:40   ` Uros Bizjak
2017-09-05 15:47     ` Christophe Lyon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).