public inbox for gcc-patches@gcc.gnu.org
* [PATCH 3/4] Fix typo in gcov.texi
  2016-08-01  8:50 [PATCH 0/4] Various GCOV/PGO improvements marxin
@ 2016-08-01  8:49 ` marxin
  2016-08-01  8:50 ` [PATCH 2/4] Remove __gcov_indirect_call_profiler marxin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 95+ messages in thread
From: marxin @ 2016-08-01  8:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: jh, nathan

gcc/ChangeLog:

2016-07-28  Martin Liska  <mliska@suse.cz>

	* doc/gcov.texi: Change _gcov_dump to __gcov_dump and
	_gcov_reset to __gcov_reset.
	* doc/gcov-tool.texi: Fix typo.

libgcc/ChangeLog:

2016-08-01  Martin Liska  <mliska@suse.cz>

	* libgcov-util.c: Fix typo and GNU coding style.
---
 gcc/doc/gcov-tool.texi | 2 +-
 gcc/doc/gcov.texi      | 6 +++---
 libgcc/libgcov-util.c  | 3 ++-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/gcov-tool.texi b/gcc/doc/gcov-tool.texi
index 845f14b..c4a9ce1 100644
--- a/gcc/doc/gcov-tool.texi
+++ b/gcc/doc/gcov-tool.texi
@@ -193,7 +193,7 @@ in the new profile.
 @end table
 
 @item overlap
-Computer the overlap score between the two specified profile directories.
+Compute the overlap score between the two specified profile directories.
 The overlap score is computed based on the arc profiles. It is defined as
 the sum of min (p1_counter[i] / p1_sum_all, p2_counter[i] / p2_sum_all),
 for all arc counter i, where p1_counter[i] and p2_counter[i] are two
diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index 89d8049..df58df8 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -582,10 +582,10 @@ now be calculable at compile time in some instances.  Because the
 coverage of all the uses of the inline function will be shown for the
 same source lines, the line counts themselves might seem inconsistent.
 
-Long-running applications can use the @code{_gcov_reset} and @code{_gcov_dump}
+Long-running applications can use the @code{__gcov_reset} and @code{__gcov_dump}
 facilities to restrict profile collection to the program region of
-interest. Calling @code{_gcov_reset(void)} will clear all profile counters
-to zero, and calling @code{_gcov_dump(void)} will cause the profile information
+interest. Calling @code{__gcov_reset(void)} will clear all profile counters
+to zero, and calling @code{__gcov_dump(void)} will cause the profile information
 collected at that point to be dumped to @file{.gcda} output files.
 
 @c man end
diff --git a/libgcc/libgcov-util.c b/libgcc/libgcov-util.c
index 7b3bc31..c8fb52d 100644
--- a/libgcc/libgcov-util.c
+++ b/libgcc/libgcov-util.c
@@ -1391,7 +1391,8 @@ calculate_overlap (struct gcov_info *gcov_list1,
   return prg_val;
 }
 
-/* Computer the overlap score of two lists of gcov_info objects PROFILE1 and PROFILE2.
+/* Compute the overlap score of two lists of gcov_info objects PROFILE1 and
+   PROFILE2.
    Return 0 on success: without mismatch. Reutrn 1 on error.  */
 
 int
-- 
2.9.2



* [PATCH 4/4] Add tests for __gcov_dump and __gcov_reset
  2016-08-01  8:50 [PATCH 0/4] Various GCOV/PGO improvements marxin
                   ` (2 preceding siblings ...)
  2016-08-01  8:50 ` [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch marxin
@ 2016-08-01  8:50 ` marxin
  2016-08-01 12:11 ` [PATCH 0/4] Various GCOV/PGO improvements Nathan Sidwell
  4 siblings, 0 replies; 95+ messages in thread
From: marxin @ 2016-08-01  8:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: jh, nathan

gcc/testsuite/ChangeLog:

2016-07-28  Martin Liska  <mliska@suse.cz>

	* g++.dg/gcov/gcov-dump-1.C: New test.
	* g++.dg/gcov/gcov-dump-2.C: New test.
---
 gcc/testsuite/g++.dg/gcov/gcov-dump-1.C | 23 +++++++++++++++++++++++
 gcc/testsuite/g++.dg/gcov/gcov-dump-2.C | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-dump-1.C
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-dump-2.C

diff --git a/gcc/testsuite/g++.dg/gcov/gcov-dump-1.C b/gcc/testsuite/g++.dg/gcov/gcov-dump-1.C
new file mode 100644
index 0000000..f0e81e9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gcov/gcov-dump-1.C
@@ -0,0 +1,23 @@
+/* { dg-options "-fprofile-generate -ftest-coverage -lgcov" } */
+/* { dg-do run { target native } } */
+
+int value;
+
+extern "C" { void __gcov_dump(void); }
+
+int main(int argc, char **argv)
+{
+  value = 123;					/* count(1) */
+
+  for (unsigned i = 0; i < 100; i++)
+    value += argc;				/* count(100) */
+
+  __gcov_dump();
+
+  for (unsigned i = 0; i < 1000; i++)		/* count(#####) */
+    value += argc;
+
+  return 0;					/* count(#####) */
+}
+
+/* { dg-final { run-gcov gcov-dump-1.C } } */
diff --git a/gcc/testsuite/g++.dg/gcov/gcov-dump-2.C b/gcc/testsuite/g++.dg/gcov/gcov-dump-2.C
new file mode 100644
index 0000000..6234a81
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gcov/gcov-dump-2.C
@@ -0,0 +1,32 @@
+/* { dg-options "-fprofile-generate -ftest-coverage -lgcov" } */
+/* { dg-do run { target native } } */
+
+int value;
+
+extern "C"
+{
+  void __gcov_dump(void);
+  void __gcov_reset(void);
+}
+
+int main(int argc, char **argv)
+{
+  value = 123;					/* count(1) */
+
+  for (unsigned i = 0; i < 100; i++)
+    value += argc;				/* count(100) */
+
+  __gcov_dump();
+
+  for (unsigned i = 0; i < 1000; i++)		/* count(#####) */
+    value += argc;
+
+  __gcov_reset ();
+
+  for (unsigned i = 0; i < 10000; i++)		/* count(10001) */
+    value += argc;
+
+  return 0;					/* count(1) */
+}
+
+/* { dg-final { run-gcov gcov-dump-2.C } } */
-- 
2.9.2


* [PATCH 2/4] Remove __gcov_indirect_call_profiler
  2016-08-01  8:50 [PATCH 0/4] Various GCOV/PGO improvements marxin
  2016-08-01  8:49 ` [PATCH 3/4] Fix typo in gcov.texi marxin
@ 2016-08-01  8:50 ` marxin
  2016-08-01  8:50 ` [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch marxin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 95+ messages in thread
From: marxin @ 2016-08-01  8:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: jh, nathan

libgcc/ChangeLog:

2016-07-28  Martin Liska  <mliska@suse.cz>

	* Makefile.in: Remove __gcov_indirect_call_profiler.
	* libgcov-profiler.c (__gcov_indirect_call_profiler): Remove
	function.
	* libgcov.h: And the declaration of the function.
---
 libgcc/Makefile.in        |  2 +-
 libgcc/libgcov-profiler.c | 27 ---------------------------
 libgcc/libgcov.h          |  2 --
 3 files changed, 1 insertion(+), 30 deletions(-)

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 8b0fdd9..e2295ca 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -859,7 +859,7 @@ include $(iterator)
 LIBGCOV_MERGE = _gcov_merge_add _gcov_merge_single _gcov_merge_delta	\
 	_gcov_merge_ior _gcov_merge_time_profile _gcov_merge_icall_topn
 LIBGCOV_PROFILER = _gcov_interval_profiler _gcov_pow2_profiler		\
-	_gcov_one_value_profiler _gcov_indirect_call_profiler		\
+	_gcov_one_value_profiler					\
 	_gcov_one_value_profiler_atomic					\
  	_gcov_average_profiler _gcov_ior_profiler			\
 	_gcov_indirect_call_profiler_v2					\
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 1b307ac..c1e287d 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -257,33 +257,6 @@ __gcov_indirect_call_topn_profiler (gcov_type value, void* cur_func)
 
 #endif
 
-#ifdef L_gcov_indirect_call_profiler
-/* This function exist only for workaround of binutils bug 14342.
-   Once this compatibility hack is obsolette, it can be removed.  */
-
-/* By default, the C++ compiler will use function addresses in the
-   vtable entries.  Setting TARGET_VTABLE_USES_DESCRIPTORS to nonzero
-   tells the compiler to use function descriptors instead.  The value
-   of this macro says how many words wide the descriptor is (normally 2).
-
-   It is assumed that the address of a function descriptor may be treated
-   as a pointer to a function.  */
-
-/* Tries to determine the most common value among its inputs. */
-void
-__gcov_indirect_call_profiler (gcov_type* counter, gcov_type value,
-                               void* cur_func, void* callee_func)
-{
-  /* If the C++ virtual tables contain function descriptors then one
-     function may have multiple descriptors and we need to dereference
-     the descriptors to see if they point to the same function.  */
-  if (cur_func == callee_func
-      || (__LIBGCC_VTABLE_USES_DESCRIPTORS__ && callee_func
-          && *(void **) cur_func == *(void **) callee_func))
-    __gcov_one_value_profiler_body (counter, value);
-}
-#endif
-
 #ifdef L_gcov_indirect_call_profiler_v2
 
 /* These two variables are used to actually track caller and callee.  Keep
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 0bd905b..337e558 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -271,8 +271,6 @@ extern void __gcov_interval_profiler (gcov_type *, gcov_type, int, unsigned);
 extern void __gcov_pow2_profiler (gcov_type *, gcov_type);
 extern void __gcov_one_value_profiler (gcov_type *, gcov_type);
 extern void __gcov_one_value_profiler_atomic (gcov_type *, gcov_type);
-extern void __gcov_indirect_call_profiler (gcov_type*, gcov_type,
-                                           void*, void*);
 extern void __gcov_indirect_call_profiler_v2 (gcov_type, void *);
 extern void __gcov_indirect_call_profiler_v2_atomic (gcov_type, void *);
 extern void __gcov_time_profiler (gcov_type *);
-- 
2.9.2



* [PATCH 0/4] Various GCOV/PGO improvements
@ 2016-08-01  8:50 marxin
  2016-08-01  8:49 ` [PATCH 3/4] Fix typo in gcov.texi marxin
                   ` (4 more replies)
  0 siblings, 5 replies; 95+ messages in thread
From: marxin @ 2016-08-01  8:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: jh, nathan

Hi.

This small series addresses a couple of issues I've recently observed.
I'll briefly describe the changes in each individual patch:

marxin (4):
  Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch

As mentioned in [1], our current implementation can produce a corrupted
profile when the instrumented program makes heavy use of threads. A fully
robust solution would be to either use TLS or to use atomics and a locking
mechanism. However, as David Li pointed out, the most interesting
counters affected by multithreading are the -fprofile-arcs counters
and the indirect call counters. I've cherry-picked that functionality
from the google/gcc-4_9 branch.

[1] https://gcc.gnu.org/ml/gcc/2016-07/msg00131.html
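
To make the failure mode concrete, here is a minimal single-threaded sketch
that replays one unlucky interleaving of two threads by hand. It is only an
illustration under stated assumptions: counter_t, lost_update_demo,
atomic_update_demo and check_demos are made-up names, counter_t merely stands
in for gcov_type, and the real instrumentation is emitted as GIMPLE rather
than written like this.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t counter_t;	/* stand-in for gcov_type (assumption) */

/* Default scheme: load, add one, store back, as three separate steps.
   Two imaginary threads A and B both load before either one stores.  */
counter_t
lost_update_demo (void)
{
  counter_t counter = 0;
  counter_t a = counter;	/* thread A loads 0 */
  counter_t b = counter;	/* thread B loads 0 as well */
  counter = a + 1;		/* thread A stores 1 */
  counter = b + 1;		/* thread B stores 1 again: one count is lost */
  return counter;
}

/* Atomic scheme: each update is one indivisible read-modify-write, so no
   interleaving of the two updates can drop a count.  */
counter_t
atomic_update_demo (void)
{
  counter_t counter = 0;
  __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);	/* thread A */
  __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);	/* thread B */
  return counter;
}

/* Usage: both outcomes checked side by side.  */
void
check_demos (void)
{
  assert (lost_update_demo () == 1);	/* two increments, one recorded */
  assert (atomic_update_demo () == 2);	/* two increments, two recorded */
}
```

The relaxed memory order matches what the patch passes (MEMMODEL_RELAXED):
only the increment itself has to be indivisible, no ordering with other
memory accesses is required.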

  Remove __gcov_indirect_call_profiler

The profiler function is unused and can therefore be removed.

  Fix typo in gcov.texi

Just a small typo in the names of functions that one can call from
a user application.

  Add tests for __gcov_dump and __gcov_reset

Adding tests for the aforementioned functions.
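
For readers who have not used these two functions, the following is a rough
sketch of how an application might restrict profile collection to a region
of interest. Everything except __gcov_reset and __gcov_dump is invented for
the example: PROFILE_BUILD is a hypothetical macro meant to be defined only
for instrumented builds (the new tests use -fprofile-generate -ftest-coverage
-lgcov), and the PROFILE_WINDOW_* macros and the functions are placeholders.

```c
#include <assert.h>

#ifdef PROFILE_BUILD	/* hypothetical macro: define it only for instrumented builds */
extern void __gcov_reset (void);
extern void __gcov_dump (void);
#define PROFILE_WINDOW_BEGIN() __gcov_reset ()
#define PROFILE_WINDOW_END()   __gcov_dump ()
#else
/* Without instrumentation the window markers compile to nothing.  */
#define PROFILE_WINDOW_BEGIN() ((void) 0)
#define PROFILE_WINDOW_END()   ((void) 0)
#endif

/* Work outside the region of interest, e.g. start-up.  */
static long
warm_up (long n)
{
  long sum = 0;
  for (long i = 0; i < n; i++)
    sum += i;
  return sum;
}

/* The region whose profile we actually care about.  */
static long
hot_loop (long n)
{
  long sum = 0;
  for (long i = 0; i < n; i++)
    sum += 2 * i;
  return sum;
}

long
run_with_profile_window (long n)
{
  assert (n >= 0);
  (void) warm_up (n);
  PROFILE_WINDOW_BEGIN ();	/* counters collected so far are cleared */
  long result = hot_loop (n);
  PROFILE_WINDOW_END ();	/* counters are written to the .gcda files */
  return result;
}
```

In an instrumented build the intent is that only hot_loop contributes to the
dumped counts; the uninstrumented build behaves identically apart from the
profile.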

The series bootstraps on ppc64le-redhat-linux and survives regression tests
(all patches tested together).

Thoughts?
Thanks,
Martin

marxin (4):
  Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  Remove __gcov_indirect_call_profiler
  Fix typo in gcov.texi
  Add tests for __gcov_dump and __gcov_reset

 gcc/common.opt                             |  9 ++++
 gcc/doc/gcov-tool.texi                     |  2 +-
 gcc/doc/gcov.texi                          |  6 +--
 gcc/doc/invoke.texi                        | 11 +++++
 gcc/gcov-io.h                              | 22 ++++++++++
 gcc/testsuite/g++.dg/gcov/gcov-dump-1.C    | 23 ++++++++++
 gcc/testsuite/g++.dg/gcov/gcov-dump-2.C    | 32 ++++++++++++++
 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C | 46 ++++++++++++++++++++
 gcc/tree-profile.c                         | 62 ++++++++++++++++++---------
 libgcc/Makefile.in                         |  6 ++-
 libgcc/libgcov-profiler.c                  | 67 ++++++++++++++++++------------
 libgcc/libgcov-util.c                      |  3 +-
 libgcc/libgcov.h                           |  4 +-
 13 files changed, 238 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-dump-1.C
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-dump-2.C
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C

-- 
2.9.2


* [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-01  8:50 [PATCH 0/4] Various GCOV/PGO improvements marxin
  2016-08-01  8:49 ` [PATCH 3/4] Fix typo in gcov.texi marxin
  2016-08-01  8:50 ` [PATCH 2/4] Remove __gcov_indirect_call_profiler marxin
@ 2016-08-01  8:50 ` marxin
  2016-08-01 12:22   ` Nathan Sidwell
  2016-08-01  8:50 ` [PATCH 4/4] Add tests for __gcov_dump and __gcov_reset marxin
  2016-08-01 12:11 ` [PATCH 0/4] Various GCOV/PGO improvements Nathan Sidwell
  4 siblings, 1 reply; 95+ messages in thread
From: marxin @ 2016-08-01  8:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: jh, nathan

libgcc/ChangeLog:

2016-07-28  Martin Liska  <mliska@suse.cz>

	* Makefile.in: Add functions to LIBGCOV_PROFILER.
	* libgcov-profiler.c (__gcov_one_value_profiler_body_atomic):
	New function.
	(__gcov_one_value_profiler_atomic): Likewise.
	(__gcov_indirect_call_profiler_v2): Fix GNU coding style.
	(__gcov_indirect_call_profiler_v2_atomic): New function.
	* libgcov.h: Declare __gcov_indirect_call_profiler_v2_atomic and
	__gcov_one_value_profiler_body_atomic.

gcc/ChangeLog:

2016-07-28  Martin Liska  <mliska@suse.cz>

	* common.opt (fprofile-generate-atomic): Add new flag.
	* gcov-io.h: Declare GCOV_TYPE_ATOMIC_FETCH_ADD and
	GCOV_TYPE_ATOMIC_FETCH_ADD_FN.
	* tree-profile.c (gimple_init_edge_profiler): Generate
	also atomic profiler update.
	(gimple_gen_edge_profiler): Likewise.
	* doc/invoke.texi: Document -fprofile-generate-atomic.

gcc/testsuite/ChangeLog:

2016-07-28  Martin Liska  <mliska@suse.cz>

	* g++.dg/gcov/gcov-threads-1.C: New test.
---
 gcc/common.opt                             |  9 +++++
 gcc/doc/invoke.texi                        | 11 ++++++
 gcc/gcov-io.h                              | 22 +++++++++++
 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C | 46 ++++++++++++++++++++++
 gcc/tree-profile.c                         | 62 +++++++++++++++++++++---------
 libgcc/Makefile.in                         |  4 +-
 libgcc/libgcov-profiler.c                  | 42 +++++++++++++++++++-
 libgcc/libgcov.h                           |  2 +
 8 files changed, 177 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C

diff --git a/gcc/common.opt b/gcc/common.opt
index 8a292ed..1adb1d7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1916,6 +1916,15 @@ fprofile-correction
 Common Report Var(flag_profile_correction)
 Enable correction of flow inconsistent profile data input.
 
+; fprofile-generate-atomic=0: disable atomic updates.
+; fprofile-generate-atomic=1: atomically update edge profile counters.
+; fprofile-generate-atomic=2: atomically update value profile counters.
+; fprofile-generate-atomic=3: atomically update edge and value profile counters.
+; other values will be ignored (fall back to the default of 0).
+fprofile-generate-atomic=
+Common Joined UInteger Report Var(flag_profile_gen_atomic) Init(0) Optimization
+fprofile-generate-atomic=[0..3] Atomically increment profile counters.
+
 fprofile-generate
 Common
 Enable common options for generating profile info for profile feedback directed optimizations.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 22001f9..147b448 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9933,6 +9933,17 @@ the profile feedback data files. See @option{-fprofile-dir}.
 To optimize the program based on the collected profile information, use
 @option{-fprofile-use}.  @xref{Optimize Options}, for more information.
 
+@item -fprofile-generate-atomic
+@opindex fprofile-generate-atomic
+
+Enable atomic increments for profile counters.  By default, an instrumented
+application can produce a corrupted profile if it makes heavy use of
+threads.  The option provides atomic updates for edge profile counters
+(@option{-fprofile-generate-atomic=1}) and indirect call counters
+(@option{-fprofile-generate-atomic=2}).  Both can be enabled with
+@option{-fprofile-generate-atomic=3}.  The default value of the option
+is 0.
+
 @item -fsanitize=address
 @opindex fsanitize=address
 Enable AddressSanitizer, a fast memory error detector.
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index bbf013a..96ed78b 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -169,6 +169,19 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 typedef unsigned gcov_unsigned_t;
 typedef unsigned gcov_position_t;
+
+#if LONG_LONG_TYPE_SIZE > 32
+#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8
+#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8
+#else
+#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4
+#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4
+#endif
+#define PROFILE_GEN_EDGE_ATOMIC (flag_profile_gen_atomic == 1 || \
+				 flag_profile_gen_atomic == 3)
+#define PROFILE_GEN_VALUE_ATOMIC (flag_profile_gen_atomic == 2 || \
+				  flag_profile_gen_atomic == 3)
+
 /* gcov_type is typedef'd elsewhere for the compiler */
 #if IN_GCOV
 #define GCOV_LINKAGE static
@@ -196,6 +209,15 @@ typedef uint64_t gcov_type_unsigned;
 #endif
 
 #if IN_LIBGCOV
+
+#if LONG_LONG_TYPE_SIZE > 32
+#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8
+#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8
+#else
+#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4
+#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4
+#endif
+
 #define gcov_nonruntime_assert(EXPR) ((void)(0 && (EXPR)))
 #else
 #define gcov_nonruntime_assert(EXPR) gcc_assert (EXPR)
diff --git a/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
new file mode 100644
index 0000000..184beb9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
@@ -0,0 +1,46 @@
+/* { dg-options "-fprofile-arcs -ftest-coverage -pthread -fprofile-generate-atomic=3" } */
+/* { dg-do run { target native } } */
+
+#include <stdint.h>
+#include <pthread.h>
+#include <assert.h>
+
+#define NR 5
+
+pthread_mutex_t cndMs[NR];
+static void *ContentionNoDeadlock_thread(void *start)
+{
+  for (uint32_t k = 0; k < 100000; ++k)		/* count(500005) */
+    {
+      int starti = *(int*)start;		/* count(500000) */
+      for (uint32_t i = starti; i < NR; ++i) 
+	pthread_mutex_lock (&cndMs[i]);
+      for (int32_t i = NR - 1; i >= starti; --i)
+	pthread_mutex_unlock (&cndMs[i]);
+  }
+}
+int main(int argc, char **argv) {
+  for (unsigned i = 0; i < NR; i++)
+    cndMs[i] = PTHREAD_MUTEX_INITIALIZER;
+
+  pthread_t t[NR];
+  int ids[NR];
+
+  for (int i = 0; i < NR; i++)
+  {
+    ids[i] = i;
+    int r = pthread_create (&t[i], NULL, ContentionNoDeadlock_thread, &ids[i]);
+    assert (r == 0);				/* count(5) */
+  }
+
+  void *ret;
+  for (int i = 0; i < NR; i++)
+    {
+      int r = pthread_join (t[i], &ret);
+      assert (r == 0);				/* count(5) */
+    }
+
+  return 0;					/* count(1) */
+}
+
+/* { dg-final { run-gcov gcov-threads-1.C } } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 39fe15f..2c6fbd1 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -164,7 +164,12 @@ gimple_init_edge_profiler (void)
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node,
 					  NULL_TREE);
-      tree_one_value_profiler_fn
+      if (PROFILE_GEN_VALUE_ATOMIC)
+	tree_one_value_profiler_fn
+	      = build_fn_decl ("__gcov_one_value_profiler_atomic",
+				     one_value_profiler_fn_type);
+      else
+	tree_one_value_profiler_fn
 	      = build_fn_decl ("__gcov_one_value_profiler",
 				     one_value_profiler_fn_type);
       TREE_NOTHROW (tree_one_value_profiler_fn) = 1;
@@ -180,11 +185,14 @@ gimple_init_edge_profiler (void)
 					  gcov_type_node,
 					  ptr_void,
 					  NULL_TREE);
+      const char *profiler_fn_name = "__gcov_indirect_call_profiler_v2";
+      if (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE))
+	profiler_fn_name = "__gcov_indirect_call_topn_profiler";
+      if (PROFILE_GEN_VALUE_ATOMIC)
+	profiler_fn_name = "__gcov_indirect_call_profiler_v2_atomic";
+
       tree_indirect_call_profiler_fn
-	      = build_fn_decl ( (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ?
-				 "__gcov_indirect_call_topn_profiler":
-				 "__gcov_indirect_call_profiler_v2"),
-			       ic_profiler_fn_type);
+	      = build_fn_decl (profiler_fn_name, ic_profiler_fn_type);
 
       TREE_NOTHROW (tree_indirect_call_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_indirect_call_profiler_fn)
@@ -241,22 +249,38 @@ gimple_init_edge_profiler (void)
 void
 gimple_gen_edge_profiler (int edgeno, edge e)
 {
-  tree ref, one, gcov_type_tmp_var;
-  gassign *stmt1, *stmt2, *stmt3;
+  tree one;
+  bool is_atomic = PROFILE_GEN_EDGE_ATOMIC;
 
-  ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
   one = build_int_cst (gcov_type_node, 1);
-  gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-					  NULL, "PROF_edge_counter");
-  stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
-  gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-					  NULL, "PROF_edge_counter");
-  stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
-			       gimple_assign_lhs (stmt1), one);
-  stmt3 = gimple_build_assign (unshare_expr (ref), gimple_assign_lhs (stmt2));
-  gsi_insert_on_edge (e, stmt1);
-  gsi_insert_on_edge (e, stmt2);
-  gsi_insert_on_edge (e, stmt3);
+
+  if (is_atomic)
+    {
+      /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
+      tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
+      gcall *stmt
+	= gimple_build_call (builtin_decl_explicit (GCOV_TYPE_ATOMIC_FETCH_ADD),
+			     3, addr, one,
+			     build_int_cst (integer_type_node,
+					    MEMMODEL_RELAXED));
+      gsi_insert_on_edge (e, stmt);
+    }
+  else
+    {
+      tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
+      tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
+						   NULL, "PROF_edge_counter");
+      gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
+      gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
+					      NULL, "PROF_edge_counter");
+      gassign *stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
+					    gimple_assign_lhs (stmt1), one);
+      gassign *stmt3 = gimple_build_assign (unshare_expr (ref),
+					    gimple_assign_lhs (stmt2));
+      gsi_insert_on_edge (e, stmt1);
+      gsi_insert_on_edge (e, stmt2);
+      gsi_insert_on_edge (e, stmt3);
+    }
 }
 
 /* Emits code to get VALUE to instrument at GSI, and returns the
diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index f09b39b..8b0fdd9 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -860,8 +860,10 @@ LIBGCOV_MERGE = _gcov_merge_add _gcov_merge_single _gcov_merge_delta	\
 	_gcov_merge_ior _gcov_merge_time_profile _gcov_merge_icall_topn
 LIBGCOV_PROFILER = _gcov_interval_profiler _gcov_pow2_profiler		\
 	_gcov_one_value_profiler _gcov_indirect_call_profiler		\
+	_gcov_one_value_profiler_atomic					\
  	_gcov_average_profiler _gcov_ior_profiler			\
-	_gcov_indirect_call_profiler_v2 _gcov_time_profiler		\
+	_gcov_indirect_call_profiler_v2					\
+	_gcov_time_profiler						\
 	_gcov_indirect_call_topn_profiler
 LIBGCOV_INTERFACE = _gcov_dump _gcov_flush _gcov_fork			\
 	_gcov_execl _gcov_execlp					\
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index e947188..1b307ac 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -93,6 +93,31 @@ __gcov_one_value_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+/* Atomic update version of __gcov_one_value_profiler_body().  */
+
+static inline void
+__gcov_one_value_profiler_body_atomic (gcov_type *counters, gcov_type value)
+{
+  if (value == counters[0])
+    GCOV_TYPE_ATOMIC_FETCH_ADD_FN (&counters[1], 1, MEMMODEL_RELAXED);
+  else if (counters[1] == 0)
+    {
+      counters[1] = 1;
+      counters[0] = value;
+    }
+  else
+    GCOV_TYPE_ATOMIC_FETCH_ADD_FN (&counters[1], -1, MEMMODEL_RELAXED);
+  GCOV_TYPE_ATOMIC_FETCH_ADD_FN (&counters[2], 1, MEMMODEL_RELAXED);
+}
+
+#ifdef L_gcov_one_value_profiler_atomic
+void
+__gcov_one_value_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __gcov_one_value_profiler_body_atomic (counters, value);
+}
+#endif
+
 #ifdef L_gcov_indirect_call_topn_profiler
 /* Tries to keep track the most frequent N values in the counters where
    N is specified by parameter TOPN_VAL. To track top N values, 2*N counter
@@ -229,6 +254,7 @@ __gcov_indirect_call_topn_profiler (gcov_type value, void* cur_func)
 	  && *(void **) cur_func == *(void **) callee_func))
     __gcov_topn_value_profiler_body (__gcov_indirect_call_topn_counters, value);
 }
+
 #endif
 
 #ifdef L_gcov_indirect_call_profiler
@@ -291,9 +317,23 @@ __gcov_indirect_call_profiler_v2 (gcov_type value, void* cur_func)
      the descriptors to see if they point to the same function.  */
   if (cur_func == __gcov_indirect_call_callee
       || (__LIBGCC_VTABLE_USES_DESCRIPTORS__ && __gcov_indirect_call_callee
-          && *(void **) cur_func == *(void **) __gcov_indirect_call_callee))
+	  && *(void **) cur_func == *(void **) __gcov_indirect_call_callee))
     __gcov_one_value_profiler_body (__gcov_indirect_call_counters, value);
 }
+
+/* Atomic update version of __gcov_indirect_call_profiler_v2().  */
+void
+__gcov_indirect_call_profiler_v2_atomic (gcov_type value, void* cur_func)
+{
+  /* If the C++ virtual tables contain function descriptors then one
+     function may have multiple descriptors and we need to dereference
+     the descriptors to see if they point to the same function.  */
+  if (cur_func == __gcov_indirect_call_callee
+      || (__LIBGCC_VTABLE_USES_DESCRIPTORS__ && __gcov_indirect_call_callee
+	  && *(void **) cur_func == *(void **) __gcov_indirect_call_callee))
+    __gcov_one_value_profiler_body_atomic (__gcov_indirect_call_counters,
+					   value);
+}
 #endif
 
 #ifdef L_gcov_time_profiler
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index ae77998..0bd905b 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -270,9 +270,11 @@ extern void __gcov_merge_icall_topn (gcov_type *, unsigned) ATTRIBUTE_HIDDEN;
 extern void __gcov_interval_profiler (gcov_type *, gcov_type, int, unsigned);
 extern void __gcov_pow2_profiler (gcov_type *, gcov_type);
 extern void __gcov_one_value_profiler (gcov_type *, gcov_type);
+extern void __gcov_one_value_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_indirect_call_profiler (gcov_type*, gcov_type,
                                            void*, void*);
 extern void __gcov_indirect_call_profiler_v2 (gcov_type, void *);
+extern void __gcov_indirect_call_profiler_v2_atomic (gcov_type, void *);
 extern void __gcov_time_profiler (gcov_type *);
 extern void __gcov_average_profiler (gcov_type *, gcov_type);
 extern void __gcov_ior_profiler (gcov_type *, gcov_type);
-- 
2.9.2



* Re: [PATCH 0/4] Various GCOV/PGO improvements
  2016-08-01  8:50 [PATCH 0/4] Various GCOV/PGO improvements marxin
                   ` (3 preceding siblings ...)
  2016-08-01  8:50 ` [PATCH 4/4] Add tests for __gcov_dump and __gcov_reset marxin
@ 2016-08-01 12:11 ` Nathan Sidwell
  4 siblings, 0 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-01 12:11 UTC (permalink / raw)
  To: marxin, gcc-patches; +Cc: jh

On 08/01/16 04:48, marxin wrote:
> Hi.
>
> This small series addresses a couple of issues I've recently observed.
> I'll briefly describe the changes in each individual patch:
>
> marxin (4):
>   Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
>
> As mentioned in [1], our current implementation can produce a corrupted
> profile when the instrumented program makes heavy use of threads. A fully
> robust solution would be to either use TLS or to use atomics and a locking
> mechanism. However, as David Li pointed out, the most interesting
> counters affected by multithreading are the -fprofile-arcs counters
> and the indirect call counters. I've cherry-picked that functionality
> from the google/gcc-4_9 branch.
>
> [1] https://gcc.gnu.org/ml/gcc/2016-07/msg00131.html
>
>   Remove __gcov_indirect_call_profiler
>
> The profiler function is unused and can therefore be removed.
>
>   Fix typo in gcov.texi
>
> Just a small typo in the names of functions that one can call from
> a user application.
>
>   Add tests for __gcov_dump and __gcov_reset
>
> Adding tests for the aforementioned functions.

>

Patches 2, 3 & 4 are ok.  Patch 1 (the fprofile-generate-atomic one) needs work.
I'll respond to that one directly.

nathan


* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-01  8:50 ` [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch marxin
@ 2016-08-01 12:22   ` Nathan Sidwell
  2016-08-01 13:29     ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-01 12:22 UTC (permalink / raw)
  To: marxin, gcc-patches; +Cc: jh

As I just wrote, this patch needs work.  The general points are:
1) Exposing integers 0-3 to the user as switch values.  Don't do that; give them
names.  In this case a comma-separated list of orthogonal names seems
appropriate.  But see below.
2) Poor documentation.  How might the user choose an appropriate setting?
(What happens if compilations need to use different settings?)  What are 'edge'
and 'value' counters?  Why might one want different settings for them?

I think this is jumping too deep into a solution with insufficient evidence.
In particular, why should edges and values be settable differently?  It doesn't
lend itself to extending to TLS, if that proves to be a good solution (trading
memory for time).  Something along the lines of
'-fprofile-update={single,atomic,threaded},[edge,value]' might be better.  I.e.
set the scheme as part of the option value, followed by a list of the things it
applies to.  (And as I hope I've implied, it'd be good not to have that
separate list until proven otherwise.)
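
As a rough illustration of the named-value idea only: the sketch below maps
the scheme name to an enum and rejects anything else. It is standalone C and
does not use GCC's option machinery; the enum, its members and
parse_profile_update are hypothetical names, and the three scheme names are
simply the ones suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical update schemes, named after the suggestion above.  */
enum profile_update
{
  PROFILE_UPDATE_SINGLE,	/* plain load/add/store */
  PROFILE_UPDATE_ATOMIC,	/* atomic read-modify-write */
  PROFILE_UPDATE_THREADED,	/* per-thread (TLS) counters */
  PROFILE_UPDATE_INVALID	/* unknown or missing name */
};

/* Map the value of a hypothetical -fprofile-update= option to a scheme.
   Unknown names are reported instead of silently falling back.  */
enum profile_update
parse_profile_update (const char *arg)
{
  static const struct
  {
    const char *name;
    enum profile_update scheme;
  } table[] = {
    { "single", PROFILE_UPDATE_SINGLE },
    { "atomic", PROFILE_UPDATE_ATOMIC },
    { "threaded", PROFILE_UPDATE_THREADED },
  };

  if (arg == NULL)
    return PROFILE_UPDATE_INVALID;
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp (arg, table[i].name) == 0)
      return table[i].scheme;
  return PROFILE_UPDATE_INVALID;
}
```

Compared with the 0..3 encoding, a new scheme only needs a new table entry,
and a mistyped value such as "3" is diagnosed rather than ignored.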


On 07/28/16 08:32, marxin wrote:
> libgcc/ChangeLog:
>
> 2016-07-28  Martin Liska  <mliska@suse.cz>

Shouldn't the original authors be named here too?  (This applies to the other
patches too.)


> --- a/gcc/gcov-io.h
> +++ b/gcc/gcov-io.h
> @@ -169,6 +169,19 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see

> +
> +#if LONG_LONG_TYPE_SIZE > 32
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8
> +#else
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4
> +#endif
...
>  #if IN_LIBGCOV
> +
> +#if LONG_LONG_TYPE_SIZE > 32
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8
> +#else
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4
> +#endif


BTW, these two blocks look stunningly similar.

nathan


* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-01 12:22   ` Nathan Sidwell
@ 2016-08-01 13:29     ` Martin Liška
  2016-08-04 14:48       ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-08-01 13:29 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh


On 08/01/2016 02:22 PM, Nathan Sidwell wrote:
> As I just wrote, this patch needs work.  The general points are:

Thanks for the comments.

> 1) exposing integers 0-3 to the user as switch values.  Don't do that, give them names.  In this case a comma separated list of orthogonal names seems appropriate.  But see below.
> 2) Poor documentation.  How might the user choose an appropriate setting? (What happens if compilations need to use different settings?)  What are 'edge' and 'value' counters?  Why might one want different settings for them?

Sure, fully agree that it currently doesn't make sense to distinguish between individual types of profiles (edge, value).

> 
> I think this is jumping too deep into a solution with insufficient evidence. Particularly, it is unclear why edges and values can be set differently.  It doesn't lend itself to extending to TLS, if that proves to be a good solution (trading memory for time).  Something along the lines of '-fprofile-update={single,atomic,threaded},[edge,value]' might be better.  I.e. set the scheme as part of the option value, followed by a list of the things it applies to.  (And as I hope I've implied, it'd be good not to have that separate list until proven otherwise.)

Yes.

> 
> 
> On 07/28/16 08:32, marxin wrote:
>> libgcc/ChangeLog:
>>
>> 2016-07-28  Martin Liska  <mliska@suse.cz>
> 
> Shouldn't the original authors be named here too? (applies to the other patches too).

I've added a cherry-pick entry that credits the original commit of the functionality.

> 
> 
>> --- a/gcc/gcov-io.h
>> +++ b/gcc/gcov-io.h
>> @@ -169,6 +169,19 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> 
>> +
>> +#if LONG_LONG_TYPE_SIZE > 32
>> +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8
>> +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8
>> +#else
>> +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4
>> +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4
>> +#endif
> ...
>>  #if IN_LIBGCOV
>> +
>> +#if LONG_LONG_TYPE_SIZE > 32
>> +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8
>> +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8
>> +#else
>> +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4
>> +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4
>> +#endif
> 
> 
> BTW, these two blocks look stunningly similar.

Fixed.

I also added a small hunk that describes the problem of an application having unjoined (or detached) threads.
Can you please take a look at the documentation change? Maybe it needs some rewording.

Martin

> 
> nathan


[-- Attachment #2: 0001-Cherry-pick-fprofile-generate-atomic-from-google-gcc-v2.patch --]
[-- Type: text/x-patch, Size: 14228 bytes --]

From 7e19e28f3d6e227bb67fb770575831d637abe3aa Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Thu, 28 Jul 2016 14:32:47 +0200
Subject: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9
 branch

libgcc/ChangeLog:

2016-07-28  Martin Liska  <mliska@suse.cz>

	Cherry picked (and modified) from google-4_7 branch
	2012-12-26  Rong Xu  <xur@google.com>

	* Makefile.in: Add functions to LIBGCOV_PROFILER.
	* libgcov-profiler.c (__gcov_one_value_profiler_body_atomic):
	New function.
	(__gcov_one_value_profiler_atomic): Likewise.
	(__gcov_indirect_call_profiler_v2): Fix GNU coding style.
	(__gcov_indirect_call_profiler_v2_atomic): New function.
	* libgcov.h: Declare __gcov_indirect_call_profiler_v2_atomic and
	__gcov_one_value_profiler_body_atomic.

gcc/ChangeLog:

2016-07-28  Martin Liska  <mliska@suse.cz>

	Cherry picked (and modified) from google-4_7 branch
	2012-12-26  Rong Xu  <xur@google.com>

	* common.opt (fprofile-update): Add new flag.
	* gcov-io.h: Declare GCOV_TYPE_ATOMIC_FETCH_ADD and
	GCOV_TYPE_ATOMIC_FETCH_ADD_FN.
	* tree-profile.c (gimple_init_edge_profiler): Generate
	also atomic profiler update.
	(gimple_gen_edge_profiler): Likewise.
	* doc/invoke.texi: Document -fprofile-update.

gcc/testsuite/ChangeLog:

2016-07-28  Martin Liska  <mliska@suse.cz>

	* g++.dg/gcov/gcov-threads-1.C: New test.
---
 gcc/common.opt                             | 13 +++++++
 gcc/coretypes.h                            |  6 +++
 gcc/doc/invoke.texi                        | 12 ++++++
 gcc/gcov-io.h                              |  8 ++++
 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C | 46 ++++++++++++++++++++++
 gcc/tree-profile.c                         | 61 ++++++++++++++++++++----------
 libgcc/Makefile.in                         |  4 +-
 libgcc/libgcov-profiler.c                  | 42 +++++++++++++++++++-
 libgcc/libgcov.h                           |  2 +
 9 files changed, 173 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C

diff --git a/gcc/common.opt b/gcc/common.opt
index 8a292ed..44adae8 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1916,6 +1916,19 @@ fprofile-correction
 Common Report Var(flag_profile_correction)
 Enable correction of flow inconsistent profile data input.
 
+fprofile-update=
+Common Joined RejectNegative Enum(profile_update) Var(flag_profile_update) Init(PROFILE_UPDATE_SINGLE)
+-fprofile-update=[single|atomic]	Set the profile update method.
+
+Enum
+Name(profile_update) Type(enum profile_update) UnknownError(unknown profile update method %qs)
+
+EnumValue
+Enum(profile_update) String(single) Value(PROFILE_UPDATE_SINGLE)
+
+EnumValue
+Enum(profile_update) String(atomic) Value(PROFILE_UPDATE_ATOMIC)
+
 fprofile-generate
 Common
 Enable common options for generating profile info for profile feedback directed optimizations.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index b3a91a6..fe1e984 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -174,6 +174,12 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of profile update methods.  */
+enum profile_update {
+  PROFILE_UPDATE_SINGLE,
+  PROFILE_UPDATE_ATOMIC
+};
+
 /* Types of unwind/exception handling info that can be generated.  */
 
 enum unwind_info_type
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 22001f9..1cfaae7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9933,6 +9933,18 @@ the profile feedback data files. See @option{-fprofile-dir}.
 To optimize the program based on the collected profile information, use
 @option{-fprofile-use}.  @xref{Optimize Options}, for more information.
 
+@item -fprofile-update=@var{method}
+@opindex fprofile-update
+
+Alter the update method for an application instrumented for profile
+feedback based optimization.  The @var{method} argument should be one of
+@samp{single} or @samp{atomic}.  The first one is useful for single-threaded
+applications, while the second one prevents profile corruption by emitting
+thread-safe code.
+
+@strong{Warning:} When an application does not properly join all threads
+(or creates a detached thread), a profile file can still be corrupted.
+
 @item -fsanitize=address
 @opindex fsanitize=address
 Enable AddressSanitizer, a fast memory error detector.
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index bbf013a..afd00ac 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -164,6 +164,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #ifndef GCC_GCOV_IO_H
 #define GCC_GCOV_IO_H
 
+#if LONG_LONG_TYPE_SIZE > 32
+#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8
+#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8
+#else
+#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4
+#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4
+#endif
+
 #ifndef IN_LIBGCOV
 /* About the host */
 
diff --git a/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
new file mode 100644
index 0000000..a4a6f0a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
@@ -0,0 +1,46 @@
+/* { dg-options "-fprofile-arcs -ftest-coverage -pthread -fprofile-update=atomic" } */
+/* { dg-do run { target native } } */
+
+#include <stdint.h>
+#include <pthread.h>
+#include <assert.h>
+
+#define NR 5
+
+pthread_mutex_t cndMs[NR];
+static void *ContentionNoDeadlock_thread(void *start)
+{
+  for (uint32_t k = 0; k < 100000; ++k)		/* count(500005) */
+    {
+      int starti = *(int*)start;		/* count(500000) */
+      for (uint32_t i = starti; i < NR; ++i) 
+	pthread_mutex_lock (&cndMs[i]);
+      for (int32_t i = NR - 1; i >= starti; --i)
+	pthread_mutex_unlock (&cndMs[i]);
+  }
+}
+int main(int argc, char **argv) {
+  for (unsigned i = 0; i < NR; i++)
+    cndMs[i] = PTHREAD_MUTEX_INITIALIZER;
+
+  pthread_t t[NR];
+  int ids[NR];
+
+  for (int i = 0; i < NR; i++)
+  {
+    ids[i] = i;
+    int r = pthread_create (&t[i], NULL, ContentionNoDeadlock_thread, &ids[i]);
+    assert (r == 0);				/* count(5) */
+  }
+
+  int ret;
+  for (int i = 0; i < NR; i++)
+    {
+      int r = pthread_join (t[i], (void**)&ret);
+      assert (r == 0);				/* count(5) */
+    }
+
+  return 0;					/* count(1) */
+}
+
+/* { dg-final { run-gcov gcov-threads-1.C } } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 39fe15f..a2c86ac 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -164,7 +164,12 @@ gimple_init_edge_profiler (void)
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node,
 					  NULL_TREE);
-      tree_one_value_profiler_fn
+      if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+	tree_one_value_profiler_fn
+	      = build_fn_decl ("__gcov_one_value_profiler_atomic",
+				     one_value_profiler_fn_type);
+      else
+	tree_one_value_profiler_fn
 	      = build_fn_decl ("__gcov_one_value_profiler",
 				     one_value_profiler_fn_type);
       TREE_NOTHROW (tree_one_value_profiler_fn) = 1;
@@ -180,11 +185,14 @@ gimple_init_edge_profiler (void)
 					  gcov_type_node,
 					  ptr_void,
 					  NULL_TREE);
+      const char *profiler_fn_name = "__gcov_indirect_call_profiler_v2";
+      if (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE))
+	profiler_fn_name = "__gcov_indirect_call_topn_profiler";
+      if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+	profiler_fn_name = "__gcov_indirect_call_profiler_v2_atomic";
+
       tree_indirect_call_profiler_fn
-	      = build_fn_decl ( (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ?
-				 "__gcov_indirect_call_topn_profiler":
-				 "__gcov_indirect_call_profiler_v2"),
-			       ic_profiler_fn_type);
+	      = build_fn_decl (profiler_fn_name, ic_profiler_fn_type);
 
       TREE_NOTHROW (tree_indirect_call_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_indirect_call_profiler_fn)
@@ -241,22 +249,37 @@ gimple_init_edge_profiler (void)
 void
 gimple_gen_edge_profiler (int edgeno, edge e)
 {
-  tree ref, one, gcov_type_tmp_var;
-  gassign *stmt1, *stmt2, *stmt3;
+  tree one;
 
-  ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
   one = build_int_cst (gcov_type_node, 1);
-  gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-					  NULL, "PROF_edge_counter");
-  stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
-  gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-					  NULL, "PROF_edge_counter");
-  stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
-			       gimple_assign_lhs (stmt1), one);
-  stmt3 = gimple_build_assign (unshare_expr (ref), gimple_assign_lhs (stmt2));
-  gsi_insert_on_edge (e, stmt1);
-  gsi_insert_on_edge (e, stmt2);
-  gsi_insert_on_edge (e, stmt3);
+
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+    {
+      /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
+      tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
+      gcall *stmt
+	= gimple_build_call (builtin_decl_explicit (GCOV_TYPE_ATOMIC_FETCH_ADD),
+			     3, addr, one,
+			     build_int_cst (integer_type_node,
+					    MEMMODEL_RELAXED));
+      gsi_insert_on_edge (e, stmt);
+    }
+  else
+    {
+      tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
+      tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
+						   NULL, "PROF_edge_counter");
+      gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
+      gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
+					      NULL, "PROF_edge_counter");
+      gassign *stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
+					    gimple_assign_lhs (stmt1), one);
+      gassign *stmt3 = gimple_build_assign (unshare_expr (ref),
+					    gimple_assign_lhs (stmt2));
+      gsi_insert_on_edge (e, stmt1);
+      gsi_insert_on_edge (e, stmt2);
+      gsi_insert_on_edge (e, stmt3);
+    }
 }
 
 /* Emits code to get VALUE to instrument at GSI, and returns the
diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index f09b39b..8b0fdd9 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -860,8 +860,10 @@ LIBGCOV_MERGE = _gcov_merge_add _gcov_merge_single _gcov_merge_delta	\
 	_gcov_merge_ior _gcov_merge_time_profile _gcov_merge_icall_topn
 LIBGCOV_PROFILER = _gcov_interval_profiler _gcov_pow2_profiler		\
 	_gcov_one_value_profiler _gcov_indirect_call_profiler		\
+	_gcov_one_value_profiler_atomic					\
  	_gcov_average_profiler _gcov_ior_profiler			\
-	_gcov_indirect_call_profiler_v2 _gcov_time_profiler		\
+	_gcov_indirect_call_profiler_v2					\
+	_gcov_time_profiler						\
 	_gcov_indirect_call_topn_profiler
 LIBGCOV_INTERFACE = _gcov_dump _gcov_flush _gcov_fork			\
 	_gcov_execl _gcov_execlp					\
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index e947188..1b307ac 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -93,6 +93,31 @@ __gcov_one_value_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+/* Atomic update version of __gcov_one_value_profiler_body().  */
+
+static inline void
+__gcov_one_value_profiler_body_atomic (gcov_type *counters, gcov_type value)
+{
+  if (value == counters[0])
+    GCOV_TYPE_ATOMIC_FETCH_ADD_FN (&counters[1], 1, MEMMODEL_RELAXED);
+  else if (counters[1] == 0)
+    {
+      counters[1] = 1;
+      counters[0] = value;
+    }
+  else
+    GCOV_TYPE_ATOMIC_FETCH_ADD_FN (&counters[1], -1, MEMMODEL_RELAXED);
+  GCOV_TYPE_ATOMIC_FETCH_ADD_FN (&counters[2], 1, MEMMODEL_RELAXED);
+}
+
+#ifdef L_gcov_one_value_profiler_atomic
+void
+__gcov_one_value_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __gcov_one_value_profiler_body_atomic (counters, value);
+}
+#endif
+
 #ifdef L_gcov_indirect_call_topn_profiler
 /* Tries to keep track the most frequent N values in the counters where
    N is specified by parameter TOPN_VAL. To track top N values, 2*N counter
@@ -229,6 +254,7 @@ __gcov_indirect_call_topn_profiler (gcov_type value, void* cur_func)
 	  && *(void **) cur_func == *(void **) callee_func))
     __gcov_topn_value_profiler_body (__gcov_indirect_call_topn_counters, value);
 }
+
 #endif
 
 #ifdef L_gcov_indirect_call_profiler
@@ -291,9 +317,23 @@ __gcov_indirect_call_profiler_v2 (gcov_type value, void* cur_func)
      the descriptors to see if they point to the same function.  */
   if (cur_func == __gcov_indirect_call_callee
       || (__LIBGCC_VTABLE_USES_DESCRIPTORS__ && __gcov_indirect_call_callee
-          && *(void **) cur_func == *(void **) __gcov_indirect_call_callee))
+	  && *(void **) cur_func == *(void **) __gcov_indirect_call_callee))
     __gcov_one_value_profiler_body (__gcov_indirect_call_counters, value);
 }
+
+/* Atomic update version of __gcov_indirect_call_profiler_v2().  */
+void
+__gcov_indirect_call_profiler_v2_atomic (gcov_type value, void* cur_func)
+{
+  /* If the C++ virtual tables contain function descriptors then one
+     function may have multiple descriptors and we need to dereference
+     the descriptors to see if they point to the same function.  */
+  if (cur_func == __gcov_indirect_call_callee
+      || (__LIBGCC_VTABLE_USES_DESCRIPTORS__ && __gcov_indirect_call_callee
+	  && *(void **) cur_func == *(void **) __gcov_indirect_call_callee))
+    __gcov_one_value_profiler_body_atomic (__gcov_indirect_call_counters,
+					   value);
+}
 #endif
 
 #ifdef L_gcov_time_profiler
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index ae77998..0bd905b 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -270,9 +270,11 @@ extern void __gcov_merge_icall_topn (gcov_type *, unsigned) ATTRIBUTE_HIDDEN;
 extern void __gcov_interval_profiler (gcov_type *, gcov_type, int, unsigned);
 extern void __gcov_pow2_profiler (gcov_type *, gcov_type);
 extern void __gcov_one_value_profiler (gcov_type *, gcov_type);
+extern void __gcov_one_value_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_indirect_call_profiler (gcov_type*, gcov_type,
                                            void*, void*);
 extern void __gcov_indirect_call_profiler_v2 (gcov_type, void *);
+extern void __gcov_indirect_call_profiler_v2_atomic (gcov_type, void *);
 extern void __gcov_time_profiler (gcov_type *);
 extern void __gcov_average_profiler (gcov_type *, gcov_type);
 extern void __gcov_ior_profiler (gcov_type *, gcov_type);
-- 
2.9.2



* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-01 13:29     ` Martin Liška
@ 2016-08-04 14:48       ` Nathan Sidwell
  2016-08-04 15:34         ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-04 14:48 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/01/16 09:29, Martin Liška wrote:

> I also added a small hunk that describes the problem of an application having unjoined (or detached) threads.
> Can you please take a look at the documentation change? Maybe it needs some rewording.

Sorry for the tardy response, thanks for the ping.

In general good.   Some nits:


+++ b/gcc/tree-profile.c
@@ -164,7 +164,12 @@ gimple_init_edge_profiler (void)
  	      = build_function_type_list (void_type_node,
  					  gcov_type_ptr, gcov_type_node,
  					  NULL_TREE);
-      tree_one_value_profiler_fn
+      if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+	tree_one_value_profiler_fn
+	      = build_fn_decl ("__gcov_one_value_profiler_atomic",
+				     one_value_profiler_fn_type);
+      else
+	tree_one_value_profiler_fn
  	      = build_fn_decl ("__gcov_one_value_profiler",
  				     one_value_profiler_fn_type);

this hunk uses a different idiom to ...

@@ -180,11 +185,14 @@ gimple_init_edge_profiler (void)
  					  gcov_type_node,
  					  ptr_void,
  					  NULL_TREE);
+      const char *profiler_fn_name = "__gcov_indirect_call_profiler_v2";
+      if (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE))
+	profiler_fn_name = "__gcov_indirect_call_topn_profiler";
+      if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+	profiler_fn_name = "__gcov_indirect_call_profiler_v2_atomic";

I prefer the latter's approach.

@@ -241,22 +249,37 @@ gimple_init_edge_profiler (void)
  void
  gimple_gen_edge_profiler (int edgeno, edge e)
...
+  else
+    {
/* COMMENT thread unsafe sequence */
+      tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);


diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
+static inline void
+__gcov_one_value_profiler_body_atomic (gcov_type *counters, gcov_type value)
+{
...

The body looks to have data races.  Some kind of cmp_store needed on 
counters[1]?  Maybe it can't be completely race free?

nathan


* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-04 14:48       ` Nathan Sidwell
@ 2016-08-04 15:34         ` Martin Liška
  2016-08-04 16:43           ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-08-04 15:34 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh

On 08/04/2016 04:48 PM, Nathan Sidwell wrote:
> diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
> +static inline void
> +__gcov_one_value_profiler_body_atomic (gcov_type *counters, gcov_type value)
> +{
> ...
> 
> The body looks to have data races.  Some kind of cmp_store needed on counters[1]?  Maybe it can't be completely race free?
> 
> nathan

You are right: since we would need to atomically change 2 values (counters[0] and counters[1]),
it's impossible IMHO. The question is what to do about that:

1) atomically update just counters[2] and live with data racing for the first 2 values
2) add (probably conditionally) a spin lock
3) do not handle thread-safety of indirect call counters at all
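
Option 2 could look roughly like the sketch below. It is only an illustration: sketch_counter_t stands in for gcov_type, and sketch_lock and sketch_one_value_update_locked are hypothetical names that do not exist in libgcov. It uses a C11 atomic_flag as the spin lock and keeps the whole three-counter update inside the critical section.

```c
#include <stdatomic.h>

/* Stand-in for gcov_type (assumption: a 64-bit signed integer).  */
typedef long long sketch_counter_t;

/* Hypothetical global lock; the patch under review has no such object.  */
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

/* One-value profiler update with all three counters changed under a
   spin lock, so counters[0] and counters[1] always change together.  */
static void
sketch_one_value_update_locked (sketch_counter_t *counters,
				sketch_counter_t value)
{
  /* Spin until the flag was previously clear, i.e. we own the lock.  */
  while (atomic_flag_test_and_set_explicit (&sketch_lock,
					    memory_order_acquire))
    ;

  if (value == counters[0])
    counters[1]++;
  else if (counters[1] == 0)
    {
      counters[1] = 1;
      counters[0] = value;
    }
  else
    counters[1]--;
  counters[2]++;

  atomic_flag_clear_explicit (&sketch_lock, memory_order_release);
}
```

A single global lock serializes every profiled call site, which is presumably why the lock would have to be conditional; a per-counter lock would cost extra memory instead.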

Thoughts?
Martin


* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-04 15:34         ` Martin Liška
@ 2016-08-04 16:43           ` Nathan Sidwell
  2016-08-04 17:03             ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-04 16:43 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/04/16 11:34, Martin Liška wrote:
> On 08/04/2016 04:48 PM, Nathan Sidwell wrote:
>> diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
>> +static inline void
>> +__gcov_one_value_profiler_body_atomic (gcov_type *counters, gcov_type value)
>> +{
>> ...
>>
>> The body looks to have data races.  Some kind of cmp_store needed on counters[1]?  Maybe it can't be completely race free?
>>
>> nathan
>
> You are right: since we would need to atomically change 2 values (counters[0] and counters[1]),
> it's impossible IMHO. The question is what to do about that:
>
> 1) atomically update just counters[2] and live with data racing for the first 2 values
> 2) add (probably conditionally) a spin lock
> 3) do not handle thread-safety of indirect call counters at all

Thanks for confirming my thoughts.

For this case there are three 'counters'
1) a value we're checking.  Set when the delta is zero.
2) a count of number of uses
3) a count on the delta of the uses that matched and the uses that did not.

Notice that the recorded value can change, whenever the delta returns to zero. 
That's intentional.  This has the side effect of preventing the delta ever going 
negative.
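
For reference, the single-threaded update described above can be sketched in plain C as follows. It mirrors the non-atomic __gcov_one_value_profiler_body shown in the patch context; sketch_counter_t and sketch_one_value_update are made-up names for illustration. The slot layout follows the libgcov code (counters[0] is the value, counters[1] the delta, counters[2] the total number of uses), not the order of the list above.

```c
/* Stand-in for gcov_type (assumption: a 64-bit signed integer).  */
typedef long long sketch_counter_t;

/* Non-atomic one-value profiler update.
   counters[0]: the value currently being tracked
   counters[1]: matches minus mismatches since the value was last set
   counters[2]: total number of calls  */
static void
sketch_one_value_update (sketch_counter_t *counters, sketch_counter_t value)
{
  if (value == counters[0])
    counters[1]++;		/* Another match for the tracked value.  */
  else if (counters[1] == 0)
    {
      /* Delta is back at zero: start tracking the new value.  */
      counters[1] = 1;
      counters[0] = value;
    }
  else
    counters[1]--;		/* Mismatch; delta never drops below zero.  */
  counters[2]++;
}
```

Feeding it the values 7, 7, 3, 7 leaves the tracked value at 7 with a delta of 2 and a total of 4.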

The tricky case is resetting the value when the delta is zero.  We can't 
simultaneously set the delta and the value.  We could use a 2 step process 
though -- set delta to 'updating', set value, set delta to 1. That will put a 
compare_exchange in the hot path though ... and still turns out to be tricky.

How about:
gcov_t expected;
atomic_load (&counter[0],  val, ...);
gcov_t delta = val == value ? 1 : -1;
atomic_add (&counter[1], delta);   <-- or atomic_add_fetch
if (delta < 0) {
   /* can we set counter[0]? */
   atomic_load (&counter[1], &expected, ...);
   if (expected < 0) {
     atomic_store (&counter[0], value, ...);
     atomic_add (&counter[1], 2, ...);
   }
}
atomic_add (&counter[2], 1, ...);

This does have a race condition -- two threads could get into the inner if body.
But I think that's harmless.  One of them will win the store of value, and
both of them will restore the delta counter.  We'll end up with delta being 1
too high.

wdyt?

nathan


* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-04 16:43           ` Nathan Sidwell
@ 2016-08-04 17:03             ` Nathan Sidwell
  2016-08-05  8:55               ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-04 17:03 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/04/16 12:43, Nathan Sidwell wrote:

> How about:
> gcov_t expected;
> atomic_load (&counter[0],  val, ...);
> gcov_t delta = val == value ? 1 : -1;
> atomic_add (&counter[1], delta);   <-- or atomic_add_fetch
> if (delta < 0) {
>   /* can we set counter[0]? */
>   atomic_load (&counter[1], &expected, ...);
>   if (expected < 0) {
>     atomic_store (&counter[0], value, ...);
>     atomic_add (&counter[1], 2, ...);
>   }
> }
> atomic_add (&counter[2], 1, ...);

we could do better by using compare_exchange storing value, and detect the race 
I mentioned:

gcov_t expected, val;
atomic_load (&counter[0],  &val, ...);
gcov_t delta = val == value ? 1 : -1;
atomic_add (&counter[1], delta);
if (delta < 0) {
    retry:
     /* can we set counter[0]? */
     atomic_load (&counter[1], &expected, ...);
     if (expected < 0) {
       bool stored = atomic_compare_exchange (&counter[0], &val, &value, ...);
       if (!stored && val != value)
         goto retry;
       atomic_add (&counter[1], 2, ...);
   }
}
atomic_add (&counter[2], 1, ...);

This corrects the off-by-one issue.

nathan


* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-04 17:03             ` Nathan Sidwell
@ 2016-08-05  8:55               ` Martin Liška
  2016-08-05 12:38                 ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-08-05  8:55 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh

[-- Attachment #1: Type: text/plain, Size: 3781 bytes --]

On 08/04/2016 07:03 PM, Nathan Sidwell wrote:
> On 08/04/16 12:43, Nathan Sidwell wrote:
> 
>> How about:
>> gcov_t expected;
>> atomic_load (&counter[0],  val, ...);
>> gcov_t delta = val == value ? 1 : -1;
>> atomic_add (&counter[1], delta);   <-- or atomic_add_fetch
>> if (delta < 0) {
>>   /* can we set counter[0]? */
>>   atomic_load (&counter[1], &expected, ...);
>>   if (expected < 0) {
>>     atomic_store (&counter[0], value, ...);
>>     atomic_add (&counter[1], 2, ...);
>>   }
>> }
>> atomic_add (&counter[2], 1, ...);
> 

Hi.

Thank you for the very intensive brainstorming ;) Well, I still believe that the following code
is not thread safe; let's imagine the following situation:

> we could do better by using compare_exchange storing value, and detect the race I mentioned:
> 
> gcov_t expected, val;
> atomic_load (&counter[0],  &val, ...);

[thread 1]: value == 1, read val == 1 // scheduled here

> gcov_t delta = val == value ? 1 : -1;
> atomic_add (&counter[1], delta);
> if (delta < 0) {
>    retry:
>     /* can we set counter[0]? */
>     atomic_load (&counter[1], &expected, ...);
>     if (expected < 0) {
>       bool stored = atomic_compare_exchange (&counter[0], &val, &value, ...);
>       if (!stored && val != value)
>         goto retry;
[thread 2]: value == 2, just updated counter[0] to 2
// after that [thread 1] continue, but wrongly does counter[1]++, but value != counter[0]
>       atomic_add (&counter[1], 2, ...);
>   }
> }
> atomic_add (&counter[2], 1, ...);
> 
> This corrects the off-by-one issue.
> 
> nathan

Well, I wrote the attached test case, which should trigger a data race, but TSAN is silent:

$ g++ race.cc  -pthread -fprofile-generate -g -fsanitize=thread -fprofile-update=atomic
$ ./a.out

In main: creating thread 0
In main: creating thread 1
new counter[1] value, N:0
In main: creating thread 2
new counter[1] value, N:1
new counter[1] value, N:2
new counter[1] value, N:3
new counter[1] value, N:4
new counter[1] value, N:5
new counter[1] value, N:6
new counter[1] value, N:7
new counter[1] value, N:8
new counter[1] value, N:9
new counter[1] value, N:10
new counter[1] value, N:11
new counter[1] value, N:12
new counter[1] value, N:12
new counter[1] value, N:13
new counter[1] value, N:14
new counter[1] value, N:15
new counter[1] value, N:16
In main: creating thread 3
In main: creating thread 4
In main: creating thread 5
In main: creating thread 6
In main: creating thread 7
In main: creating thread 8
In main: creating thread 9
In main: creating thread 10
In main: creating thread 11
In main: creating thread 12
In main: creating thread 13
In main: creating thread 14
In main: creating thread 15

However, not updating arc counters with atomic operations causes a great many races:

$ g++ race.cc  -pthread -fprofile-generate -g -fsanitize=thread
$ ./a.out 2>&1 | grep 'data race' | wc -l
110

Sample:
WARNING: ThreadSanitizer: data race (pid=11424)
  Read of size 8 at 0x000000606718 by thread T4:
    #0 A::foo() /tmp/race.cc:10 (a.out+0x000000401e78)

  Previous write of size 8 at 0x000000606718 by thread T1:
    [failed to restore the stack]

  Location is global '__gcov0._ZN1A3fooEv' of size 16 at 0x000000606710 (a.out+0x000000606718)

  Thread T4 (tid=11429, running) created by main thread at:
    #0 pthread_create ../../../../libsanitizer/tsan/tsan_interceptors.cc:876 (libtsan.so.0+0x00000002ad2d)
    #1 main /tmp/race.cc:43 (a.out+0x000000401afb)

  Thread T1 (tid=11426, finished) created by main thread at:
    #0 pthread_create ../../../../libsanitizer/tsan/tsan_interceptors.cc:876 (libtsan.so.0+0x00000002ad2d)
    #1 main /tmp/race.cc:43 (a.out+0x000000401afb)
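
For contrast, the edge-counter update emitted under -fprofile-update=atomic corresponds to a relaxed atomic fetch-add, as the comment in gimple_gen_edge_profiler notes. The sketch below is a stand-alone illustration, not the test case attached to this mail: sketch_edge_counter, sketch_worker and sketch_run are hypothetical names, and it assumes GCC's __atomic builtins and POSIX threads (build with -pthread).

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_MAX_THREADS 16

/* Stand-in for one __gcov0.* arc counter (hypothetical).  */
static long long sketch_edge_counter;

/* Each thread bumps the shared counter with a relaxed atomic add,
   the same operation the atomic edge profiler emits.  */
static void *
sketch_worker (void *arg)
{
  long iterations = *(long *) arg;
  for (long i = 0; i < iterations; i++)
    __atomic_fetch_add (&sketch_edge_counter, 1, __ATOMIC_RELAXED);
  return NULL;
}

/* Run NTHREADS workers and return the final counter, or -1 on error.  */
static long long
sketch_run (int nthreads, long iterations)
{
  pthread_t threads[SKETCH_MAX_THREADS];
  int started = 0;

  if (nthreads < 1 || nthreads > SKETCH_MAX_THREADS)
    return -1;
  __atomic_store_n (&sketch_edge_counter, 0, __ATOMIC_RELAXED);

  for (; started < nthreads; started++)
    if (pthread_create (&threads[started], NULL, sketch_worker,
			&iterations) != 0)
      break;
  /* Join every thread that was started before reading the counter.  */
  for (int i = 0; i < started; i++)
    pthread_join (threads[i], NULL);

  if (started != nthreads)
    return -1;
  return __atomic_load_n (&sketch_edge_counter, __ATOMIC_RELAXED);
}
```

With the atomic add no increment is lost, so the result equals nthreads * iterations. Replacing it with a plain sketch_edge_counter++ reintroduces the kind of race TSAN reports above.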

Maybe I'm missing something and my test sample is wrong (please apply the attached patch to use the original __gcov_one_value_profiler_body)?
Thanks,
Martin


[-- Attachment #2: indirect-profiler-not-atomic.patch --]
[-- Type: text/x-patch, Size: 1399 bytes --]

diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 6e96fa9..42b780f 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -188,8 +188,8 @@ gimple_init_edge_profiler (void)
       profiler_fn_name = "__gcov_indirect_call_profiler_v2";
       if (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE))
 	profiler_fn_name = "__gcov_indirect_call_topn_profiler";
-      if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
-	profiler_fn_name = "__gcov_indirect_call_profiler_v2_atomic";
+//      if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+//	profiler_fn_name = "__gcov_indirect_call_profiler_v2_atomic";
 
       tree_indirect_call_profiler_fn
 	      = build_fn_decl (profiler_fn_name, ic_profiler_fn_type);
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index c1e287d..d43932e 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -70,6 +70,8 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 
    In any case, COUNTERS[2] is incremented.  */
 
+static int counter = 0;
+
 static inline void
 __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
 {
@@ -77,6 +79,7 @@ __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
     counters[1]++;
   else if (counters[1] == 0)
     {
+      fprintf (stderr, "new counter[1] value, N:%d\n", counter++);
       counters[1] = 1;
       counters[0] = value;
     }

[-- Attachment #3: race.cc --]
[-- Type: text/x-c++src, Size: 949 bytes --]

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS	16
#define ITERATIONS (1 * 100 * 1000)

struct A
{
  virtual void foo() {  }
};

struct B: A
{
  virtual void foo() {  }
};

static int counter = 0;

void *PrintHello(void *p)
{
   A *a = new A();
   B *b = new B();

   for (unsigned i = 0; i < ITERATIONS; i++)
   {
     A *ptr = i % 2 ? a : b;
     ptr->foo();

     // uncommenting this produces a data race seen by tsan
     // counter++;
   }

   // a void * function must return a value; falling off the end is undefined
   return NULL;
}

int main(int argc, char *argv[])
{
   A *a = new A();
   pthread_t threads[NUM_THREADS];
   int rc;
   long t;
   for(t=0;t<NUM_THREADS;t++){
     printf("In main: creating thread %ld\n", t);
     rc = pthread_create(&threads[t], NULL, PrintHello, a);
     if (rc){
       printf("ERROR; return code from pthread_create() is %d\n", rc);
       exit(-1);
       }
     }

   // pthread_join stores a void *, so the result slot must be pointer-sized
   void *retval;
   for(t=0;t<NUM_THREADS;t++)
     pthread_join (threads[t], &retval);
}


* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-05  8:55               ` Martin Liška
@ 2016-08-05 12:38                 ` Nathan Sidwell
  2016-08-05 12:48                   ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-05 12:38 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/05/16 04:55, Martin Liška wrote:

> Thank you for the very intensive brainstorming ;) Well, I still believe that the following code
> is not thread safe; let's imagine the following situation:
>

yeah, you're right.

>> we could do better by using compare_exchange storing value, and detect the race I mentioned:
>>
>> gcov_t expected, val;
>> atomic_load (&counter[0],  &val, ...);
>
> [thread 1]: value == 1, read val == 1 // scheduled here
>
>> gcov_t delta = val == value ? 1 : -1;
>> atomic_add (&counter[1], delta);
>> if (delta < 0) {
>>    retry:
>>     /* can we set counter[0]? */
>>     atomic_load (&counter[1], &expected, ...);
>>     if (expected < 0) {
>>       bool stored = atomic_compare_exchange (&counter[0], &val, &value, ...);
>>       if (!stored && val != value)
>>         goto retry;
> [thread 2]: value == 2, just updated counter[0] to 2
> // after that [thread 1] continue, but wrongly does counter[1]++, but value != counter[0]
>>       atomic_add (&counter[1], 2, ...);

Bah.  but (a) does it matter enough? and (b) if so does changing the delta<0 
handling to store a count of 1 solve it?: (answer: no)

gcov_t expected, val;
A:atomic_load (&counter[0],  &val, ...);
gcov_t delta = val == value ? 1 : -1;
B:atomic_add (&counter[1], delta);

if (delta < 0) {
      /* can we set counter[0]? */
      C:atomic_load (&counter[1], &expected, ...);
      if (expected < 0) {
        D:atomic_store (&counter[0], &value);
        E: atomic_store (&counter[1], 1);
   }
atomic_add (&counter[1], 2, ...);


thread-1: value = 1, reads '1' at A
thread-2: value = 2, reads '1' at A
thread-2: decrements count @ B
thread-2: reads -1 at C
thread-2: write 2 at D
thread-2: stores 1 at E
thread-1: increments count @ B (finally)

So we still can go awry.  But the code's simpler.  Like you said, I don't think 
it's possible to solve without an atomic update to both counter[0] & counter[1].
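
For illustration, the schedule above can be replayed deterministically in a single thread. This is only a sketch under stated assumptions: `struct one_value` and `replay_interleaving` are hypothetical names, `gcov_t` is assumed to be `long long`, the starting state (counter[0] == 1, counter[1] == 0) is chosen so that the decrement at B yields -1, and the trailing atomic_add of 2 is left out because the trace never reaches it.

```c
#include <assert.h>

typedef long long gcov_t;	/* Assumed width; hypothetical typedef.  */

/* Hypothetical container for the two counters under discussion.  */
struct one_value
{
  gcov_t counter[2];	/* [0] = candidate value, [1] = its count.  */
};

/* Replay the schedule one step at a time.  Plain assignments stand in
   for the atomic operations because only one step runs at a time.  */
struct one_value
replay_interleaving (void)
{
  struct one_value s = { { 1, 0 } };	/* Assumed starting state.  */

  /* thread-1: value = 1, reads '1' at A.  */
  gcov_t value1 = 1;
  gcov_t val1 = s.counter[0];
  gcov_t delta1 = val1 == value1 ? 1 : -1;

  /* thread-2: value = 2, reads '1' at A.  */
  gcov_t value2 = 2;
  gcov_t val2 = s.counter[0];
  gcov_t delta2 = val2 == value2 ? 1 : -1;

  /* thread-2: decrements count @ B.  */
  s.counter[1] += delta2;

  /* thread-2: reads -1 at C, then writes 2 at D and stores 1 at E.  */
  gcov_t expected = s.counter[1];
  if (delta2 < 0 && expected < 0)
    {
      s.counter[0] = value2;
      s.counter[1] = 1;
    }

  /* thread-1: increments count @ B (finally), using its stale delta.  */
  s.counter[1] += delta1;

  return s;
}
```

The replay ends with counter[0] == 2 and counter[1] == 2, although the value 2 was seen only once: thread-1's increment was computed for the value 1.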


> Well, I wrote attached test-case which should trigger a data-race, but TSAN is silent:

I'm not too surprised.  The race window is tiny and you put a printf in the 
middle of one path.  I suspect if you put a sleep/printf on the counter[1] 
increment path you'll see it more often.

nathan

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-05 12:38                 ` Nathan Sidwell
@ 2016-08-05 12:48                   ` Martin Liška
  2016-08-05 13:14                     ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-08-05 12:48 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh

On 08/05/2016 02:38 PM, Nathan Sidwell wrote:
> On 08/05/16 04:55, Martin Liška wrote:
> 
>> Thank you for the very intensive brainstorming ;) Well, I still believe that the following code
>> is not thread-safe; let's imagine the following situation:
>>
> 
> yeah, you're right.
> 
>>> we could do better by using compare_exchange storing value, and detect the race I mentioned:
>>>
>>> gcov_t expected, val;
>>> atomic_load (&counter[0],  &val, ...);
>>
>> [thread 1]: value == 1, read val == 1 // scheduled here
>>
>>> gcov_t delta = val == value ? 1 : -1;
>>> atomic_add (&counter[1], delta);
>>> if (delta < 0) {
>>>    retry:
>>>     /* can we set counter[0]? */
>>>     atomic_load (&counter[1], &expected, ...);
>>>     if (expected < 0) {
>>>       bool stored = atomic_compare_exchange (&counter[0], &val, &value, ...);
>>>       if (!stored && val != value)
>>>         goto retry;
>> [thread 2]: value == 2, just updated counter[0] to 2
>> // after that [thread 1] continue, but wrongly does counter[1]++, but value != counter[0]
>>>       atomic_add (&counter[1], 2, ...);
> 
> Bah.  but (a) does it matter enough? and (b) if so does changing the delta<0 handling to store a count of 1 solve it?: (answer: no)
> 
> gcov_t expected, val;
> A:atomic_load (&counter[0],  &val, ...);
> gcov_t delta = val == value ? 1 : -1;
> B:atomic_add (&counter[1], delta);
> 
> if (delta < 0) {
>      /* can we set counter[0]? */
>      C:atomic_load (&counter[1], &expected, ...);
>      if (expected < 0) {
>        D:atomic_store (&counter[0], &value);
>        E: atomic_store (&counter[1], 1);
>   }
> atomic_add (&counter[1], 2, ...);
> 
> 
> thread-1: value = 1, reads '1' at A
> thread-2: value = 2, reads '1' at A
> thread-2: decrements count @ B
> thread-2: reads -1 at C
> thread-2: write 2 at D
> thread-2: stores 1 at E
> thread-1: increments count @ B (finally)
> 
> So we still can go awry.  But the code's simpler.  Like you said, I don't think it's possible to solve without an atomic update to both counter[0] & counter[1].
> 
> 
>> Well, I wrote attached test-case which should trigger a data-race, but TSAN is silent:
> 
> I'm not too surprised.  The race window is tiny and you put a printf in the middle of one path.  I suspect if you put a sleep/printf on the counter[1] increment path you'll see it more often.
> 
> nathan

Ok, after all the experimenting and inventing "almost" thread-safe code, I'm inclined not to include the __gcov_one_value_profiler_body_atomic
counter. The final solution is cumbersome and probably not worth doing. Moreover, even with a thread-safe implementation, the result of an indirect call counter
is not going to be stable across different runs (because it can store only a single value).
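
As a rough single-threaded illustration of that last point, the sketch below applies the same update rule as the non-atomic __gcov_one_value_profiler_body to one set of values in two different orders. `one_value_update` and `one_value_winner` are hypothetical helper names, and `gcov_type` is assumed to be `long long`. When no value occurs more than 50% of the time, the value left in counters[0] depends on call order, which thread scheduling can change from run to run.

```c
#include <assert.h>
#include <stddef.h>

typedef long long gcov_type;	/* Assumed width; hypothetical typedef.  */

/* Same update rule as the non-atomic __gcov_one_value_profiler_body.  */
void
one_value_update (gcov_type *counters, gcov_type value)
{
  if (value == counters[0])
    counters[1]++;
  else if (counters[1] == 0)
    {
      counters[1] = 1;
      counters[0] = value;
    }
  else
    counters[1]--;
  counters[2]++;
}

/* Feed N values in order and return the surviving candidate,
   i.e. counters[0].  */
gcov_type
one_value_winner (const gcov_type *values, size_t n)
{
  gcov_type counters[3] = { 0, 0, 0 };
  for (size_t i = 0; i < n; i++)
    one_value_update (counters, values[i]);
  return counters[0];
}
```

With the inputs { 1, 2, 1, 2 } and { 2, 1, 2, 1 } the helper reports 1 and 2 respectively, even though both sequences contain the same values.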

If you agree, I'll prepare a final version of patch?

Thanks,
Martin

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-05 12:48                   ` Martin Liška
@ 2016-08-05 13:14                     ` Nathan Sidwell
  2016-08-05 13:43                       ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-05 13:14 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/05/16 08:48, Martin Liška wrote:

> Ok, after all the experimenting and inventing "almost" thread-safe code, I'm inclined not to include the __gcov_one_value_profiler_body_atomic
> counter. The final solution is cumbersome and probably not worth doing. Moreover, even with a thread-safe implementation, the result of an indirect call counter
> is not going to be stable across different runs (because it can store only a single value).
>
> If you agree, I'll prepare a final version of patch?

Agreed.

nathan

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-05 13:14                     ` Nathan Sidwell
@ 2016-08-05 13:43                       ` Martin Liška
  2016-08-08 13:59                         ` [PATCH 5/N] Add new *_atomic counter update function, (-fprofile-update=atomic) Martin Liška
                                           ` (4 more replies)
  0 siblings, 5 replies; 95+ messages in thread
From: Martin Liška @ 2016-08-05 13:43 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh

[-- Attachment #1: Type: text/plain, Size: 626 bytes --]

On 08/05/2016 03:14 PM, Nathan Sidwell wrote:
> On 08/05/16 08:48, Martin Liška wrote:
> 
>> Ok, after all the experimenting and inventing "almost" thread-safe code, I'm inclined not to include the __gcov_one_value_profiler_body_atomic
>> counter. The final solution is cumbersome and probably not worth doing. Moreover, even with a thread-safe implementation, the result of an indirect call counter
>> is not going to be stable across different runs (because it can store only a single value).
>>
>> If you agree, I'll prepare a final version of patch?
> 
> Agreed.
> 
> nathan
> 

Great, attaching install candidate.

Martin

[-- Attachment #2: 0001-Cherry-pick-fprofile-generate-atomic-from-google-gcc-v3.patch --]
[-- Type: text/x-patch, Size: 8707 bytes --]

From 0b3ac8636ef34b02e301f22c86dde0602f9969ef Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Thu, 28 Jul 2016 14:32:47 +0200
Subject: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9
 branch

gcc/ChangeLog:

2016-08-05  Martin Liska  <mliska@suse.cz>

	Cherry picked (and modified) from google-4_7 branch
	2012-12-26  Rong Xu  <xur@google.com>
	* common.opt (fprofile-update): Add new flag.
	* coretypes.h: Define enum profile_update.
	* doc/invoke.texi: Document -fprofile-update.
	* gcov-io.h: Declare GCOV_TYPE_ATOMIC_FETCH_ADD and
	GCOV_TYPE_ATOMIC_FETCH_ADD_FN.
	* tree-profile.c (gimple_init_edge_profiler): Generate
	also atomic profiler update.
	(gimple_gen_edge_profiler): Likewise.

gcc/testsuite/ChangeLog:

2016-08-05  Martin Liska  <mliska@suse.cz>

	* g++.dg/gcov/gcov-threads-1.C: New test.
---
 gcc/common.opt                             | 13 ++++++++
 gcc/coretypes.h                            |  6 ++++
 gcc/doc/invoke.texi                        | 12 +++++++
 gcc/gcov-io.h                              |  8 +++++
 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C | 46 ++++++++++++++++++++++++++
 gcc/tree-profile.c                         | 53 ++++++++++++++++++++----------
 6 files changed, 120 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C

diff --git a/gcc/common.opt b/gcc/common.opt
index 8a292ed..44adae8 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1916,6 +1916,19 @@ fprofile-correction
 Common Report Var(flag_profile_correction)
 Enable correction of flow inconsistent profile data input.
 
+fprofile-update=
+Common Joined RejectNegative Enum(profile_update) Var(flag_profile_update) Init(PROFILE_UPDATE_SINGLE)
+-fprofile-update=[single|atomic]	Set the profile update method.
+
+Enum
+Name(profile_update) Type(enum profile_update) UnknownError(unknown profile update method %qs)
+
+EnumValue
+Enum(profile_update) String(single) Value(PROFILE_UPDATE_SINGLE)
+
+EnumValue
+Enum(profile_update) String(atomic) Value(PROFILE_UPDATE_ATOMIC)
+
 fprofile-generate
 Common
 Enable common options for generating profile info for profile feedback directed optimizations.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index b3a91a6..fe1e984 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -174,6 +174,12 @@ enum offload_abi {
   OFFLOAD_ABI_ILP32
 };
 
+/* Types of profile update methods.  */
+enum profile_update {
+  PROFILE_UPDATE_SINGLE,
+  PROFILE_UPDATE_ATOMIC
+};
+
 /* Types of unwind/exception handling info that can be generated.  */
 
 enum unwind_info_type
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 22001f9..1cfaae7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9933,6 +9933,18 @@ the profile feedback data files. See @option{-fprofile-dir}.
 To optimize the program based on the collected profile information, use
 @option{-fprofile-use}.  @xref{Optimize Options}, for more information.
 
+@item -fprofile-update=@var{method}
+@opindex fprofile-update
+
+Alter the update method for an application instrumented for profile
+feedback based optimization.  The @var{method} argument should be one of
+@samp{single} or @samp{atomic}.  The first one is useful for single-threaded
+applications, while the second one prevents profile corruption by emitting
+thread-safe code.
+
+@strong{Warning:} When an application does not properly join all threads
+(or creates a detached thread), a profile file can still be corrupted.
+
 @item -fsanitize=address
 @opindex fsanitize=address
 Enable AddressSanitizer, a fast memory error detector.
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index bbf013a..afd00ac 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -164,6 +164,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #ifndef GCC_GCOV_IO_H
 #define GCC_GCOV_IO_H
 
+#if LONG_LONG_TYPE_SIZE > 32
+#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8
+#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8
+#else
+#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4
+#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4
+#endif
+
 #ifndef IN_LIBGCOV
 /* About the host */
 
diff --git a/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
new file mode 100644
index 0000000..a4a6f0a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
@@ -0,0 +1,46 @@
+/* { dg-options "-fprofile-arcs -ftest-coverage -pthread -fprofile-update=atomic" } */
+/* { dg-do run { target native } } */
+
+#include <stdint.h>
+#include <pthread.h>
+#include <assert.h>
+
+#define NR 5
+
+pthread_mutex_t cndMs[NR];
+static void *ContentionNoDeadlock_thread(void *start)
+{
+  for (uint32_t k = 0; k < 100000; ++k)		/* count(500005) */
+    {
+      int starti = *(int*)start;		/* count(500000) */
+      for (uint32_t i = starti; i < NR; ++i) 
+	pthread_mutex_lock (&cndMs[i]);
+      for (int32_t i = NR - 1; i >= starti; --i)
+	pthread_mutex_unlock (&cndMs[i]);
+  }
+}
+int main(int argc, char **argv) {
+  for (unsigned i = 0; i < NR; i++)
+    cndMs[i] = PTHREAD_MUTEX_INITIALIZER;
+
+  pthread_t t[NR];
+  int ids[NR];
+
+  for (int i = 0; i < NR; i++)
+  {
+    ids[i] = i;
+    int r = pthread_create (&t[i], NULL, ContentionNoDeadlock_thread, &ids[i]);
+    assert (r == 0);				/* count(5) */
+  }
+
+  int ret;
+  for (int i = 0; i < NR; i++)
+    {
+      int r = pthread_join (t[i], (void**)&ret);
+      assert (r == 0);				/* count(5) */
+    }
+
+  return 0;					/* count(1) */
+}
+
+/* { dg-final { run-gcov gcov-threads-1.C } } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 39fe15f..740f7ab 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -127,6 +127,7 @@ gimple_init_edge_profiler (void)
   tree ic_profiler_fn_type;
   tree average_profiler_fn_type;
   tree time_profiler_fn_type;
+  const char *profiler_fn_name;
 
   if (!gcov_type_node)
     {
@@ -180,11 +181,12 @@ gimple_init_edge_profiler (void)
 					  gcov_type_node,
 					  ptr_void,
 					  NULL_TREE);
+      profiler_fn_name = "__gcov_indirect_call_profiler_v2";
+      if (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE))
+	profiler_fn_name = "__gcov_indirect_call_topn_profiler";
+
       tree_indirect_call_profiler_fn
-	      = build_fn_decl ( (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ?
-				 "__gcov_indirect_call_topn_profiler":
-				 "__gcov_indirect_call_profiler_v2"),
-			       ic_profiler_fn_type);
+	      = build_fn_decl (profiler_fn_name, ic_profiler_fn_type);
 
       TREE_NOTHROW (tree_indirect_call_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_indirect_call_profiler_fn)
@@ -241,22 +243,37 @@ gimple_init_edge_profiler (void)
 void
 gimple_gen_edge_profiler (int edgeno, edge e)
 {
-  tree ref, one, gcov_type_tmp_var;
-  gassign *stmt1, *stmt2, *stmt3;
+  tree one;
 
-  ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
   one = build_int_cst (gcov_type_node, 1);
-  gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-					  NULL, "PROF_edge_counter");
-  stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
-  gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-					  NULL, "PROF_edge_counter");
-  stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
-			       gimple_assign_lhs (stmt1), one);
-  stmt3 = gimple_build_assign (unshare_expr (ref), gimple_assign_lhs (stmt2));
-  gsi_insert_on_edge (e, stmt1);
-  gsi_insert_on_edge (e, stmt2);
-  gsi_insert_on_edge (e, stmt3);
+
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+    {
+      /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
+      tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
+      gcall *stmt
+	= gimple_build_call (builtin_decl_explicit (GCOV_TYPE_ATOMIC_FETCH_ADD),
+			     3, addr, one,
+			     build_int_cst (integer_type_node,
+					    MEMMODEL_RELAXED));
+      gsi_insert_on_edge (e, stmt);
+    }
+  else
+    {
+      tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
+      tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
+						   NULL, "PROF_edge_counter");
+      gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
+      gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
+					      NULL, "PROF_edge_counter");
+      gassign *stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
+					    gimple_assign_lhs (stmt1), one);
+      gassign *stmt3 = gimple_build_assign (unshare_expr (ref),
+					    gimple_assign_lhs (stmt2));
+      gsi_insert_on_edge (e, stmt1);
+      gsi_insert_on_edge (e, stmt2);
+      gsi_insert_on_edge (e, stmt3);
+    }
 }
 
 /* Emits code to get VALUE to instrument at GSI, and returns the
-- 
2.9.2


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [PATCH 5/N] Add new *_atomic counter update function, (-fprofile-update=atomic)
  2016-08-05 13:43                       ` Martin Liška
@ 2016-08-08 13:59                         ` Martin Liška
  2016-08-08 15:24                           ` Nathan Sidwell
  2016-08-09 11:24                         ` [PATCH] Set -fprofile-update=atomic when -pthread is present Martin Liška
                                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-08-08 13:59 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh

[-- Attachment #1: Type: text/plain, Size: 767 bytes --]

Hello.

This patch is a follow-up to the series in which I introduce a set of thread-safe counter update
functions. I originally thought that the majority of profile corruptions were
caused by non-atomic updates of the CFG counters (-fprofile-arcs). But there are multiple counters whose
count is compared to the number of executions of a basic block:

blake2s.cpp:150:40: error: corrupted value profile: value profile counter (11301120 out of 11314388) inconsistent with basic-block count (11555117)
       memcpy( S->buf + left, in, fill ); // Fill buffer

This can be seen with the unrar binary: PR58306. I'm also adding a simple test-case which reveals the inconsistency: val-profiler-threads-1.c.
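
As a hedged illustration of how two such counts can drift apart, the sketch below replays two updates of one counter in a single thread: first as the separate load / add / store steps of a non-atomic update, with both loads scheduled before either store, then as `__atomic_fetch_add` calls. The function names are hypothetical, `gcov_type` is assumed to be `long long`, and the schedule is only one possible interleaving.

```c
#include <assert.h>

typedef long long gcov_type;	/* Assumed width; hypothetical typedef.  */

/* Two "threads" each run load / add / store on one counter, with both
   loads happening before either store.  Returns the final counter.  */
gcov_type
replay_plain_updates (void)
{
  gcov_type counter = 0;
  gcov_type tmp1 = counter;	/* thread 1 loads 0 */
  gcov_type tmp2 = counter;	/* thread 2 loads 0 */
  counter = tmp1 + 1;		/* thread 1 stores 1 */
  counter = tmp2 + 1;		/* thread 2 stores 1: one update is lost */
  return counter;
}

/* The same two updates as indivisible read-modify-write operations.  */
gcov_type
replay_atomic_updates (void)
{
  gcov_type counter = 0;
  __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);
  __atomic_fetch_add (&counter, 1, __ATOMIC_RELAXED);
  return counter;
}
```

The plain replay ends at 1 and the atomic one at 2. If only some counters lose updates this way, a value profile counter and the basic-block count it is checked against no longer agree.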

Regression tests are running; is the patch ready to install once they finish?

Thanks,
Martin

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0005-Add-new-_atomic-counter-update-function-fprofile-upd.patch --]
[-- Type: text/x-patch; name="0005-Add-new-_atomic-counter-update-function-fprofile-upd.patch", Size: 14545 bytes --]

From a55681992f572b0591a24e82715ddc01330230e0 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 8 Aug 2016 15:44:28 +0200
Subject: [PATCH 5/5] Add new *_atomic counter update function
 (-fprofile-update=atomic)

libgcc/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/58306
	* Makefile.in: New functions (modules) are added.
	* libgcov-profiler.c (__gcov_interval_profiler_atomic): New
	function.
	(__gcov_pow2_profiler_atomic): New function.
	(__gcov_one_value_profiler_body): New argument is introduced.
	(__gcov_one_value_profiler): Call with the new argument.
	(__gcov_one_value_profiler_atomic): Likewise.
	(__gcov_indirect_call_profiler_v2): Likewise.
	(__gcov_time_profiler_atomic): New function.
	(__gcov_average_profiler_atomic): Likewise.
	(__gcov_ior_profiler_atomic): Likewise.
	* libgcov.h: Declare the aforementioned functions.

gcc/testsuite/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/58306
	* gcc.dg/tree-prof/val-profiler-threads-1.c: New test.

gcc/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/58306
	* tree-profile.c (gimple_init_edge_profiler): Create conditionally
	atomic variants of profile update functions.
---
 .../gcc.dg/tree-prof/val-profiler-threads-1.c      | 41 +++++++++
 gcc/tree-profile.c                                 | 36 ++++----
 libgcc/Makefile.in                                 | 15 +++-
 libgcc/libgcov-profiler.c                          | 97 ++++++++++++++++++++--
 libgcc/libgcov.h                                   | 10 +++
 5 files changed, 173 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c

diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c b/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
new file mode 100644
index 0000000..0f7477e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
@@ -0,0 +1,41 @@
+/* { dg-options "-O0 -pthread -fprofile-update=atomic" } */
+#include <pthread.h>
+
+#define NUM_THREADS	8
+#define SIZE 1024
+#define ITERATIONS (1000 * 1000)
+
+char buffer[SIZE];
+char buffer2[SIZE];
+
+void *copy_memory(char *dst, char *src, unsigned size)
+{
+   for (unsigned i = 0; i < ITERATIONS; i++)
+   {
+     dst[size % 10] = src[size % 20];
+   }
+}
+
+void *foo(void *d)
+{
+  copy_memory (buffer, buffer2, SIZE);
+}
+
+int main(int argc, char *argv[])
+{
+   pthread_t threads[NUM_THREADS];
+   int rc;
+   long t;
+   for(t=0;t<NUM_THREADS;t++){
+     rc = pthread_create(&threads[t], NULL, foo, 0);
+     if (rc){
+	 return 1;
+       }
+     }
+
+   int retval;
+   for(t=0;t<NUM_THREADS;t++)
+     pthread_join (threads[t], (void**)&retval);
+
+   return buffer[10];
+}
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 740f7ab..0b29e72 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -128,9 +128,13 @@ gimple_init_edge_profiler (void)
   tree average_profiler_fn_type;
   tree time_profiler_fn_type;
   const char *profiler_fn_name;
+  const char *fn_name;
 
   if (!gcov_type_node)
     {
+      const char *fn_suffix
+	= flag_profile_update == PROFILE_UPDATE_ATOMIC ? "_atomic" : "";
+
       gcov_type_node = get_gcov_type ();
       gcov_type_ptr = build_pointer_type (gcov_type_node);
 
@@ -140,9 +144,9 @@ gimple_init_edge_profiler (void)
 					  gcov_type_ptr, gcov_type_node,
 					  integer_type_node,
 					  unsigned_type_node, NULL_TREE);
-      tree_interval_profiler_fn
-	      = build_fn_decl ("__gcov_interval_profiler",
-				     interval_profiler_fn_type);
+      fn_name = concat ("__gcov_interval_profiler", fn_suffix, NULL);
+      tree_interval_profiler_fn = build_fn_decl (fn_name,
+						 interval_profiler_fn_type);
       TREE_NOTHROW (tree_interval_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_interval_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -153,8 +157,8 @@ gimple_init_edge_profiler (void)
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node,
 					  NULL_TREE);
-      tree_pow2_profiler_fn = build_fn_decl ("__gcov_pow2_profiler",
-						   pow2_profiler_fn_type);
+      fn_name = concat ("__gcov_pow2_profiler", fn_suffix, NULL);
+      tree_pow2_profiler_fn = build_fn_decl (fn_name, pow2_profiler_fn_type);
       TREE_NOTHROW (tree_pow2_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_pow2_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -165,9 +169,9 @@ gimple_init_edge_profiler (void)
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node,
 					  NULL_TREE);
-      tree_one_value_profiler_fn
-	      = build_fn_decl ("__gcov_one_value_profiler",
-				     one_value_profiler_fn_type);
+      fn_name = concat ("__gcov_one_value_profiler", fn_suffix, NULL);
+      tree_one_value_profiler_fn = build_fn_decl (fn_name,
+						  one_value_profiler_fn_type);
       TREE_NOTHROW (tree_one_value_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_one_value_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -197,9 +201,8 @@ gimple_init_edge_profiler (void)
       time_profiler_fn_type
 	       = build_function_type_list (void_type_node,
 					  gcov_type_ptr, NULL_TREE);
-      tree_time_profiler_fn
-	      = build_fn_decl ("__gcov_time_profiler",
-				     time_profiler_fn_type);
+      fn_name = concat ("__gcov_time_profiler", fn_suffix, NULL);
+      tree_time_profiler_fn = build_fn_decl (fn_name, time_profiler_fn_type);
       TREE_NOTHROW (tree_time_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_time_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -209,16 +212,15 @@ gimple_init_edge_profiler (void)
       average_profiler_fn_type
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node, NULL_TREE);
-      tree_average_profiler_fn
-	      = build_fn_decl ("__gcov_average_profiler",
-				     average_profiler_fn_type);
+      fn_name = concat ("__gcov_average_profiler", fn_suffix, NULL);
+      tree_average_profiler_fn = build_fn_decl (fn_name,
+						average_profiler_fn_type);
       TREE_NOTHROW (tree_average_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_average_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
 		     DECL_ATTRIBUTES (tree_average_profiler_fn));
-      tree_ior_profiler_fn
-	      = build_fn_decl ("__gcov_ior_profiler",
-				     average_profiler_fn_type);
+      fn_name = concat ("__gcov_ior_profiler", fn_suffix, NULL);
+      tree_ior_profiler_fn = build_fn_decl (fn_name, average_profiler_fn_type);
       TREE_NOTHROW (tree_ior_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_ior_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index efaf7f7..f0cbc92 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -858,10 +858,19 @@ include $(iterator)
 
 LIBGCOV_MERGE = _gcov_merge_add _gcov_merge_single _gcov_merge_delta	\
 	_gcov_merge_ior _gcov_merge_time_profile _gcov_merge_icall_topn
-LIBGCOV_PROFILER = _gcov_interval_profiler _gcov_pow2_profiler		\
+LIBGCOV_PROFILER = _gcov_interval_profiler				\
+	_gcov_interval_profiler_atomic					\
+	_gcov_pow2_profiler						\
+	_gcov_pow2_profiler_atomic					\
 	_gcov_one_value_profiler					\
- 	_gcov_average_profiler _gcov_ior_profiler			\
-	_gcov_indirect_call_profiler_v2 _gcov_time_profiler		\
+	_gcov_one_value_profiler_atomic					\
+	_gcov_average_profiler						\
+	_gcov_average_profiler_atomic					\
+	_gcov_ior_profiler						\
+	_gcov_ior_profiler_atomic					\
+	_gcov_indirect_call_profiler_v2					\
+	_gcov_time_profiler						\
+	_gcov_time_profiler_atomic					\
 	_gcov_indirect_call_topn_profiler
 LIBGCOV_INTERFACE = _gcov_dump _gcov_flush _gcov_fork			\
 	_gcov_execl _gcov_execlp					\
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 6b4557a..0837ea1 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -46,6 +46,26 @@ __gcov_interval_profiler (gcov_type *counters, gcov_type value,
 }
 #endif
 
+#ifdef L_gcov_interval_profiler_atomic
+/* If VALUE is in interval <START, START + STEPS - 1>, then increases the
+   corresponding counter in COUNTERS.  If the VALUE is above or below
+   the interval, COUNTERS[STEPS] or COUNTERS[STEPS + 1] is increased
+   instead.  Function is thread-safe.  */
+
+void
+__gcov_interval_profiler_atomic (gcov_type *counters, gcov_type value,
+				 int start, unsigned steps)
+{
+  gcov_type delta = value - start;
+  if (delta < 0)
+    __atomic_fetch_add (&counters[steps + 1], 1, MEMMODEL_RELAXED);
+  else if (delta >= steps)
+    __atomic_fetch_add (&counters[steps], 1, MEMMODEL_RELAXED);
+  else
+    __atomic_fetch_add (&counters[delta], 1, MEMMODEL_RELAXED);
+}
+#endif
+
 #ifdef L_gcov_pow2_profiler
 /* If VALUE is a power of two, COUNTERS[1] is incremented.  Otherwise
    COUNTERS[0] is incremented.  */
@@ -60,6 +80,21 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+#ifdef L_gcov_pow2_profiler_atomic
+/* If VALUE is a power of two, COUNTERS[1] is incremented.  Otherwise
+   COUNTERS[0] is incremented.  Function is thread-safe.  */
+
+void
+__gcov_pow2_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  if (value & (value - 1))
+    __atomic_fetch_add (&counters[0], 1, MEMMODEL_RELAXED);
+  else
+    __atomic_fetch_add (&counters[1], 1, MEMMODEL_RELAXED);
+}
+#endif
+
+
 /* Tries to determine the most common value among its inputs.  Checks if the
    value stored in COUNTERS[0] matches VALUE.  If this is the case, COUNTERS[1]
    is incremented.  If this is not the case and COUNTERS[1] is not zero,
@@ -68,10 +103,12 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
    function is called more than 50% of the time with one value, this value
    will be in COUNTERS[0] in the end.
 
-   In any case, COUNTERS[2] is incremented.  */
+   In any case, COUNTERS[2] is incremented.  If USE_ATOMIC is set to 1,
+   COUNTERS[2] is updated with an atomic instruction.  */
 
 static inline void
-__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
+__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
+				int use_atomic)
 {
   if (value == counters[0])
     counters[1]++;
@@ -82,14 +119,26 @@ __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
     }
   else
     counters[1]--;
-  counters[2]++;
+
+  if (use_atomic)
+    __atomic_fetch_add (&counters[2], 1, MEMMODEL_RELAXED);
+  else
+    counters[2]++;
 }
 
 #ifdef L_gcov_one_value_profiler
 void
 __gcov_one_value_profiler (gcov_type *counters, gcov_type value)
 {
-  __gcov_one_value_profiler_body (counters, value);
+  __gcov_one_value_profiler_body (counters, value, 0);
+}
+#endif
+
+#ifdef L_gcov_one_value_profiler_atomic
+void
+__gcov_one_value_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __gcov_one_value_profiler_body (counters, value, 1);
 }
 #endif
 
@@ -265,14 +314,14 @@ __gcov_indirect_call_profiler_v2 (gcov_type value, void* cur_func)
   if (cur_func == __gcov_indirect_call_callee
       || (__LIBGCC_VTABLE_USES_DESCRIPTORS__ && __gcov_indirect_call_callee
           && *(void **) cur_func == *(void **) __gcov_indirect_call_callee))
-    __gcov_one_value_profiler_body (__gcov_indirect_call_counters, value);
+    __gcov_one_value_profiler_body (__gcov_indirect_call_counters, value, 0);
 }
 #endif
 
 #ifdef L_gcov_time_profiler
 
 /* Counter for first visit of each function.  */
-static gcov_type function_counter;
+gcov_type function_counter;
 
 /* Sets corresponding COUNTERS if there is no value.  */
 
@@ -284,6 +333,19 @@ __gcov_time_profiler (gcov_type* counters)
 }
 #endif
 
+/* Sets corresponding COUNTERS if there is no value.
+   Function is thread-safe.  */
+
+#ifdef L_gcov_time_profiler_atomic
+void
+__gcov_time_profiler_atomic (gcov_type* counters)
+{
+  if (!counters[0])
+    counters[0] = __atomic_add_fetch (&function_counter, 1, MEMMODEL_RELAXED);
+}
+#endif
+
+
 #ifdef L_gcov_average_profiler
 /* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
    to saturate up.  */
@@ -296,6 +358,18 @@ __gcov_average_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+#ifdef L_gcov_average_profiler_atomic
+/* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
+   to saturate up.  Function is thread-safe.  */
+
+void
+__gcov_average_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __atomic_fetch_add (&counters[0], value, MEMMODEL_RELAXED);
+  __atomic_fetch_add (&counters[1], 1, MEMMODEL_RELAXED);
+}
+#endif
+
 #ifdef L_gcov_ior_profiler
 /* Bitwise-OR VALUE into COUNTER.  */
 
@@ -306,4 +380,15 @@ __gcov_ior_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+#ifdef L_gcov_ior_profiler_atomic
+/* Bitwise-OR VALUE into COUNTER.  Function is thread-safe.  */
+
+void
+__gcov_ior_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __atomic_fetch_or (&counters[0], value, MEMMODEL_RELAXED);
+}
+#endif
+
+
 #endif /* inhibit_libc */
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 80f13e2..82f620a 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -268,15 +268,25 @@ extern void __gcov_merge_icall_topn (gcov_type *, unsigned) ATTRIBUTE_HIDDEN;
 
 /* The profiler functions.  */
 extern void __gcov_interval_profiler (gcov_type *, gcov_type, int, unsigned);
+extern void __gcov_interval_profiler_atomic (gcov_type *, gcov_type, int,
+					     unsigned);
 extern void __gcov_pow2_profiler (gcov_type *, gcov_type);
+extern void __gcov_pow2_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_one_value_profiler (gcov_type *, gcov_type);
+extern void __gcov_one_value_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_indirect_call_profiler_v2 (gcov_type, void *);
 extern void __gcov_time_profiler (gcov_type *);
+extern void __gcov_time_profiler_atomic (gcov_type *);
 extern void __gcov_average_profiler (gcov_type *, gcov_type);
+extern void __gcov_average_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_ior_profiler (gcov_type *, gcov_type);
+extern void __gcov_ior_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_indirect_call_topn_profiler (gcov_type, void *);
 extern void gcov_sort_n_vals (gcov_type *, int);
 
+/* Counter for first visit of each function.  */
+extern gcov_type function_counter;
+
 #ifndef inhibit_libc
 /* The wrappers around some library functions..  */
 extern pid_t __gcov_fork (void) ATTRIBUTE_HIDDEN;
-- 
2.9.2



* Re: [PATCH 5/N] Add new *_atomic counter update function, (-fprofile-update=atomic)
  2016-08-08 13:59                         ` [PATCH 5/N] Add new *_atomic counter update function, (-fprofile-update=atomic) Martin Liška
@ 2016-08-08 15:24                           ` Nathan Sidwell
  2016-08-08 16:51                             ` Martin Liška
  2016-08-08 16:56                             ` [PATCH] Fix POW2 histogram Martin Liška
  0 siblings, 2 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-08 15:24 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/08/16 09:59, Martin Liška wrote:
> Hello.
>
> This patch is a follow-up to the series in which I introduce a set of counter update
> functions that are thread-safe. I originally thought that the majority of profile corruptions were
> caused by non-atomic updates of the CFG counters (-fprofile-arcs). But there are multiple counters whose
> count is compared to the number of executions of a basic block:
>
> blake2s.cpp:150:40: error: corrupted value profile: value profile counter (11301120 out of 11314388) inconsistent with basic-block count (11555117)
>        memcpy( S->buf + left, in, fill ); // Fill buffer
>
> This can be seen for unrar binary: PR58306. I'm also adding a simple test-case which reveals the inconsistency: val-profiler-threads-1.c.
>
> I've been running regression tests, ready to install after it finishes?


+      fn_name = concat ("__gcov_interval_profiler", fn_suffix, NULL);
+      tree_interval_profiler_fn = build_fn_decl (fn_name,
+						 interval_profiler_fn_type);

I like this idiom, but doesn't 'concat' call for a following 'free'?
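
For readers unfamiliar with the ownership rule behind this question: libiberty's 'concat' returns heap memory, so the caller is expected to free it once the consumer has copied or interned the name. A minimal standalone sketch of the pattern follows; `join2` and `profiler_name` are hypothetical stand-ins written for this illustration, not GCC functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libiberty's concat, limited to two strings.
   Returns heap memory that the caller must free, or NULL on failure.  */
char *
join2 (const char *base, const char *suffix)
{
  size_t base_len = strlen (base);
  size_t suffix_len = strlen (suffix);
  char *result = malloc (base_len + suffix_len + 1);
  if (!result)
    return NULL;
  memcpy (result, base, base_len);
  /* Copy the suffix together with its terminating NUL.  */
  memcpy (result + base_len, suffix, suffix_len + 1);
  return result;
}

/* Hypothetical helper: build the profiler name for the update mode.
   The caller owns the returned string and must free it.  */
char *
profiler_name (const char *base, int use_atomic)
{
  return join2 (base, use_atomic ? "_atomic" : "");
}
```

A caller would build the name, hand it to whatever consumes it, and then call free on it. That is only safe under the assumption that the consumer keeps its own copy of the string.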

+__gcov_pow2_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  if (value & (value - 1))
+    __atomic_fetch_add (&counters[0], 1, MEMMODEL_RELAXED);

This seems to think '0' is a power of 2.  (I suspect a bug in the existing
code, not something you've introduced.)

-__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
+__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
+				int use_atomic)
  {
    if (value == counters[0])

This function should be commented along the lines of the email discussion we had
last week.  The 'atomic' param doesn't make it completely thread-safe.
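
To illustrate why a single atomic increment is not enough here, the following is a sketch of the one-value update as a standalone function. The name `one_value_update` and the `counter_t` typedef are assumptions made for this example, and the branch structure follows the function comment in libgcov-profiler.c rather than quoting the patch verbatim.

```c
#include <assert.h>

/* Stand-in for gcov_type; an assumption for this sketch.  */
typedef long long counter_t;

/* Hypothetical rendering of the one-value update.
   counters[0] = candidate value, counters[1] = its score,
   counters[2] = total number of calls.  */
void
one_value_update (counter_t *counters, counter_t value, int use_atomic)
{
  /* Read-compare-write spanning two slots: another thread can change
     counters[0] or counters[1] between the test and the store, so this
     part stays racy regardless of use_atomic.  */
  if (value == counters[0])
    counters[1]++;
  else if (counters[1] == 0)
    {
      counters[1] = 1;
      counters[0] = value;
    }
  else
    counters[1]--;

  /* Only this increment can be made atomic in isolation.  */
  if (use_atomic)
    __atomic_fetch_add (&counters[2], 1, __ATOMIC_RELAXED);
  else
    counters[2]++;
}
```

In a single thread, feeding the values 7, 7, 3, 7 into zeroed counters leaves 7 as the candidate with a score of 2 and a total of 4. With several threads, the total stays exact in the atomic variant while the candidate and score may still be inconsistent.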

  /* Counter for first visit of each function.  */
-static gcov_type function_counter;
+gcov_type function_counter;

Why is this no longer static?  If it must be globally visible, it'll need a
suitable rename.  (Perhaps it's simpler to put the 2(?) fns that access it into
a single object file?)

nathan


* Re: [PATCH 5/N] Add new *_atomic counter update function, (-fprofile-update=atomic)
  2016-08-08 15:24                           ` Nathan Sidwell
@ 2016-08-08 16:51                             ` Martin Liška
  2016-08-08 17:03                               ` Martin Liška
  2016-08-08 16:56                             ` [PATCH] Fix POW2 histogram Martin Liška
  1 sibling, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-08-08 16:51 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh

[-- Attachment #1: Type: text/plain, Size: 2335 bytes --]

On 08/08/2016 05:24 PM, Nathan Sidwell wrote:
> On 08/08/16 09:59, Martin Liška wrote:
>> Hello.
>>
>> This patch is a follow-up to the series in which I introduce a set of counter update
>> functions that are thread-safe. I originally thought that the majority of profile corruptions were
>> caused by non-atomic updates of the CFG counters (-fprofile-arcs). But there are multiple counters whose
>> count is compared to the number of executions of a basic block:
>>
>> blake2s.cpp:150:40: error: corrupted value profile: value profile counter (11301120 out of 11314388) inconsistent with basic-block count (11555117)
>>        memcpy( S->buf + left, in, fill ); // Fill buffer
>>
>> This can be seen for unrar binary: PR58306. I'm also adding a simple test-case which reveals the inconsistency: val-profiler-threads-1.c.
>>
>> I've been running regression tests, ready to install after it finishes?
> 
> 
> +      fn_name = concat ("__gcov_interval_profiler", fn_suffix, NULL);
> +      tree_interval_profiler_fn = build_fn_decl (fn_name,
> +                         interval_profiler_fn_type);
> 
> I like this idiom, but doesn't 'concat' call for a following 'free'?

Fixed in the second version of the patch.


> 
> +__gcov_pow2_profiler_atomic (gcov_type *counters, gcov_type value)
> +{
> +  if (value & (value - 1))
> +    __atomic_fetch_add (&counters[0], 1, MEMMODEL_RELAXED);
> 
> This seems to think '0' is a power of 2.  (I suspect a bug in the existing code, not something you've introduced.)

I'll send a separate email for that issue.

> 
> -__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
> +__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
> +                int use_atomic)
>  {
>    if (value == counters[0])
> 
> This function should be commented along the lines of the email discussion we had last week.  The 'atomic' param doesn't make it completely thread-safe.

Done, with a link to this mailing list thread.

> 
>  /* Counter for first visit of each function.  */
> -static gcov_type function_counter;
> +gcov_type function_counter;
> 
> Why is this no longer static?  If it must be globally visible, it'll need a suitable rename.  (Perhaps it's simpler to put the 2(?) fns that access it into a single object file?)

Yeah, I'm putting these 2 functions into the same object file.

Martin

> 
> nathan


[-- Attachment #2: 0005-Add-new-_atomic-counter-update-function-fprofile-upd-v2.patch --]
[-- Type: text/x-patch; name="0005-Add-new-_atomic-counter-update-function-fprofile-upd-v2.patch", Size: 14811 bytes --]

From 5d446ca405f432ab9ab4aeef25cb458bd27896bd Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 8 Aug 2016 15:44:28 +0200
Subject: [PATCH 5/5] Add new *_atomic counter update function
 (-fprofile-update=atomic)

libgcc/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/58306
	* Makefile.in: New functions (modules) are added.
	* libgcov-profiler.c (__gcov_interval_profiler_atomic): New
	function.
	(__gcov_pow2_profiler_atomic): New function.
	(__gcov_one_value_profiler_body): New argument is introduced.
	(__gcov_one_value_profiler): Call with the new argument.
	(__gcov_one_value_profiler_atomic): Likewise.
	(__gcov_indirect_call_profiler_v2): Likewise.
	(__gcov_time_profiler_atomic): New function.
	(__gcov_average_profiler_atomic): Likewise.
	(__gcov_ior_profiler_atomic): Likewise.
	* libgcov.h: Declare the aforementioned functions.

gcc/testsuite/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/58306
	* gcc.dg/tree-prof/val-profiler-threads-1.c: New test.

gcc/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/58306
	* tree-profile.c (gimple_init_edge_profiler): Create conditionally
	atomic variants of profile update functions.
---
 .../gcc.dg/tree-prof/val-profiler-threads-1.c      |  41 ++++++++
 gcc/tree-profile.c                                 |  42 +++++----
 libgcc/Makefile.in                                 |  14 ++-
 libgcc/libgcov-profiler.c                          | 103 ++++++++++++++++++++-
 libgcc/libgcov.h                                   |   7 ++
 5 files changed, 182 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c

diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c b/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
new file mode 100644
index 0000000..0f7477e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
@@ -0,0 +1,41 @@
+/* { dg-options "-O0 -pthread -fprofile-update=atomic" } */
+#include <pthread.h>
+
+#define NUM_THREADS	8
+#define SIZE 1024
+#define ITERATIONS (1000 * 1000)
+
+char buffer[SIZE];
+char buffer2[SIZE];
+
+void *copy_memory(char *dst, char *src, unsigned size)
+{
+   for (unsigned i = 0; i < ITERATIONS; i++)
+   {
+     dst[size % 10] = src[size % 20];
+   }
+}
+
+void *foo(void *d)
+{
+  copy_memory (buffer, buffer2, SIZE);
+}
+
+int main(int argc, char *argv[])
+{
+   pthread_t threads[NUM_THREADS];
+   int rc;
+   long t;
+   for(t=0;t<NUM_THREADS;t++){
+     rc = pthread_create(&threads[t], NULL, foo, 0);
+     if (rc){
+	 return 1;
+       }
+     }
+
+   int retval;
+   for(t=0;t<NUM_THREADS;t++)
+     pthread_join (threads[t], (void**)&retval);
+
+   return buffer[10];
+}
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 740f7ab..fdf0201 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -128,9 +128,13 @@ gimple_init_edge_profiler (void)
   tree average_profiler_fn_type;
   tree time_profiler_fn_type;
   const char *profiler_fn_name;
+  const char *fn_name;
 
   if (!gcov_type_node)
     {
+      const char *fn_suffix
+	= flag_profile_update == PROFILE_UPDATE_ATOMIC ? "_atomic" : "";
+
       gcov_type_node = get_gcov_type ();
       gcov_type_ptr = build_pointer_type (gcov_type_node);
 
@@ -140,9 +144,10 @@ gimple_init_edge_profiler (void)
 					  gcov_type_ptr, gcov_type_node,
 					  integer_type_node,
 					  unsigned_type_node, NULL_TREE);
-      tree_interval_profiler_fn
-	      = build_fn_decl ("__gcov_interval_profiler",
-				     interval_profiler_fn_type);
+      fn_name = concat ("__gcov_interval_profiler", fn_suffix, NULL);
+      tree_interval_profiler_fn = build_fn_decl (fn_name,
+						 interval_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_interval_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_interval_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -153,8 +158,9 @@ gimple_init_edge_profiler (void)
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node,
 					  NULL_TREE);
-      tree_pow2_profiler_fn = build_fn_decl ("__gcov_pow2_profiler",
-						   pow2_profiler_fn_type);
+      fn_name = concat ("__gcov_pow2_profiler", fn_suffix, NULL);
+      tree_pow2_profiler_fn = build_fn_decl (fn_name, pow2_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_pow2_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_pow2_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -165,9 +171,10 @@ gimple_init_edge_profiler (void)
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node,
 					  NULL_TREE);
-      tree_one_value_profiler_fn
-	      = build_fn_decl ("__gcov_one_value_profiler",
-				     one_value_profiler_fn_type);
+      fn_name = concat ("__gcov_one_value_profiler", fn_suffix, NULL);
+      tree_one_value_profiler_fn = build_fn_decl (fn_name,
+						  one_value_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_one_value_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_one_value_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -197,9 +204,9 @@ gimple_init_edge_profiler (void)
       time_profiler_fn_type
 	       = build_function_type_list (void_type_node,
 					  gcov_type_ptr, NULL_TREE);
-      tree_time_profiler_fn
-	      = build_fn_decl ("__gcov_time_profiler",
-				     time_profiler_fn_type);
+      fn_name = concat ("__gcov_time_profiler", fn_suffix, NULL);
+      tree_time_profiler_fn = build_fn_decl (fn_name, time_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_time_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_time_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -209,16 +216,17 @@ gimple_init_edge_profiler (void)
       average_profiler_fn_type
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node, NULL_TREE);
-      tree_average_profiler_fn
-	      = build_fn_decl ("__gcov_average_profiler",
-				     average_profiler_fn_type);
+      fn_name = concat ("__gcov_average_profiler", fn_suffix, NULL);
+      tree_average_profiler_fn = build_fn_decl (fn_name,
+						average_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_average_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_average_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
 		     DECL_ATTRIBUTES (tree_average_profiler_fn));
-      tree_ior_profiler_fn
-	      = build_fn_decl ("__gcov_ior_profiler",
-				     average_profiler_fn_type);
+      fn_name = concat ("__gcov_ior_profiler", fn_suffix, NULL);
+      tree_ior_profiler_fn = build_fn_decl (fn_name, average_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_ior_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_ior_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index efaf7f7..81675b6 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -858,10 +858,18 @@ include $(iterator)
 
 LIBGCOV_MERGE = _gcov_merge_add _gcov_merge_single _gcov_merge_delta	\
 	_gcov_merge_ior _gcov_merge_time_profile _gcov_merge_icall_topn
-LIBGCOV_PROFILER = _gcov_interval_profiler _gcov_pow2_profiler		\
+LIBGCOV_PROFILER = _gcov_interval_profiler				\
+	_gcov_interval_profiler_atomic					\
+	_gcov_pow2_profiler						\
+	_gcov_pow2_profiler_atomic					\
 	_gcov_one_value_profiler					\
- 	_gcov_average_profiler _gcov_ior_profiler			\
-	_gcov_indirect_call_profiler_v2 _gcov_time_profiler		\
+	_gcov_average_profiler						\
+	_gcov_average_profiler_atomic					\
+	_gcov_ior_profiler						\
+	_gcov_ior_profiler_atomic					\
+	_gcov_indirect_call_profiler_v2					\
+	_gcov_time_profiler						\
+	_gcov_time_profiler_atomic					\
 	_gcov_indirect_call_topn_profiler
 LIBGCOV_INTERFACE = _gcov_dump _gcov_flush _gcov_fork			\
 	_gcov_execl _gcov_execlp					\
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index a99d93b..70a821d 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -46,6 +46,26 @@ __gcov_interval_profiler (gcov_type *counters, gcov_type value,
 }
 #endif
 
+#ifdef L_gcov_interval_profiler_atomic
+/* If VALUE is in interval <START, START + STEPS - 1>, then increases the
+   corresponding counter in COUNTERS.  If the VALUE is above or below
+   the interval, COUNTERS[STEPS] or COUNTERS[STEPS + 1] is increased
+   instead.  Function is thread-safe.  */
+
+void
+__gcov_interval_profiler_atomic (gcov_type *counters, gcov_type value,
+				 int start, unsigned steps)
+{
+  gcov_type delta = value - start;
+  if (delta < 0)
+    __atomic_fetch_add (&counters[steps + 1], 1, MEMMODEL_RELAXED);
+  else if (delta >= steps)
+    __atomic_fetch_add (&counters[steps], 1, MEMMODEL_RELAXED);
+  else
+    __atomic_fetch_add (&counters[delta], 1, MEMMODEL_RELAXED);
+}
+#endif
+
 #ifdef L_gcov_pow2_profiler
 /* If VALUE is a power of two, COUNTERS[1] is incremented.  Otherwise
    COUNTERS[0] is incremented.  */
@@ -60,6 +80,21 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+#ifdef L_gcov_pow2_profiler_atomic
+/* If VALUE is a power of two, COUNTERS[1] is incremented.  Otherwise
+   COUNTERS[0] is incremented.  Function is thread-safe.  */
+
+void
+__gcov_pow2_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  if (value == 0 || (value & (value - 1)))
+    __atomic_fetch_add (&counters[0], 1, MEMMODEL_RELAXED);
+  else
+    __atomic_fetch_add (&counters[1], 1, MEMMODEL_RELAXED);
+}
+#endif
+
+
 /* Tries to determine the most common value among its inputs.  Checks if the
    value stored in COUNTERS[0] matches VALUE.  If this is the case, COUNTERS[1]
    is incremented.  If this is not the case and COUNTERS[1] is not zero,
@@ -68,10 +103,12 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
    function is called more than 50% of the time with one value, this value
    will be in COUNTERS[0] in the end.
 
-   In any case, COUNTERS[2] is incremented.  */
+   In any case, COUNTERS[2] is incremented.  If USE_ATOMIC is set to 1,
+   COUNTERS[2] is updated with an atomic instruction.  */
 
 static inline void
-__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
+__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
+				int use_atomic)
 {
   if (value == counters[0])
     counters[1]++;
@@ -82,14 +119,36 @@ __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
     }
   else
     counters[1]--;
-  counters[2]++;
+
+  if (use_atomic)
+    __atomic_fetch_add (&counters[2], 1, MEMMODEL_RELAXED);
+  else
+    counters[2]++;
 }
 
 #ifdef L_gcov_one_value_profiler
 void
 __gcov_one_value_profiler (gcov_type *counters, gcov_type value)
 {
-  __gcov_one_value_profiler_body (counters, value);
+  __gcov_one_value_profiler_body (counters, value, 0);
+}
+#endif
+
+#ifdef L_gcov_one_value_profiler_atomic
+
+/* Update one value profilers (COUNTERS) for a given VALUE.
+
+   CAVEAT: The following function is not thread-safe; only the total number
+   of executions (COUNTERS[2]) is updated with an atomic instruction.
+   The problem is that one cannot atomically update two counters
+   (COUNTERS[0] and COUNTERS[1]).  For more information, please read the
+   following email thread:
+   https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00024.html.  */
+
+void
+__gcov_one_value_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __gcov_one_value_profiler_body (counters, value, 1);
 }
 #endif
 
@@ -265,7 +324,7 @@ __gcov_indirect_call_profiler_v2 (gcov_type value, void* cur_func)
   if (cur_func == __gcov_indirect_call_callee
       || (__LIBGCC_VTABLE_USES_DESCRIPTORS__ && __gcov_indirect_call_callee
           && *(void **) cur_func == *(void **) __gcov_indirect_call_callee))
-    __gcov_one_value_profiler_body (__gcov_indirect_call_counters, value);
+    __gcov_one_value_profiler_body (__gcov_indirect_call_counters, value, 0);
 }
 #endif
 
@@ -282,8 +341,19 @@ __gcov_time_profiler (gcov_type* counters)
   if (!counters[0])
     counters[0] = ++function_counter;
 }
+
+/* Sets corresponding COUNTERS if there is no value.
+   Function is thread-safe.  */
+
+void
+__gcov_time_profiler_atomic (gcov_type* counters)
+{
+  if (!counters[0])
+    counters[0] = __atomic_add_fetch (&function_counter, 1, MEMMODEL_RELAXED);
+}
 #endif
 
+
 #ifdef L_gcov_average_profiler
 /* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
    to saturate up.  */
@@ -296,6 +366,18 @@ __gcov_average_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+#ifdef L_gcov_average_profiler_atomic
+/* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
+   to saturate up.  Function is thread-safe.  */
+
+void
+__gcov_average_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __atomic_fetch_add (&counters[0], value, MEMMODEL_RELAXED);
+  __atomic_fetch_add (&counters[1], 1, MEMMODEL_RELAXED);
+}
+#endif
+
 #ifdef L_gcov_ior_profiler
 /* Bitwise-OR VALUE into COUNTER.  */
 
@@ -306,4 +388,15 @@ __gcov_ior_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+#ifdef L_gcov_ior_profiler_atomic
+/* Bitwise-OR VALUE into COUNTER.  Function is thread-safe.  */
+
+void
+__gcov_ior_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __atomic_fetch_or (&counters[0], value, MEMMODEL_RELAXED);
+}
+#endif
+
+
 #endif /* inhibit_libc */
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 80f13e2..25147de 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -268,12 +268,19 @@ extern void __gcov_merge_icall_topn (gcov_type *, unsigned) ATTRIBUTE_HIDDEN;
 
 /* The profiler functions.  */
 extern void __gcov_interval_profiler (gcov_type *, gcov_type, int, unsigned);
+extern void __gcov_interval_profiler_atomic (gcov_type *, gcov_type, int,
+					     unsigned);
 extern void __gcov_pow2_profiler (gcov_type *, gcov_type);
+extern void __gcov_pow2_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_one_value_profiler (gcov_type *, gcov_type);
+extern void __gcov_one_value_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_indirect_call_profiler_v2 (gcov_type, void *);
 extern void __gcov_time_profiler (gcov_type *);
+extern void __gcov_time_profiler_atomic (gcov_type *);
 extern void __gcov_average_profiler (gcov_type *, gcov_type);
+extern void __gcov_average_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_ior_profiler (gcov_type *, gcov_type);
+extern void __gcov_ior_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_indirect_call_topn_profiler (gcov_type, void *);
 extern void gcov_sort_n_vals (gcov_type *, int);
 
-- 
2.9.2



* [PATCH] Fix POW2 histogram
  2016-08-08 15:24                           ` Nathan Sidwell
  2016-08-08 16:51                             ` Martin Liška
@ 2016-08-08 16:56                             ` Martin Liška
  2016-08-09  8:41                               ` [PATCH 2/N] Fix usage of " Martin Liška
  2016-08-09 12:34                               ` [PATCH] Fix " Nathan Sidwell
  1 sibling, 2 replies; 95+ messages in thread
From: Martin Liška @ 2016-08-08 16:56 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh

[-- Attachment #1: Type: text/plain, Size: 682 bytes --]

Hello.

Currently, we use the pow2 profile histogram to track gimple statements like this: ssa_name_x % value.

void
__gcov_pow2_profiler (gcov_type *counters, gcov_type value)
{
  if (value & (value - 1))
    counters[0]++;
  else
    counters[1]++;
}

Although the __gcov_pow2_profiler function mishandles 0 (which is not a power of two), it is impossible
to write a test case that passes 0 to it without also triggering a division by zero, because the profiled
value is the divisor of the modulo. As one can potentially use the same profiler for a different purpose,
I would like to fix it. Apart from that, there is a small bug in the dump function for POW2 histograms:
it prints the pow2 and non-pow2 counters swapped.
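
To make the zero case concrete, here is a minimal standalone sketch of the two predicates. The helper names `is_pow2_old` and `is_pow2_fixed` are hypothetical and exist only for this illustration, and `long long` stands in for gcov_type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helpers for illustration; not part of libgcov.  */

/* Mirrors the current test: anything with (value & (value - 1)) == 0
   is counted as a power of two, which includes 0.  */
bool
is_pow2_old (long long value)
{
  return !(value & (value - 1));
}

/* Mirrors the proposed test: 0 is explicitly excluded.  */
bool
is_pow2_fixed (long long value)
{
  return value != 0 && !(value & (value - 1));
}

/* Small self-check contrasting the two predicates.  */
void
pow2_self_check (void)
{
  assert (is_pow2_old (0));      /* the bug: 0 classified as pow2 */
  assert (!is_pow2_fixed (0));   /* the fix: 0 classified as non-pow2 */
  assert (is_pow2_old (16) && is_pow2_fixed (16));
  assert (!is_pow2_old (15) && !is_pow2_fixed (15));
}
```

For every positive input the two predicates agree; they differ only at 0.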

Survives make check -k -j10 RUNTESTFLAGS="tree-prof.exp"
Ready for trunk?

Thanks,
Martin

[-- Attachment #2: 0001-Fix-POW2-histogram.patch --]
[-- Type: text/x-patch, Size: 2460 bytes --]

From 5c1173eec8bb6a16d2a570e845075c6e868a26f9 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 8 Aug 2016 18:44:19 +0200
Subject: [PATCH] Fix POW2 histogram

gcc/testsuite/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	* gcc.dg/tree-prof/val-prof-8.c: New test.

gcc/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	* value-prof.c (dump_histogram_value): Swap pow2 and non-pow2
	values.

libgcc/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	* libgcov-profiler.c (__gcov_pow2_profiler): Consider 0 as not
	power of two.
---
 gcc/testsuite/gcc.dg/tree-prof/val-prof-8.c | 19 +++++++++++++++++++
 gcc/value-prof.c                            |  4 ++--
 libgcc/libgcov-profiler.c                   |  2 +-
 3 files changed, 22 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/val-prof-8.c

diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-prof-8.c b/gcc/testsuite/gcc.dg/tree-prof/val-prof-8.c
new file mode 100644
index 0000000..2c505e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/val-prof-8.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O0 -fdump-ipa-profile" } */
+
+int
+main (int argc, char **argv)
+{
+  unsigned u = (argc - 1);
+  int counter = 0;
+
+  for (unsigned i = 0; i < 100; i++)
+  {
+    unsigned x = i < 10 ? 16 : 15;
+    counter += u % x;
+  }
+
+  return counter;
+}
+
+/* autofdo does not do value profiling so far */
+/* { dg-final-use-not-autofdo { scan-ipa-dump "Pow2 counter pow2:10 nonpow2:90." "profile" } } */
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 2976a86..0527c2c 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -264,8 +264,8 @@ dump_histogram_value (FILE *dump_file, histogram_value hist)
 	{
 	   fprintf (dump_file, "pow2:%" PRId64
 		    " nonpow2:%" PRId64,
-		    (int64_t) hist->hvalue.counters[0],
-		    (int64_t) hist->hvalue.counters[1]);
+		    (int64_t) hist->hvalue.counters[1],
+		    (int64_t) hist->hvalue.counters[0]);
 	}
       fprintf (dump_file, ".\n");
       break;
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index e947188..6da8a94 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -53,7 +53,7 @@ __gcov_interval_profiler (gcov_type *counters, gcov_type value,
 void
 __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 {
-  if (value & (value - 1))
+  if (value == 0 || (value & (value - 1)))
     counters[0]++;
   else
     counters[1]++;
-- 
2.9.2



* Re: [PATCH 5/N] Add new *_atomic counter update function, (-fprofile-update=atomic)
  2016-08-08 16:51                             ` Martin Liška
@ 2016-08-08 17:03                               ` Martin Liška
  2016-08-09 12:36                                 ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-08-08 17:03 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

On 08/08/2016 06:50 PM, Martin Liška wrote:
> On 08/08/2016 05:24 PM, Nathan Sidwell wrote:
>> On 08/08/16 09:59, Martin Liška wrote:
>>> Hello.
>>>
>>> This patch is a follow-up to the series in which I introduce a set of counter update
>>> functions that are thread-safe. I originally thought that the majority of profile corruptions were
>>> caused by non-atomic updates of the CFG counters (-fprofile-arcs). But there are multiple counters whose
>>> count is compared to the number of executions of a basic block:
>>>
>>> blake2s.cpp:150:40: error: corrupted value profile: value profile counter (11301120 out of 11314388) inconsistent with basic-block count (11555117)
>>>        memcpy( S->buf + left, in, fill ); // Fill buffer
>>>
>>> This can be seen for unrar binary: PR58306. I'm also adding a simple test-case which reveals the inconsistency: val-profiler-threads-1.c.
>>>
>>> I've been running regression tests, ready to install after it finishes?
>>
>>
>> +      fn_name = concat ("__gcov_interval_profiler", fn_suffix, NULL);
>> +      tree_interval_profiler_fn = build_fn_decl (fn_name,
>> +                         interval_profiler_fn_type);
>>
>> I like this idiom, but doesn't 'concat' call for a following 'free'?
> 
> Fixed in the second version of the patch.
> 
> 
>>
>> +__gcov_pow2_profiler_atomic (gcov_type *counters, gcov_type value)
>> +{
>> +  if (value & (value - 1))
>> +    __atomic_fetch_add (&counters[0], 1, MEMMODEL_RELAXED);
>>
>> This seems to think '0' is a power of 2.  (I suspect a bug in the existing code, not something you've introduced.)
> 
> I'll send a separate email for that issue.
> 
>>
>> -__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
>> +__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
>> +                int use_atomic)
>>  {
>>    if (value == counters[0])
>>
>> This function should be commented along the lines of the email discussion we had last week.  The 'atomic' param doesn't make it completely thread-safe.
> 
> Done, with a link to this mailing list thread.
> 
>>
>>  /* Counter for first visit of each function.  */
>> -static gcov_type function_counter;
>> +gcov_type function_counter;
>>
>> Why is this no longer static?  If it must be globally visible, it'll need a suitable rename.  (Perhaps it's simpler to put the 2(?) fns that access it into a single object file?)
> 
> Yeah, I'm putting these 2 functions into the same object file.
> 
> Martin
> 
>>
>> nathan
> 

v3: fixed wrong defines in libgcc/Makefile.in

Martin

[-- Attachment #2: 0005-Add-new-_atomic-counter-update-function-fprofile-upd-v3.patch --]
[-- Type: text/x-patch; name="0005-Add-new-_atomic-counter-update-function-fprofile-upd-v3.patch", Size: 14816 bytes --]

From 7f442afa98e037ee9839fad1ced8a0055842b4e6 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 8 Aug 2016 15:44:28 +0200
Subject: [PATCH 5/5] Add new *_atomic counter update function
 (-fprofile-update=atomic)

libgcc/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/58306
	* Makefile.in: New functions (modules) are added.
	* libgcov-profiler.c (__gcov_interval_profiler_atomic): New
	function.
	(__gcov_pow2_profiler_atomic): New function.
	(__gcov_one_value_profiler_body): New argument is introduced.
	(__gcov_one_value_profiler): Call with the new argument.
	(__gcov_one_value_profiler_atomic): Likewise.
	(__gcov_indirect_call_profiler_v2): Likewise.
	(__gcov_time_profiler_atomic): New function.
	(__gcov_average_profiler_atomic): Likewise.
	(__gcov_ior_profiler_atomic): Likewise.
	* libgcov.h: Declare the aforementioned functions.

gcc/testsuite/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/58306
	* gcc.dg/tree-prof/val-profiler-threads-1.c: New test.

gcc/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	PR gcov-profile/58306
	* tree-profile.c (gimple_init_edge_profiler): Create conditionally
	atomic variants of profile update functions.
---
 .../gcc.dg/tree-prof/val-profiler-threads-1.c      |  41 ++++++++
 gcc/tree-profile.c                                 |  42 +++++----
 libgcc/Makefile.in                                 |  15 ++-
 libgcc/libgcov-profiler.c                          | 103 ++++++++++++++++++++-
 libgcc/libgcov.h                                   |   7 ++
 5 files changed, 182 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c

diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c b/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
new file mode 100644
index 0000000..0f7477e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
@@ -0,0 +1,41 @@
+/* { dg-options "-O0 -pthread -fprofile-update=atomic" } */
+#include <pthread.h>
+
+#define NUM_THREADS	8
+#define SIZE 1024
+#define ITERATIONS (1000 * 1000)
+
+char buffer[SIZE];
+char buffer2[SIZE];
+
+void *copy_memory(char *dst, char *src, unsigned size)
+{
+   for (unsigned i = 0; i < ITERATIONS; i++)
+   {
+     dst[size % 10] = src[size % 20];
+   }
+}
+
+void *foo(void *d)
+{
+  copy_memory (buffer, buffer2, SIZE);
+}
+
+int main(int argc, char *argv[])
+{
+   pthread_t threads[NUM_THREADS];
+   int rc;
+   long t;
+   for(t=0;t<NUM_THREADS;t++){
+     rc = pthread_create(&threads[t], NULL, foo, 0);
+     if (rc){
+	 return 1;
+       }
+     }
+
+   int retval;
+   for(t=0;t<NUM_THREADS;t++)
+     pthread_join (threads[t], (void**)&retval);
+
+   return buffer[10];
+}
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 740f7ab..fdf0201 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -128,9 +128,13 @@ gimple_init_edge_profiler (void)
   tree average_profiler_fn_type;
   tree time_profiler_fn_type;
   const char *profiler_fn_name;
+  const char *fn_name;
 
   if (!gcov_type_node)
     {
+      const char *fn_suffix
+	= flag_profile_update == PROFILE_UPDATE_ATOMIC ? "_atomic" : "";
+
       gcov_type_node = get_gcov_type ();
       gcov_type_ptr = build_pointer_type (gcov_type_node);
 
@@ -140,9 +144,10 @@ gimple_init_edge_profiler (void)
 					  gcov_type_ptr, gcov_type_node,
 					  integer_type_node,
 					  unsigned_type_node, NULL_TREE);
-      tree_interval_profiler_fn
-	      = build_fn_decl ("__gcov_interval_profiler",
-				     interval_profiler_fn_type);
+      fn_name = concat ("__gcov_interval_profiler", fn_suffix, NULL);
+      tree_interval_profiler_fn = build_fn_decl (fn_name,
+						 interval_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_interval_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_interval_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -153,8 +158,9 @@ gimple_init_edge_profiler (void)
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node,
 					  NULL_TREE);
-      tree_pow2_profiler_fn = build_fn_decl ("__gcov_pow2_profiler",
-						   pow2_profiler_fn_type);
+      fn_name = concat ("__gcov_pow2_profiler", fn_suffix, NULL);
+      tree_pow2_profiler_fn = build_fn_decl (fn_name, pow2_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_pow2_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_pow2_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -165,9 +171,10 @@ gimple_init_edge_profiler (void)
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node,
 					  NULL_TREE);
-      tree_one_value_profiler_fn
-	      = build_fn_decl ("__gcov_one_value_profiler",
-				     one_value_profiler_fn_type);
+      fn_name = concat ("__gcov_one_value_profiler", fn_suffix, NULL);
+      tree_one_value_profiler_fn = build_fn_decl (fn_name,
+						  one_value_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_one_value_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_one_value_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -197,9 +204,9 @@ gimple_init_edge_profiler (void)
       time_profiler_fn_type
 	       = build_function_type_list (void_type_node,
 					  gcov_type_ptr, NULL_TREE);
-      tree_time_profiler_fn
-	      = build_fn_decl ("__gcov_time_profiler",
-				     time_profiler_fn_type);
+      fn_name = concat ("__gcov_time_profiler", fn_suffix, NULL);
+      tree_time_profiler_fn = build_fn_decl (fn_name, time_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_time_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_time_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
@@ -209,16 +216,17 @@ gimple_init_edge_profiler (void)
       average_profiler_fn_type
 	      = build_function_type_list (void_type_node,
 					  gcov_type_ptr, gcov_type_node, NULL_TREE);
-      tree_average_profiler_fn
-	      = build_fn_decl ("__gcov_average_profiler",
-				     average_profiler_fn_type);
+      fn_name = concat ("__gcov_average_profiler", fn_suffix, NULL);
+      tree_average_profiler_fn = build_fn_decl (fn_name,
+						average_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_average_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_average_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
 		     DECL_ATTRIBUTES (tree_average_profiler_fn));
-      tree_ior_profiler_fn
-	      = build_fn_decl ("__gcov_ior_profiler",
-				     average_profiler_fn_type);
+      fn_name = concat ("__gcov_ior_profiler", fn_suffix, NULL);
+      tree_ior_profiler_fn = build_fn_decl (fn_name, average_profiler_fn_type);
+      free (CONST_CAST (char *, fn_name));
       TREE_NOTHROW (tree_ior_profiler_fn) = 1;
       DECL_ATTRIBUTES (tree_ior_profiler_fn)
 	= tree_cons (get_identifier ("leaf"), NULL,
diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index efaf7f7..b478056 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -858,10 +858,17 @@ include $(iterator)
 
 LIBGCOV_MERGE = _gcov_merge_add _gcov_merge_single _gcov_merge_delta	\
 	_gcov_merge_ior _gcov_merge_time_profile _gcov_merge_icall_topn
-LIBGCOV_PROFILER = _gcov_interval_profiler _gcov_pow2_profiler		\
-	_gcov_one_value_profiler					\
- 	_gcov_average_profiler _gcov_ior_profiler			\
-	_gcov_indirect_call_profiler_v2 _gcov_time_profiler		\
+LIBGCOV_PROFILER = _gcov_interval_profiler				\
+	_gcov_interval_profiler_atomic					\
+	_gcov_pow2_profiler						\
+	_gcov_pow2_profiler_atomic					\
+	_gcov_one_value_profiler_atomic					\
+	_gcov_average_profiler						\
+	_gcov_average_profiler_atomic					\
+	_gcov_ior_profiler						\
+	_gcov_ior_profiler_atomic					\
+	_gcov_indirect_call_profiler_v2					\
+	_gcov_time_profiler						\
 	_gcov_indirect_call_topn_profiler
 LIBGCOV_INTERFACE = _gcov_dump _gcov_flush _gcov_fork			\
 	_gcov_execl _gcov_execlp					\
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index a99d93b..70a821d 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -46,6 +46,26 @@ __gcov_interval_profiler (gcov_type *counters, gcov_type value,
 }
 #endif
 
+#ifdef L_gcov_interval_profiler_atomic
+/* If VALUE is in interval <START, START + STEPS - 1>, then increases the
+   corresponding counter in COUNTERS.  If the VALUE is above or below
+   the interval, COUNTERS[STEPS] or COUNTERS[STEPS + 1] is increased
+   instead.  Function is thread-safe.  */
+
+void
+__gcov_interval_profiler_atomic (gcov_type *counters, gcov_type value,
+				 int start, unsigned steps)
+{
+  gcov_type delta = value - start;
+  if (delta < 0)
+    __atomic_fetch_add (&counters[steps + 1], 1, MEMMODEL_RELAXED);
+  else if (delta >= steps)
+    __atomic_fetch_add (&counters[steps], 1, MEMMODEL_RELAXED);
+  else
+    __atomic_fetch_add (&counters[delta], 1, MEMMODEL_RELAXED);
+}
+#endif
+
 #ifdef L_gcov_pow2_profiler
 /* If VALUE is a power of two, COUNTERS[1] is incremented.  Otherwise
    COUNTERS[0] is incremented.  */
@@ -60,6 +80,21 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+#ifdef L_gcov_pow2_profiler_atomic
+/* If VALUE is a power of two, COUNTERS[1] is incremented.  Otherwise
+   COUNTERS[0] is incremented.  Function is thread-safe.  */
+
+void
+__gcov_pow2_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  if (value == 0 || (value & (value - 1)))
+    __atomic_fetch_add (&counters[0], 1, MEMMODEL_RELAXED);
+  else
+    __atomic_fetch_add (&counters[1], 1, MEMMODEL_RELAXED);
+}
+#endif
+
+
 /* Tries to determine the most common value among its inputs.  Checks if the
    value stored in COUNTERS[0] matches VALUE.  If this is the case, COUNTERS[1]
    is incremented.  If this is not the case and COUNTERS[1] is not zero,
@@ -68,10 +103,12 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
    function is called more than 50% of the time with one value, this value
    will be in COUNTERS[0] in the end.
 
-   In any case, COUNTERS[2] is incremented.  */
+   In any case, COUNTERS[2] is incremented.  If USE_ATOMIC is set to 1,
+   COUNTERS[2] is updated with an atomic instruction.  */
 
 static inline void
-__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
+__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value,
+				int use_atomic)
 {
   if (value == counters[0])
     counters[1]++;
@@ -82,14 +119,36 @@ __gcov_one_value_profiler_body (gcov_type *counters, gcov_type value)
     }
   else
     counters[1]--;
-  counters[2]++;
+
+  if (use_atomic)
+    __atomic_fetch_add (&counters[2], 1, MEMMODEL_RELAXED);
+  else
+    counters[2]++;
 }
 
 #ifdef L_gcov_one_value_profiler
 void
 __gcov_one_value_profiler (gcov_type *counters, gcov_type value)
 {
-  __gcov_one_value_profiler_body (counters, value);
+  __gcov_one_value_profiler_body (counters, value, 0);
+}
+#endif
+
+#ifdef L_gcov_one_value_profiler_atomic
+
+/* Update one value profilers (COUNTERS) for a given VALUE.
+
+   CAVEAT: Following function is not thread-safe, only total number
+   of executions (COUNTERS[2]) is updated with an atomic instruction.
+   Problem is that one cannot atomically update two counters
+   (COUNTERS[0] and COUNTERS[1]), for more information please read
+   following email thread:
+   https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00024.html.  */
+
+void
+__gcov_one_value_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __gcov_one_value_profiler_body (counters, value, 1);
 }
 #endif
 
@@ -265,7 +324,7 @@ __gcov_indirect_call_profiler_v2 (gcov_type value, void* cur_func)
   if (cur_func == __gcov_indirect_call_callee
       || (__LIBGCC_VTABLE_USES_DESCRIPTORS__ && __gcov_indirect_call_callee
           && *(void **) cur_func == *(void **) __gcov_indirect_call_callee))
-    __gcov_one_value_profiler_body (__gcov_indirect_call_counters, value);
+    __gcov_one_value_profiler_body (__gcov_indirect_call_counters, value, 0);
 }
 #endif
 
@@ -282,8 +341,19 @@ __gcov_time_profiler (gcov_type* counters)
   if (!counters[0])
     counters[0] = ++function_counter;
 }
+
+/* Sets corresponding COUNTERS if there is no value.
+   Function is thread-safe.  */
+
+void
+__gcov_time_profiler_atomic (gcov_type* counters)
+{
+  if (!counters[0])
+    counters[0] = __atomic_add_fetch (&function_counter, 1, MEMMODEL_RELAXED);
+}
 #endif
 
+
 #ifdef L_gcov_average_profiler
 /* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
    to saturate up.  */
@@ -296,6 +366,18 @@ __gcov_average_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+#ifdef L_gcov_average_profiler_atomic
+/* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
+   to saturate up.  Function is thread-safe.  */
+
+void
+__gcov_average_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __atomic_fetch_add (&counters[0], value, MEMMODEL_RELAXED);
+  __atomic_fetch_add (&counters[1], 1, MEMMODEL_RELAXED);
+}
+#endif
+
 #ifdef L_gcov_ior_profiler
 /* Bitwise-OR VALUE into COUNTER.  */
 
@@ -306,4 +388,15 @@ __gcov_ior_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
+#ifdef L_gcov_ior_profiler_atomic
+/* Bitwise-OR VALUE into COUNTER.  Function is thread-safe.  */
+
+void
+__gcov_ior_profiler_atomic (gcov_type *counters, gcov_type value)
+{
+  __atomic_fetch_or (&counters[0], value, MEMMODEL_RELAXED);
+}
+#endif
+
+
 #endif /* inhibit_libc */
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 80f13e2..25147de 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -268,12 +268,19 @@ extern void __gcov_merge_icall_topn (gcov_type *, unsigned) ATTRIBUTE_HIDDEN;
 
 /* The profiler functions.  */
 extern void __gcov_interval_profiler (gcov_type *, gcov_type, int, unsigned);
+extern void __gcov_interval_profiler_atomic (gcov_type *, gcov_type, int,
+					     unsigned);
 extern void __gcov_pow2_profiler (gcov_type *, gcov_type);
+extern void __gcov_pow2_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_one_value_profiler (gcov_type *, gcov_type);
+extern void __gcov_one_value_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_indirect_call_profiler_v2 (gcov_type, void *);
 extern void __gcov_time_profiler (gcov_type *);
+extern void __gcov_time_profiler_atomic (gcov_type *);
 extern void __gcov_average_profiler (gcov_type *, gcov_type);
+extern void __gcov_average_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_ior_profiler (gcov_type *, gcov_type);
+extern void __gcov_ior_profiler_atomic (gcov_type *, gcov_type);
 extern void __gcov_indirect_call_topn_profiler (gcov_type, void *);
 extern void gcov_sort_n_vals (gcov_type *, int);
 
-- 
2.9.2


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [PATCH 2/N] Fix usage of POW2 histogram
  2016-08-08 16:56                             ` [PATCH] Fix POW2 histogram Martin Liška
@ 2016-08-09  8:41                               ` Martin Liška
  2016-08-09 12:37                                 ` Nathan Sidwell
  2016-08-09 12:34                               ` [PATCH] Fix " Nathan Sidwell
  1 sibling, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-08-09  8:41 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh

[-- Attachment #1: Type: text/plain, Size: 211 bytes --]

Another small follow-up: it changes the instrumentation of MOD expressions
so that divisors that are not an SSA_NAME are not instrumented.

Patch survives:
make check -k -j10 RUNTESTFLAGS="tree-prof.exp"

Ready for trunk?
Thanks,
Martin
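
To illustrate why a constant divisor needs no POW2 histogram: when the
divisor is a literal such as the 16 in val-prof-9.c, the compiler already
knows whether it is a power of two, and for unsigned operands such a modulo
reduces to a bit mask. A minimal sketch of that identity follows; the helper
names is_power_of_two and mod_by_mask are hypothetical and are not GCC
functions.

```c
#include <assert.h>
#include <stdbool.h>

/* True when d is a power of two; zero is deliberately excluded.  */
static bool is_power_of_two(unsigned d)
{
    return d != 0 && (d & (d - 1)) == 0;
}

/* For an unsigned value and a power-of-two divisor,
   u % d is the same as u & (d - 1).  */
static unsigned mod_by_mask(unsigned u, unsigned d)
{
    assert(is_power_of_two(d));
    return u & (d - 1);
}
```

Only a divisor that is an SSA_NAME can vary at run time, so only that case
can gain anything from profiling.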

[-- Attachment #2: 0002-Fix-usage-of-POW2-histogram.patch --]
[-- Type: text/x-patch, Size: 1924 bytes --]

From 00aecc0dd74c4546a1882bdbbb0f66fbf39a5408 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 8 Aug 2016 19:18:11 +0200
Subject: [PATCH 2/2] Fix usage of POW2 histogram

gcc/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	* value-prof.c (gimple_divmod_values_to_profile): Do not
	instrument MOD histogram if a value is not a SSA name.

gcc/testsuite/ChangeLog:

2016-08-08  Martin Liska  <mliska@suse.cz>

	* gcc.dg/tree-prof/val-prof-9.c: New test.
---
 gcc/testsuite/gcc.dg/tree-prof/val-prof-9.c | 18 ++++++++++++++++++
 gcc/value-prof.c                            |  3 ++-
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/val-prof-9.c

diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-prof-9.c b/gcc/testsuite/gcc.dg/tree-prof/val-prof-9.c
new file mode 100644
index 0000000..8fc2301
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/val-prof-9.c
@@ -0,0 +1,18 @@
+/* { dg-options "-O0 -fdump-tree-optimized" } */
+
+int
+main (int argc, char **argv)
+{
+  unsigned u = (argc - 1);
+  int counter = 0;
+
+  for (unsigned i = 0; i < 100; i++)
+  {
+    counter += u % 16;
+  }
+
+  return counter;
+}
+
+/* autofdo does not do value profiling so far */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "__gcov_pow2_profiler" 0 "optimized" } } */
diff --git a/gcc/value-prof.c b/gcc/value-prof.c
index 0527c2c..a4653aa 100644
--- a/gcc/value-prof.c
+++ b/gcc/value-prof.c
@@ -1950,7 +1950,8 @@ gimple_divmod_values_to_profile (gimple *stmt, histogram_values *values)
       /* For mod, check whether it is not often a noop (or replaceable by
 	 a few subtractions).  */
       if (gimple_assign_rhs_code (stmt) == TRUNC_MOD_EXPR
-	  && TYPE_UNSIGNED (type))
+	  && TYPE_UNSIGNED (type)
+	  && TREE_CODE (divisor) == SSA_NAME)
 	{
           tree val;
           /* Check for a special case where the divisor is power of 2.  */
-- 
2.9.2


^ permalink raw reply	[flat|nested] 95+ messages in thread

* [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-05 13:43                       ` Martin Liška
  2016-08-08 13:59                         ` [PATCH 5/N] Add new *_atomic counter update function, (-fprofile-update=atomic) Martin Liška
@ 2016-08-09 11:24                         ` Martin Liška
  2016-08-09 12:40                           ` Nathan Sidwell
  2016-08-09 19:04                           ` Andi Kleen
  2016-08-10 12:57                         ` [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch Nathan Sidwell
                                           ` (2 subsequent siblings)
  4 siblings, 2 replies; 95+ messages in thread
From: Martin Liška @ 2016-08-09 11:24 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh

[-- Attachment #1: Type: text/plain, Size: 206 bytes --]

Hi.

As mentioned in [1], enabling -fprofile-update=atomic when -pthread is present
is a logical step and is the default behavior users would quite expect.

Ready for trunk?

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58306#c21

[-- Attachment #2: 0001-Set-fprofile-update-atomic-when-pthread-is-present.patch --]
[-- Type: text/x-patch, Size: 884 bytes --]

From 7135fcf5f7b8f2f3c55a6e80048660c1beea5052 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 9 Aug 2016 13:19:26 +0200
Subject: [PATCH] Set -fprofile-update=atomic when -pthread is present

gcc/ChangeLog:

2016-08-09  Martin Liska  <mliska@suse.cz>

	* gcc.c: Add -fprofile-update=atomic when
	-pthread is present.
---
 gcc/gcc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 7460f6a..2f42619 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -1141,7 +1141,8 @@ static const char *cc1_options =
  %{-help=*:--help=%*}\
  %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %b.s}}}\
  %{fsyntax-only:-o %j} %{-param*}\
- %{coverage:-fprofile-arcs -ftest-coverage}";
+ %{coverage:-fprofile-arcs -ftest-coverage}\
+ %{pthread:-fprofile-update=atomic}";
 
 static const char *asm_options =
 "%{-target-help:%:print-asm-header()} "
-- 
2.9.2


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Fix POW2 histogram
  2016-08-08 16:56                             ` [PATCH] Fix POW2 histogram Martin Liška
  2016-08-09  8:41                               ` [PATCH 2/N] Fix usage of " Martin Liška
@ 2016-08-09 12:34                               ` Nathan Sidwell
  1 sibling, 0 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-09 12:34 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/08/16 12:56, Martin Liška wrote:
> Hello.
>
> Currently, we utilize pow2 profile histogram to track gimple STMTs like this: ssa_name_x % value.
>
> void
> __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
> {
>   if (value & (value - 1))
>     counters[0]++;
>   else
>     counters[1]++;
> }
>
> Although the __gcov_pow2_profiler function wrongly handles 0 (which is not a power of two), it's impossible
> to write a test-case that exposes this without also triggering a division by zero. As one can potentially use the same profiler
> for a different purpose, I would like to fix it. Apart from that, we've got a small bug in the dump function
> of the POW2 histograms.
>
> Survives make check -k -j10 RUNTESTFLAGS="tree-prof.exp"
> Ready for trunk?

Ok.

nathan
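
The zero case discussed above can be seen side by side in a small sketch.
The first function mirrors the quoted body and the second mirrors the check
used by the fixed profiler; counter_t is an assumed stand-in for gcov_type
and both function names are hypothetical.

```c
#include <assert.h>

typedef long long counter_t;  /* stand-in for gcov_type (assumption) */

/* Original check: 0 & (0 - 1) == 0, so zero lands in counters[1],
   the "power of two" bucket.  */
static void pow2_profiler_old(counter_t *counters, counter_t value)
{
    assert(counters != 0);
    if (value & (value - 1))
        counters[0]++;
    else
        counters[1]++;
}

/* Fixed check: zero is explicitly counted as "not a power of two".  */
static void pow2_profiler_fixed(counter_t *counters, counter_t value)
{
    assert(counters != 0);
    if (value == 0 || (value & (value - 1)))
        counters[0]++;
    else
        counters[1]++;
}
```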

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 5/N] Add new *_atomic counter update function, (-fprofile-update=atomic)
  2016-08-08 17:03                               ` Martin Liška
@ 2016-08-09 12:36                                 ` Nathan Sidwell
  0 siblings, 0 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-09 12:36 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/08/16 13:03, Martin Liška wrote:

>
> v3: fixed wrong defines in libgcc/Makefile.in

ok, thanks

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 2/N] Fix usage of POW2 histogram
  2016-08-09  8:41                               ` [PATCH 2/N] Fix usage of " Martin Liška
@ 2016-08-09 12:37                                 ` Nathan Sidwell
  0 siblings, 0 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-09 12:37 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/09/16 04:41, Martin Liška wrote:
> Another small follow-up: it changes the instrumentation of MOD expressions
> so that divisors that are not an SSA_NAME are not instrumented.
>
> Patch survives:
> make check -k -j10 RUNTESTFLAGS="tree-prof.exp"
>

ok, thanks

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-09 11:24                         ` [PATCH] Set -fprofile-update=atomic when -pthread is present Martin Liška
@ 2016-08-09 12:40                           ` Nathan Sidwell
  2016-08-09 19:04                           ` Andi Kleen
  1 sibling, 0 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-09 12:40 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/09/16 07:24, Martin Liška wrote:
> Hi.
>
> As mentioned in [1], enabling -fprofile-update=atomic when -pthread is present
> is a logical step and is the default behavior users would quite expect.
>
> Ready for trunk?
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58306#c21

This certainly requires changes to invoke.texi, and  possibly NEWS.

nathan

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-09 11:24                         ` [PATCH] Set -fprofile-update=atomic when -pthread is present Martin Liška
  2016-08-09 12:40                           ` Nathan Sidwell
@ 2016-08-09 19:04                           ` Andi Kleen
  2016-08-12 13:31                             ` Martin Liška
  1 sibling, 1 reply; 95+ messages in thread
From: Andi Kleen @ 2016-08-09 19:04 UTC (permalink / raw)
  To: Martin Liška; +Cc: Nathan Sidwell, gcc-patches, jh

Martin Liška <mliska@suse.cz> writes:

> Hi.
>
> As mentioned in [1], enabling -fprofile-update=atomic when -pthread is present
> is a logical step and is the default behavior users would quite expect.
>
> Ready for trunk?

It could potentially make things a lot slower. I don't think it's a good
idea to do this by default.

-Andi

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch
  2016-08-05 13:43                       ` Martin Liška
  2016-08-08 13:59                         ` [PATCH 5/N] Add new *_atomic counter update function, (-fprofile-update=atomic) Martin Liška
  2016-08-09 11:24                         ` [PATCH] Set -fprofile-update=atomic when -pthread is present Martin Liška
@ 2016-08-10 12:57                         ` Nathan Sidwell
  2016-08-13 12:14                         ` [BUILDROBOT] avr broken (was: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch) Jan-Benedict Glaw
  2016-08-16 12:56                         ` [PATCH] Detect whether target can use -fprofile-update=atomic Martin Liška
  4 siblings, 0 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-10 12:57 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh

On 08/05/16 09:43, Martin Liška wrote:
> On 08/05/2016 03:14 PM, Nathan Sidwell wrote:
>> On 08/05/16 08:48, Martin Liška wrote:
>>
>>> Ok, after all the experimenting and inventing "almost" thread-safe code, I am inclined not to include the __gcov_one_value_profiler_body_atomic
>>> counter. The final solution is cumbersome and probably not worth doing. Moreover, even with a thread-safe implementation, the result of an indirect call counter
>>> is not going to be stable across different runs (because only a single value can be stored).
>>>
>>> If you agree, I'll prepare a final version of patch?
>>
>> Agreed.
>>
>> nathan
>>
>
> Great, attaching install candidate.

ok, thanks.
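
For readers following the one-value discussion, here is a single-threaded
sketch of the majority-vote scheme the profiler uses. It is not the libgcov
source: counter_t is an assumed stand-in for gcov_type, the function name is
hypothetical, and the branch that the diff context elides is filled in here
as an assumption based on the usual form of the algorithm.

```c
#include <assert.h>

typedef long long counter_t;  /* stand-in for gcov_type (assumption) */

/* counters[0]: current candidate value
   counters[1]: vote balance of the candidate
   counters[2]: total number of calls  */
static void one_value_profiler(counter_t *counters, counter_t value)
{
    assert(counters != 0);
    if (value == counters[0])
        counters[1]++;
    else if (counters[1] == 0) {
        counters[0] = value;   /* adopt a new candidate */
        counters[1] = 1;
    } else
        counters[1]--;
    counters[2]++;
}
```

The candidate and its vote balance live in two separate words that must
change together, which is why a single atomic add cannot make the update
thread-safe; only the total in counters[2] is a plain increment.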

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-09 19:04                           ` Andi Kleen
@ 2016-08-12 13:31                             ` Martin Liška
  2016-08-18  3:16                               ` Jeff Law
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-08-12 13:31 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Nathan Sidwell, gcc-patches, jh

On 08/09/2016 09:03 PM, Andi Kleen wrote:
> It could potentially make things a lot slower. I don't think it's a good
> idea to do this by default.
> 
> -Andi

Ok, an alternative could be a warning in the driver that would inform the user
that combining -pthread and -fprofile-update=single can lead to profile corruption.
My first attempt with option handling in gcc.c was not successful; I'll try once more.

Is it a reasonable approach?

Martin
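
The kind of corruption meant here is a lost update. The sketch below replays,
in one thread and in a fixed order, the interleaving that two threads can
produce when "counter++" is split into a load and a store, and compares it
with the same two increments done through the __atomic_fetch_add builtin.
Both function names are hypothetical and the replay is only an illustration
of the race, not a measurement of it.

```c
#include <assert.h>

/* Replay a bad interleaving of two threads that each run "counter++"
   as a separate load and store.  Returns the final counter value.  */
static long lost_update_replay(void)
{
    long counter = 0;
    long t1 = counter;      /* thread 1 loads 0 */
    long t2 = counter;      /* thread 2 loads 0 */
    assert(t1 == t2);       /* both threads saw the same stale value */
    counter = t1 + 1;       /* thread 1 stores 1 */
    counter = t2 + 1;       /* thread 2 stores 1: one increment is lost */
    return counter;
}

/* The same two increments as indivisible read-modify-write steps.  */
static long atomic_replay(void)
{
    long counter = 0;
    __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
    __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
    return counter;
}
```

The atomic form keeps the count exact at the price of a more expensive
instruction on every counter update, which is the slowdown Andi is concerned
about.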

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [BUILDROBOT] avr broken (was: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch)
  2016-08-05 13:43                       ` Martin Liška
                                           ` (2 preceding siblings ...)
  2016-08-10 12:57                         ` [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch Nathan Sidwell
@ 2016-08-13 12:14                         ` Jan-Benedict Glaw
       [not found]                           ` <4455937b-eba7-fe66-fe1a-3172567dd1e4@suse.cz>
  2016-08-16 12:56                         ` [PATCH] Detect whether target can use -fprofile-update=atomic Martin Liška
  4 siblings, 1 reply; 95+ messages in thread
From: Jan-Benedict Glaw @ 2016-08-13 12:14 UTC (permalink / raw)
  To: Martin Liška; +Cc: Nathan Sidwell, gcc-patches, jh

[-- Attachment #1: Type: text/plain, Size: 3770 bytes --]

On Fri, 2016-08-05 15:43:02 +0200, Martin Liška <mliska@suse.cz> wrote:
[...]
> Great, attaching install candidate.

> >From 0b3ac8636ef34b02e301f22c86dde0602f9969ef Mon Sep 17 00:00:00 2001
> From: marxin <mliska@suse.cz>
> Date: Thu, 28 Jul 2016 14:32:47 +0200
> Subject: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9
>  branch
> 
> gcc/ChangeLog:
> 
> 2016-08-05  Martin Liska  <mliska@suse.cz>
> 
> 	Cherry picked (and modified) from google-4_7 branch
> 	2012-12-26  Rong Xu  <xur@google.com>
> 	* common.opt (fprofile-update): Add new flag.
> 	* coretypes.h: Define enum profile_update.
> 	* doc/invoke.texi: Document -fprofile-update.
> 	* gcov-io.h: Declare GCOV_TYPE_ATOMIC_FETCH_ADD and
> 	GCOV_TYPE_ATOMIC_FETCH_ADD_FN.
> 	* tree-profile.c (gimple_init_edge_profiler): Generate
> 	also atomic profiler update.
> 	(gimple_gen_edge_profiler): Likewise.
[...]
> --- a/gcc/gcov-io.h
> +++ b/gcc/gcov-io.h
> @@ -164,6 +164,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #ifndef GCC_GCOV_IO_H
>  #define GCC_GCOV_IO_H
>  
> +#if LONG_LONG_TYPE_SIZE > 32
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8
> +#else
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4
> +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4
> +#endif
> +
>  #ifndef IN_LIBGCOV
>  /* About the host */
>  

This doesn't work for AVR since their LONG_LONG_TYPE_SIZE depends on
target flags (see e.g. build
http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=602648)



g++ -fno-PIE -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. -I/home/jbglaw/repos/gcc/gcc/../include -I/home/jbglaw/repos/gcc/gcc/../libcpp/include  -I/home/jbglaw/repos/gcc/gcc/../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libbacktrace   -o auto-profile.o -MT auto-profile.o -MMD -MP -MF ./.deps/auto-profile.TPo /home/jbglaw/repos/gcc/gcc/auto-profile.c
In file included from ./tm.h:19:0,
                 from /home/jbglaw/repos/gcc/gcc/backend.h:28,
                 from /home/jbglaw/repos/gcc/gcc/auto-profile.c:26:
./options.h:261:36: error: token "." is not valid in preprocessor expressions
 #define target_flags global_options.x_target_flags
                                    ^
./options.h:5153:23: note: in expansion of macro ‘target_flags’
 #define TARGET_INT8 ((target_flags & MASK_INT8) != 0)
                       ^
/home/jbglaw/repos/gcc/gcc/config/avr/avr.h:138:24: note: in expansion of macro ‘TARGET_INT8’
 #define INT_TYPE_SIZE (TARGET_INT8 ? 8 : 16)
                        ^
/home/jbglaw/repos/gcc/gcc/config/avr/avr.h:141:30: note: in expansion of macro ‘INT_TYPE_SIZE’
 #define LONG_LONG_TYPE_SIZE (INT_TYPE_SIZE == 8 ? 32 : 64)
                              ^
/home/jbglaw/repos/gcc/gcc/gcov-io.h:167:5: note: in expansion of macro ‘LONG_LONG_TYPE_SIZE’
 #if LONG_LONG_TYPE_SIZE > 32
     ^
Makefile:1096: recipe for target 'auto-profile.o' failed
make[1]: *** [auto-profile.o] Error 1
make[1]: Leaving directory '/home/jbglaw/build/avr/build-gcc/gcc'


MfG, JBG
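
The failure above can be reproduced in miniature. In the sketch below the
struct, the MASK_INT8 value and the function name are hypothetical stand-ins
for GCC's option machinery; the macro chain follows the one shown in the
error log. The preprocessor cannot evaluate an expression that reads a
variable, but the same test is valid as an ordinary expression evaluated at
run time.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's option machinery.  */
struct options { int x_target_flags; };
static struct options global_options;
#define MASK_INT8 1   /* assumed bit value, for illustration only */

#define target_flags global_options.x_target_flags
#define TARGET_INT8 ((target_flags & MASK_INT8) != 0)
#define INT_TYPE_SIZE (TARGET_INT8 ? 8 : 16)
#define LONG_LONG_TYPE_SIZE (INT_TYPE_SIZE == 8 ? 32 : 64)

/* "#if LONG_LONG_TYPE_SIZE > 32" is rejected by the preprocessor here
   because the macro expands to a struct member access.  The comparison
   is fine once it is part of normal code.  */
static int gcov_counter_bytes(void)
{
    int bytes = LONG_LONG_TYPE_SIZE > 32 ? 8 : 4;
    assert(bytes == 4 || bytes == 8);
    return bytes;
}
```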

-- 
      Jan-Benedict Glaw      jbglaw@lug-owl.de              +49-172-7608481
Signature of:            http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
the second  :

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-08-05 13:43                       ` Martin Liška
                                           ` (3 preceding siblings ...)
  2016-08-13 12:14                         ` [BUILDROBOT] avr broken (was: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch) Jan-Benedict Glaw
@ 2016-08-16 12:56                         ` Martin Liška
  2016-08-16 14:31                           ` Nathan Sidwell
  4 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-08-16 12:56 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches; +Cc: jh, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 904 bytes --]

Hello.

As reported in [1], m68k has been broken since I installed the patch. The reason is that the target
does not support atomic operations (add, or) for the mode of gcov_type. Because of that, we see
undefined symbols.

A proper fix consists of 2 parts:
a) compiler emission must verify that -fprofile-update=atomic is doable for a given target; this is done
via a new function can_generate_atomic_builtin
b) libgcc must detect whether __atomic_fetch_add_x can be expanded on the target; that requires configure
support, and if the target is not capable of expanding these, we must conditionally remove all gcov_.*profiler_atomic
functions from libgcov.a.

Andreas reported that it fixes the test-case mentioned in the PR and I tested that on -march=i386.
Apart from that I've been doing bootstrap on x86_64-linux-gnu.

Ready after it finishes?
Martin

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58306#c30
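
Part a) amounts to a simple fallback decision, sketched here outside of GCC.
The enum mirrors the two modes named in the thread; the function name and the
two boolean parameters are hypothetical and stand in for the results of the
atomic add and atomic or capability checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum profile_update { PROFILE_UPDATE_SINGLE, PROFILE_UPDATE_ATOMIC };

/* Return the update mode that will actually be used.  Atomic mode is
   kept only when the target can expand both atomic add and atomic or
   for the counter mode; otherwise warn and fall back to single.  */
static enum profile_update
effective_profile_update(enum profile_update requested,
                         bool can_atomic_add, bool can_atomic_ior)
{
    assert(requested == PROFILE_UPDATE_SINGLE
           || requested == PROFILE_UPDATE_ATOMIC);
    if (requested == PROFILE_UPDATE_ATOMIC
        && !(can_atomic_add && can_atomic_ior)) {
        fprintf(stderr, "warning: target does not support atomic profile "
                        "update, single mode is selected\n");
        return PROFILE_UPDATE_SINGLE;
    }
    return requested;
}
```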

[-- Attachment #2: 0002-Detect-whether-target-can-use-fprofile-update-atomic.patch --]
[-- Type: text/x-patch, Size: 16466 bytes --]

From a5c6dbdbabc36193f7becce78af58b276b0d3660 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 16 Aug 2016 10:13:13 +0200
Subject: [PATCH 2/2] Detect whether target can use -fprofile-update=atomic

gcc/ChangeLog:

2016-08-12  Martin Liska  <mliska@suse.cz>

	* optabs.c (can_generate_atomic_builtin): New function.
	* optabs.h (can_generate_atomic_builtin): Declare the function.
	* tree-profile.c (tree_profiling):  Detect whether target can use
	-fprofile-update=atomic.

gcc/testsuite/ChangeLog:

2016-08-12  Martin Liska  <mliska@suse.cz>

	* gcc.dg/profile-update-warning.c: New test.

libgcc/ChangeLog:

2016-08-16  Martin Liska  <mliska@suse.cz>

	* acinclude.m4: New file.
	* config.in: New macro defines.
	* configure: Regenerated.
	* configure.ac: Detect atomic operations.
	* libgcov-profiler.c: Detect GCOV_SUPPORTS_ATOMIC and
	conditionally enable/disable *_atomic functions.
---
 gcc/optabs.c                                  |  13 ++
 gcc/optabs.h                                  |   5 +
 gcc/testsuite/gcc.dg/profile-update-warning.c |   7 +
 gcc/tree-profile.c                            |  18 +++
 libgcc/acinclude.m4                           |  22 ++++
 libgcc/config.in                              |  15 ++-
 libgcc/configure                              | 179 +++++++++++++++++++++++++-
 libgcc/configure.ac                           |   8 ++
 libgcc/libgcov-profiler.c                     |  24 +++-
 9 files changed, 279 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/profile-update-warning.c
 create mode 100644 libgcc/acinclude.m4

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 87b4f97..3be0dfe 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6476,6 +6476,19 @@ get_atomic_op_for_code (struct atomic_op_functions *op, enum rtx_code code)
     }
 }
 
+bool
+can_generate_atomic_builtin (enum rtx_code code, machine_mode mode)
+{
+  struct atomic_op_functions optab;
+  get_atomic_op_for_code (&optab, code);
+  enum insn_code icode = direct_optab_handler (optab.mem_no_result, mode);
+  if (icode != CODE_FOR_nothing)
+    return true;
+
+  return can_compare_and_swap_p (mode, false)
+    || can_compare_and_swap_p (mode, true);
+}
+
 /* See if there is a more optimal way to implement the operation "*MEM CODE VAL"
    using memory order MODEL.  If AFTER is true the operation needs to return
    the value of *MEM after the operation, otherwise the previous value.  
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 03fd94d..769685a 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -348,4 +348,9 @@ extern void expand_jump_insn (enum insn_code icode, unsigned int nops,
 
 extern enum rtx_code get_rtx_code (enum tree_code tcode, bool unsignedp);
 
+/* Return true when a target is capable of expansion of an atomic builtin
+   with CODE of a specified machine MODE.  */
+
+extern bool can_generate_atomic_builtin (enum rtx_code code, machine_mode mode);
+
 #endif /* GCC_OPTABS_H */
diff --git a/gcc/testsuite/gcc.dg/profile-update-warning.c b/gcc/testsuite/gcc.dg/profile-update-warning.c
new file mode 100644
index 0000000..0614fad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/profile-update-warning.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fprofile-update=atomic -fprofile-generate -march=i386 -m32" } */
+
+int main(int argc, char *argv[])
+{
+  return 0;
+} /* { dg-warning "target does not support atomic profile update, single mode is selected" } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 622869e..799de84 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "profile.h"
 #include "tree-cfgcleanup.h"
 #include "params.h"
+#include "rtl.h"
+#include "optabs.h"
 
 static GTY(()) tree gcov_type_node;
 static GTY(()) tree tree_interval_profiler_fn;
@@ -535,6 +537,22 @@ tree_profiling (void)
 {
   struct cgraph_node *node;
 
+  /* Verify whether we can utilize atomic update operations.  */
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+    {
+      machine_mode mode = mode_for_size (LONG_LONG_TYPE_SIZE > 32 ? 64: 32,
+					 MODE_INT, 1);
+      bool r = can_generate_atomic_builtin (PLUS, mode)
+	&& can_generate_atomic_builtin (IOR, mode);
+
+      if (!r)
+	{
+	  warning (0, "target does not support atomic profile update, "
+		   "single mode is selected");
+	  flag_profile_update = PROFILE_UPDATE_SINGLE;
+	}
+    }
+
   /* This is a small-ipa pass that gets called only once, from
      cgraphunit.c:ipa_passes().  */
   gcc_assert (symtab->state == IPA_SSA);
diff --git a/libgcc/acinclude.m4 b/libgcc/acinclude.m4
new file mode 100644
index 0000000..87ac32b
--- /dev/null
+++ b/libgcc/acinclude.m4
@@ -0,0 +1,22 @@
+dnl Check whether the target supports __atomic_operations.
+AC_DEFUN([LIBGCC_CHECK_ATOMIC_OPERATION], [
+  AC_CACHE_CHECK([whether the target supports atomic operations for $1B],
+		 libgcc_cv_have_atomic_operations_$1, [
+  libgcc_cv_have_atomic_operations_$1=no
+
+  AC_LANG_CONFTEST(
+  [AC_LANG_PROGRAM([[int foovar = 0;]], [[__atomic_fetch_add_$1 (&foovar, 1, 0);
+  __atomic_fetch_or_$1 (&foovar, 1, 0)]])])
+  if AC_TRY_COMMAND(${CC-cc} -Werror -S -o conftest.s conftest.c 1>&AS_MESSAGE_LOG_FD); then
+      if grep __atomic_fetch_add_$1 conftest.s > /dev/null; then
+	:
+      else
+	libgcc_cv_have_atomic_operations_$1=yes
+      fi
+    fi
+    rm -f conftest.*
+    ])
+  if test $libgcc_cv_have_atomic_operations_$1 = yes; then
+    AC_DEFINE(HAVE_ATOMIC_OPERATIONS_$1, 1,
+	      [Define to 1 if the target supports atomic operations for $1B])
+  fi])
diff --git a/libgcc/config.in b/libgcc/config.in
index 4d33411..03e848c 100644
--- a/libgcc/config.in
+++ b/libgcc/config.in
@@ -1,5 +1,17 @@
 /* config.in.  Generated from configure.ac by autoheader.  */
 
+/* Define to 1 if the target supports atomic operations for 16B */
+#undef HAVE_ATOMIC_OPERATIONS_16
+
+/* Define to 1 if the target supports atomic operations for 32B */
+#undef HAVE_ATOMIC_OPERATIONS_32
+
+/* Define to 1 if the target supports atomic operations for 4B */
+#undef HAVE_ATOMIC_OPERATIONS_4
+
+/* Define to 1 if the target supports atomic operations for 8B */
+#undef HAVE_ATOMIC_OPERATIONS_8
+
 /* Define to 1 if the target assembler supports thread-local storage. */
 #undef HAVE_CC_TLS
 
@@ -21,9 +33,6 @@
 /* Define if the system-provided CRTs are present on Solaris. */
 #undef HAVE_SOLARIS_CRTS
 
-/* Define if the system-provided CRTs are present on Solaris. */
-#undef HAVE_SOLARIS_CRTS
-
 /* Define to 1 if you have the <stdint.h> header file. */
 #undef HAVE_STDINT_H
 
diff --git a/libgcc/configure b/libgcc/configure
index bf96aec..0058467 100644
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -2321,10 +2321,6 @@ case "${host}" in
 	;;
     i[34567]86-*-mingw* | x86_64-*-mingw*)
 	;;
-    i[34567]86-*-interix[3-9]*)
-	# Interix 3.x gcc -fpic/-fPIC options generate broken code.
-	# Instead, we relocate shared libraries at runtime.
-	;;
     i[34567]86-*-nto-qnx*)
 	# QNX uses GNU C++, but need to define -shared option too, otherwise
 	# it will coredump.
@@ -5080,6 +5076,180 @@ esac
 
 
 
+# Check out sync builtins support.
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the target supports atomic operations for 32B" >&5
+$as_echo_n "checking whether the target supports atomic operations for 32B... " >&6; }
+if test "${libgcc_cv_have_atomic_operations_32+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+
+  libgcc_cv_have_atomic_operations_32=no
+
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+int foovar = 0;
+int
+main ()
+{
+__atomic_fetch_add_32 (&foovar, 1, 0);
+  __atomic_fetch_or_32 (&foovar, 1, 0)
+  ;
+  return 0;
+}
+_ACEOF
+  if { ac_try='${CC-cc} -Werror -S -o conftest.s conftest.c 1>&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }; then
+      if grep __atomic_fetch_add_32 conftest.s > /dev/null; then
+	:
+      else
+	libgcc_cv_have_atomic_operations_32=yes
+      fi
+    fi
+    rm -f conftest.*
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_have_atomic_operations_32" >&5
+$as_echo "$libgcc_cv_have_atomic_operations_32" >&6; }
+  if test $libgcc_cv_have_atomic_operations_32 = yes; then
+
+$as_echo "#define HAVE_ATOMIC_OPERATIONS_32 1" >>confdefs.h
+
+  fi
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the target supports atomic operations for 16B" >&5
+$as_echo_n "checking whether the target supports atomic operations for 16B... " >&6; }
+if test "${libgcc_cv_have_atomic_operations_16+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+
+  libgcc_cv_have_atomic_operations_16=no
+
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+int foovar = 0;
+int
+main ()
+{
+__atomic_fetch_add_16 (&foovar, 1, 0);
+  __atomic_fetch_or_16 (&foovar, 1, 0)
+  ;
+  return 0;
+}
+_ACEOF
+  if { ac_try='${CC-cc} -Werror -S -o conftest.s conftest.c 1>&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }; then
+      if grep __atomic_fetch_add_16 conftest.s > /dev/null; then
+	:
+      else
+	libgcc_cv_have_atomic_operations_16=yes
+      fi
+    fi
+    rm -f conftest.*
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_have_atomic_operations_16" >&5
+$as_echo "$libgcc_cv_have_atomic_operations_16" >&6; }
+  if test $libgcc_cv_have_atomic_operations_16 = yes; then
+
+$as_echo "#define HAVE_ATOMIC_OPERATIONS_16 1" >>confdefs.h
+
+  fi
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the target supports atomic operations for 8B" >&5
+$as_echo_n "checking whether the target supports atomic operations for 8B... " >&6; }
+if test "${libgcc_cv_have_atomic_operations_8+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+
+  libgcc_cv_have_atomic_operations_8=no
+
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+int foovar = 0;
+int
+main ()
+{
+__atomic_fetch_add_8 (&foovar, 1, 0);
+  __atomic_fetch_or_8 (&foovar, 1, 0)
+  ;
+  return 0;
+}
+_ACEOF
+  if { ac_try='${CC-cc} -Werror -S -o conftest.s conftest.c 1>&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }; then
+      if grep __atomic_fetch_add_8 conftest.s > /dev/null; then
+	:
+      else
+	libgcc_cv_have_atomic_operations_8=yes
+      fi
+    fi
+    rm -f conftest.*
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_have_atomic_operations_8" >&5
+$as_echo "$libgcc_cv_have_atomic_operations_8" >&6; }
+  if test $libgcc_cv_have_atomic_operations_8 = yes; then
+
+$as_echo "#define HAVE_ATOMIC_OPERATIONS_8 1" >>confdefs.h
+
+  fi
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the target supports atomic operations for 4B" >&5
+$as_echo_n "checking whether the target supports atomic operations for 4B... " >&6; }
+if test "${libgcc_cv_have_atomic_operations_4+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+
+  libgcc_cv_have_atomic_operations_4=no
+
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+int foovar = 0;
+int
+main ()
+{
+__atomic_fetch_add_4 (&foovar, 1, 0);
+  __atomic_fetch_or_4 (&foovar, 1, 0)
+  ;
+  return 0;
+}
+_ACEOF
+  if { ac_try='${CC-cc} -Werror -S -o conftest.s conftest.c 1>&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }; then
+      if grep __atomic_fetch_add_4 conftest.s > /dev/null; then
+	:
+      else
+	libgcc_cv_have_atomic_operations_4=yes
+      fi
+    fi
+    rm -f conftest.*
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $libgcc_cv_have_atomic_operations_4" >&5
+$as_echo "$libgcc_cv_have_atomic_operations_4" >&6; }
+  if test $libgcc_cv_have_atomic_operations_4 = yes; then
+
+$as_echo "#define HAVE_ATOMIC_OPERATIONS_4 1" >>confdefs.h
+
+  fi
+
 # Substitute configuration variables
 
 
@@ -5100,6 +5270,7 @@ ac_config_files="$ac_config_files Makefile"
 
 ac_config_commands="$ac_config_commands default"
 
+
 cat >confcache <<\_ACEOF
 # This file is a shell script that caches the results of configure
 # tests run on this system so they can be shared between configure
diff --git a/libgcc/configure.ac b/libgcc/configure.ac
index 269997f..f74be17 100644
--- a/libgcc/configure.ac
+++ b/libgcc/configure.ac
@@ -11,6 +11,7 @@ sinclude(../config/dfp.m4)
 sinclude(../config/unwind_ipinfo.m4)
 sinclude(../config/gthr.m4)
 sinclude(../config/sjlj.m4)
+sinclude(acinclude.m4)
 
 AC_PREREQ(2.64)
 AC_INIT([GNU C Runtime Library], 1.0,,[libgcc])
@@ -544,6 +545,12 @@ AC_SUBST(tm_defines)
 # Map from thread model to thread header.
 GCC_AC_THREAD_HEADER([$target_thread_file])
 
+# Check out sync builtins support.
+LIBGCC_CHECK_ATOMIC_OPERATION(32)
+LIBGCC_CHECK_ATOMIC_OPERATION(16)
+LIBGCC_CHECK_ATOMIC_OPERATION(8)
+LIBGCC_CHECK_ATOMIC_OPERATION(4)
+
 # Substitute configuration variables
 AC_SUBST(cpu_type)
 AC_SUBST(extra_parts)
@@ -572,4 +579,5 @@ CONFIG_SHELL=${CONFIG_SHELL-/bin/sh}
 libgcc_topdir=${libgcc_topdir}
 CC="${CC}"
 ]])
+
 AC_OUTPUT
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 70a821d..d07f81a 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -24,8 +24,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
 #include "libgcov.h"
+#include "auto-target.h"
 #if !defined(inhibit_libc)
 
+/* Detect whether target can support atomic update of profilers.  */
+#if LONG_LONG_TYPE_SIZE <= 32 && HAVE_ATOMIC_OPERATIONS_4
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#if LONG_LONG_TYPE_SIZE > 32 && HAVE_ATOMIC_OPERATIONS_8
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#define GCOV_SUPPORTS_ATOMIC 0
+#endif
+#endif
+
 #ifdef L_gcov_interval_profiler
 /* If VALUE is in interval <START, START + STEPS - 1>, then increases the
    corresponding counter in COUNTERS.  If the VALUE is above or below
@@ -46,7 +58,7 @@ __gcov_interval_profiler (gcov_type *counters, gcov_type value,
 }
 #endif
 
-#ifdef L_gcov_interval_profiler_atomic
+#if defined(L_gcov_interval_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* If VALUE is in interval <START, START + STEPS - 1>, then increases the
    corresponding counter in COUNTERS.  If the VALUE is above or below
    the interval, COUNTERS[STEPS] or COUNTERS[STEPS + 1] is increased
@@ -80,7 +92,7 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_pow2_profiler_atomic
+#if defined(L_gcov_pow2_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* If VALUE is a power of two, COUNTERS[1] is incremented.  Otherwise
    COUNTERS[0] is incremented.  Function is thread-safe.  */
 
@@ -134,7 +146,7 @@ __gcov_one_value_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_one_value_profiler_atomic
+#if defined(L_gcov_one_value_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 
 /* Update one value profilers (COUNTERS) for a given VALUE.
 
@@ -342,6 +354,7 @@ __gcov_time_profiler (gcov_type* counters)
     counters[0] = ++function_counter;
 }
 
+#if GCOV_SUPPORTS_ATOMIC
 /* Sets corresponding COUNTERS if there is no value.
    Function is thread-safe.  */
 
@@ -352,6 +365,7 @@ __gcov_time_profiler_atomic (gcov_type* counters)
     counters[0] = __atomic_add_fetch (&function_counter, 1, MEMMODEL_RELAXED);
 }
 #endif
+#endif
 
 
 #ifdef L_gcov_average_profiler
@@ -366,7 +380,7 @@ __gcov_average_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_average_profiler_atomic
+#if defined(L_gcov_average_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
    to saturate up.  Function is thread-safe.  */
 
@@ -388,7 +402,7 @@ __gcov_ior_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_ior_profiler_atomic
+#if defined(L_gcov_ior_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* Bitwise-OR VALUE into COUNTER.  Function is thread-safe.  */
 
 void
-- 
2.9.2
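
For readers following the libgcov-profiler.c hunk above, here is a minimal
sketch of the same width-based selection, written as an ordinary function so
it can be exercised directly.  The name demo_supports_atomic and its
parameters are hypothetical and not part of the patch; the patch itself makes
this decision in the preprocessor via GCOV_SUPPORTS_ATOMIC.

```c
#include <assert.h>

/* Hypothetical helper mirroring the GCOV_SUPPORTS_ATOMIC conditions.
   LONG_LONG_BITS plays the role of LONG_LONG_TYPE_SIZE; HAVE_4 and HAVE_8
   play the roles of HAVE_ATOMIC_OPERATIONS_4 and HAVE_ATOMIC_OPERATIONS_8.  */
int
demo_supports_atomic (int long_long_bits, int have_4, int have_8)
{
  assert (long_long_bits > 0);

  /* A counter of at most 32 bits needs the 4-byte operations.  */
  if (long_long_bits <= 32)
    return have_4 != 0;

  /* A wider counter needs the 8-byte operations.  */
  return have_8 != 0;
}
```

With a 64-bit counter type only the 8-byte result matters, so a target that
has 4-byte atomics alone would still end up without the *_atomic profilers.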


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [BUILDROBOT] avr broken
       [not found]                           ` <4455937b-eba7-fe66-fe1a-3172567dd1e4@suse.cz>
@ 2016-08-16 13:36                             ` Nathan Sidwell
       [not found]                               ` <617e8799-b7db-fefd-b3a3-842e9a7decfd@suse.cz>
  0 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-16 13:36 UTC (permalink / raw)
  To: Martin Liška, Jan-Benedict Glaw; +Cc: gcc-patches, jh

On 08/16/16 08:49, Martin Liška wrote:
> On 08/13/2016 02:14 PM, Jan-Benedict Glaw wrote:
>> This doesn't work for AVR since their LONG_LONG_TYPE_SIZE depends on
>> target flags (see eg. build
>> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=602648)
>
> Hello.
>
> Sorry for the breakage, I haven't tried to build avr target for the patch.
> I'm sending a candidate patch which survives regression tests on ppc64le-redhat-linux.
> Apart from that, cc1 can be built with --target=avr-linux.
>
> Ready to be installed?
> Martin
>

is LONG_LONG_TYPE_SIZE something not suitable for a #if on avr?

presuming that's the problem, I think this approach is fine.  I did have a think
about putting the 32/64 bit check in one place rather than the two places it is.
But (a) that's overkill and (b) it's in 2 places already.

So this is ok, if it's ok from AVR's POV.

nathan

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [BUILDROBOT] avr broken
       [not found]                               ` <617e8799-b7db-fefd-b3a3-842e9a7decfd@suse.cz>
@ 2016-08-16 14:31                                 ` Nathan Sidwell
  2016-08-16 17:05                                   ` Jan-Benedict Glaw
  0 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-16 14:31 UTC (permalink / raw)
  To: Martin Liška, Jan-Benedict Glaw; +Cc: gcc-patches, jh, chertykov

On 08/16/16 10:23, Martin Liška wrote:
> On 08/16/2016 03:36 PM, Nathan Sidwell wrote:
>> On 08/16/16 08:49, Martin Liška wrote:
>>> On 08/13/2016 02:14 PM, Jan-Benedict Glaw wrote:
>>>> This doesn't work for AVR since their LONG_LONG_TYPE_SIZE depends on
>>>> target flags (see eg. build
>>>> http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=602648)
>>>
>>> Hello.
>>>
>>> Sorry for the breakage, I haven't tried to build avr target for the patch.
>>> I'm sending a candidate patch which survives regression tests on ppc64le-redhat-linux.
>>> Apart from that, cc1 can be built with --target=avr-linux.
>>>
>>> Ready to be installed?
>>> Martin
>>>
>>
>> is LONG_LONG_TYPE_SIZE something not suitable for a #if on avr?
>
> Exactly that's causing the problem.

ok.  good.  There's  nothing particularly AVRish in the patch, so I'd  say 
commit if you don't hear from JB in a timely manner

nathan

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-08-16 12:56                         ` [PATCH] Detect whether target can use -fprofile-update=atomic Martin Liška
@ 2016-08-16 14:31                           ` Nathan Sidwell
  2016-09-06 10:57                             ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-16 14:31 UTC (permalink / raw)
  To: Martin Liška, gcc-patches; +Cc: jh, Andreas Schwab

On 08/16/16 08:55, Martin Liška wrote:
> Hello.
>
> As reported in [1], m68k has been broken since I installed the patch. The reason is that the target
> does not support atomic operations (add, or) for the mode of gcov_type. Because of that, we see
> undefined symbols.
>
> The proper fix consists of 2 parts:
> a) compiler emission must verify that -fprofile-update=atomic is doable for a given target; it's done
> via a new function can_generate_atomic_builtin
> b) libgcc must detect whether __atomic_fetch_add_x can be expanded on the target; that requires configure
> support and if the target is not capable of expanding these, we must conditionally remove all gcov_.*profiler_atomic
> functions from libgcov.a.

I'm fine with the coverage-specific changes, but the new hooks etc are not
something I can approve.

gcc/ChangeLog:

2016-08-12  Martin Liska  <mliska@suse.cz>

	* optabs.c (can_generate_atomic_builtin): New function.
	* optabs.h (can_generate_atomic_builtin): Declare the function.
Need GWM or similar review

	* tree-profile.c (tree_profiling):  Detect whether target can use
	-fprofile-update=atomic.
ok

gcc/testsuite/ChangeLog:

2016-08-12  Martin Liska  <mliska@suse.cz>

	* gcc.dg/profile-update-warning.c: New test.
ok

libgcc/ChangeLog:

2016-08-16  Martin Liska  <mliska@suse.cz>

	* acinclude.m4: New file.
	* config.in: New macro defines.
	* configure: Regenerated.
	* configure.ac: Detect atomic operations.
need GWM or similar review

	* libgcov-profiler.c: Detect GCOV_SUPPORTS_ATOMIC and
	conditionally enable/disable *_atomic functions.
OK.

nathan
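
For reference, part (a) of the fix described in the quoted text can be
sketched as a small stand-alone function.  This is only an illustration under
assumed names: demo_profile_update, demo_resolve_profile_update and the two
capability flags are hypothetical, and the real check lives in
tree_profiling and uses can_generate_atomic_builtin.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for PROFILE_UPDATE_SINGLE / PROFILE_UPDATE_ATOMIC.  */
enum demo_profile_update
{
  DEMO_UPDATE_SINGLE,
  DEMO_UPDATE_ATOMIC
};

/* Return the update mode that will actually be used.  CAN_ADD and CAN_OR
   stand for the results of the atomic PLUS and IOR capability checks.  */
enum demo_profile_update
demo_resolve_profile_update (enum demo_profile_update requested,
			     int can_add, int can_or)
{
  assert (requested == DEMO_UPDATE_SINGLE
	  || requested == DEMO_UPDATE_ATOMIC);

  if (requested == DEMO_UPDATE_ATOMIC && !(can_add && can_or))
    {
      /* Same wording as the warning added by the patch.  */
      fprintf (stderr, "warning: target does not support atomic profile "
	       "update, single mode is selected\n");
      return DEMO_UPDATE_SINGLE;
    }

  return requested;
}
```

Both operations have to be available before atomic mode is kept; a target
that can do an atomic add but not an atomic or still falls back to single.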

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [BUILDROBOT] avr broken
  2016-08-16 14:31                                 ` Nathan Sidwell
@ 2016-08-16 17:05                                   ` Jan-Benedict Glaw
  2016-08-16 18:26                                     ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Jan-Benedict Glaw @ 2016-08-16 17:05 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: Martin Liška, gcc-patches, jh, chertykov

[-- Attachment #1: Type: text/plain, Size: 1305 bytes --]

On Tue, 2016-08-16 10:31:41 -0400, Nathan Sidwell <nathan@acm.org> wrote:
> On 08/16/16 10:23, Martin Liška wrote:
> > On 08/16/2016 03:36 PM, Nathan Sidwell wrote:
> > > On 08/16/16 08:49, Martin Liška wrote:
> > > > On 08/13/2016 02:14 PM, Jan-Benedict Glaw wrote:
> > > > > This doesn't work for AVR since their LONG_LONG_TYPE_SIZE
> > > > > depends on target flags (see eg. build
> > > > > http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=602648)
> > > > Sorry for the breakage, I haven't tried to build avr target
> > > > for the patch.  I'm sending a candidate patch which survives
> > > > regression tests on ppc64le-redhat-linux.  Apart from that,
> > > > cc1 can be built with --target=avr-linux.
> > >
> > > is LONG_LONG_TYPE_SIZE something not suitable for a #if on avr?
> >
> > Exactly that's causing the problem.
> 
> ok.  good.  There's  nothing particularly AVRish in the patch, so
> I'd  say commit if you don't hear from JB in a timely manner

That'll probably work. But after all, I'm not an AVR maintainer (not
even a user), but just running the Build Robot.

MfG, JBG

-- 
      Jan-Benedict Glaw      jbglaw@lug-owl.de              +49-172-7608481
 Signature of:                    Don't believe in miracles: Rely on them!
 the second  :

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [BUILDROBOT] avr broken
  2016-08-16 17:05                                   ` Jan-Benedict Glaw
@ 2016-08-16 18:26                                     ` Nathan Sidwell
  2016-08-17  7:21                                       ` Denis Chertykov
  2016-08-17  8:11                                       ` Jan-Benedict Glaw
  0 siblings, 2 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-16 18:26 UTC (permalink / raw)
  To: Jan-Benedict Glaw; +Cc: Martin Liška, gcc-patches, jh, chertykov

On 08/16/16 13:04, Jan-Benedict Glaw wrote:

> That'll probably work. But after all, I'm not an AVR maintainer (not
> even a user), but just running the Build Robot.

Does your robot approve? :)

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [BUILDROBOT] avr broken
  2016-08-16 18:26                                     ` Nathan Sidwell
@ 2016-08-17  7:21                                       ` Denis Chertykov
  2016-08-17  7:22                                         ` Martin Liška
  2016-08-17  8:11                                       ` Jan-Benedict Glaw
  1 sibling, 1 reply; 95+ messages in thread
From: Denis Chertykov @ 2016-08-17  7:21 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: Jan-Benedict Glaw, Martin Liška, gcc-patches, jh

2016-08-16 21:26 GMT+03:00 Nathan Sidwell <nathan@acm.org>:
> On 08/16/16 13:04, Jan-Benedict Glaw wrote:
>
>> That'll probably work. But after all, I'm not an AVR maintainer (not
>> even a user), but just running the Build Robot.
>
>
> Does your robot approve? :)
>

I'm an AVR maintainer.
The patch does not have any AVR port modifications.
I can't approve it.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [BUILDROBOT] avr broken
  2016-08-17  7:21                                       ` Denis Chertykov
@ 2016-08-17  7:22                                         ` Martin Liška
  0 siblings, 0 replies; 95+ messages in thread
From: Martin Liška @ 2016-08-17  7:22 UTC (permalink / raw)
  To: Denis Chertykov, Nathan Sidwell; +Cc: Jan-Benedict Glaw, gcc-patches, jh

On 08/17/2016 09:20 AM, Denis Chertykov wrote:
> 2016-08-16 21:26 GMT+03:00 Nathan Sidwell <nathan@acm.org>:
>> On 08/16/16 13:04, Jan-Benedict Glaw wrote:
>>
>>> That'll probably work. But after all, I'm not an AVR maintainer (not
>>> even a user), but just running the Build Robot.
>>
>>
>> Does your robot approve? :)
>>
> 
> I'm an AVR maintainer.
> The patch does not have any AVR port modifications.
> I can't approve it.
> 

Based on Nathan's ACK, I've installed the patch as r239522.

Martin

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [BUILDROBOT] avr broken
  2016-08-16 18:26                                     ` Nathan Sidwell
  2016-08-17  7:21                                       ` Denis Chertykov
@ 2016-08-17  8:11                                       ` Jan-Benedict Glaw
  1 sibling, 0 replies; 95+ messages in thread
From: Jan-Benedict Glaw @ 2016-08-17  8:11 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: Martin Liška, gcc-patches, jh, chertykov

[-- Attachment #1: Type: text/plain, Size: 752 bytes --]

On Tue, 2016-08-16 14:26:38 -0400, Nathan Sidwell <nathan@acm.org> wrote:
> On 08/16/16 13:04, Jan-Benedict Glaw wrote:
> 
> > That'll probably work. But after all, I'm not an AVR maintainer
> > (not even a user), but just running the Build Robot.
> 
> Does your robot approve? :)

Ohoooh! See there! :)

http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=602648

The Robot has an answer. I guess that means he's okay with the patch
(of which, unfortunately, both versions didn't make it to the list.)

MfG, JBG

-- 
      Jan-Benedict Glaw      jbglaw@lug-owl.de              +49-172-7608481
Signature of:  The course of history shows that as a government grows, liberty
the second  : decreases."  (Thomas Jefferson)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-12 13:31                             ` Martin Liška
@ 2016-08-18  3:16                               ` Jeff Law
  2016-08-18 11:02                                 ` Nathan Sidwell
  2016-08-18 15:51                                 ` Andi Kleen
  0 siblings, 2 replies; 95+ messages in thread
From: Jeff Law @ 2016-08-18  3:16 UTC (permalink / raw)
  To: Martin Liška, Andi Kleen; +Cc: Nathan Sidwell, gcc-patches, jh

On 08/12/2016 07:31 AM, Martin Liška wrote:
> On 08/09/2016 09:03 PM, Andi Kleen wrote:
>> It could potentially make things a lot slower. I don't think it's a good
>> idea to do this by default.
>>
>> -Andi
>
> Ok, an alternative can be a warning in the driver that would inform the user
> that combining -pthread and -fprofile-update=single can lead to profile corruption.
> My first attempt with option handling in gcc.c was not successful, I'll try it once more.
>
> Is that a reasonable approach?
I'd prefer to make updates atomic in multi-threaded applications.  The 
best proxy we have for that is -pthread.

Is it slower, most definitely, but odds are we're giving folks garbage 
data otherwise, which in many ways is even worse.

jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-18  3:16                               ` Jeff Law
@ 2016-08-18 11:02                                 ` Nathan Sidwell
  2016-08-18 15:51                                 ` Andi Kleen
  1 sibling, 0 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-08-18 11:02 UTC (permalink / raw)
  To: Jeff Law, Martin Liška, Andi Kleen; +Cc: gcc-patches, jh

On 08/17/16 23:15, Jeff Law wrote:
> On 08/12/2016 07:31 AM, Martin Liška wrote:
>> On 08/09/2016 09:03 PM, Andi Kleen wrote:
>>> It could potentially make things a lot slower. I don't think it's a good
>>> idea to do this by default.
>>>
>>> -Andi
>>
>> Ok, an alternative can be a warning in the driver that would inform the user
>> that combining -pthread and -fprofile-update=single can lead to profile
>> corruption.
>> My first attempt with option handling in gcc.c was not successful, I'll try it once more.
>>
>> Is that a reasonable approach?
> I'd prefer to make updates atomic in multi-threaded applications.  The best
> proxy we have for that is -pthread.
>
> Is it slower, most definitely, but odds are we're giving folks garbage data
> otherwise, which in many ways is even worse.

True, and if someone cares they can always override the new default behaviour.

nathan

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-18  3:16                               ` Jeff Law
  2016-08-18 11:02                                 ` Nathan Sidwell
@ 2016-08-18 15:51                                 ` Andi Kleen
  2016-08-18 15:53                                   ` Jeff Law
                                                     ` (2 more replies)
  1 sibling, 3 replies; 95+ messages in thread
From: Andi Kleen @ 2016-08-18 15:51 UTC (permalink / raw)
  To: Jeff Law; +Cc: Martin Liška, Andi Kleen, Nathan Sidwell, gcc-patches, jh

> I'd prefer to make updates atomic in multi-threaded applications.
> The best proxy we have for that is -pthread.
> 
> Is it slower, most definitely, but odds are we're giving folks
> garbage data otherwise, which in many ways is even worse.

It will likely be catastrophically slower in some cases. 

Catastrophically as in too slow to be usable.

An atomic instruction is a lot more expensive than a single increment. Also
they sometimes are really slow depending on the state of the machine.

-Andi
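
To make the two update styles being compared concrete, here is a minimal
sketch, assuming GCC's (or Clang's) __atomic builtins.  The names
demo_counter, demo_bump_single and demo_bump_atomic are hypothetical; this is
not the code that the instrumentation actually emits.

```c
#include <assert.h>

typedef long long demo_counter;

/* "single" style: a plain load, add and store.  Cheap, but two threads
   that run it concurrently on the same counter can lose updates.  */
void
demo_bump_single (demo_counter *counter)
{
  assert (counter != 0);
  *counter += 1;
}

/* "atomic" style: one indivisible read-modify-write.  No update is lost,
   at the cost of a more expensive instruction (or instruction sequence).  */
void
demo_bump_atomic (demo_counter *counter)
{
  assert (counter != 0);
  __atomic_fetch_add (counter, 1, __ATOMIC_RELAXED);
}
```

In a single thread both functions give the same totals; they only differ
once several threads hit the same counter, which is the case the thread is
arguing about.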

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-18 15:51                                 ` Andi Kleen
@ 2016-08-18 15:53                                   ` Jeff Law
  2016-10-03 12:13                                     ` Martin Liška
  2016-08-18 15:54                                   ` [PATCH] Set -fprofile-update=atomic when -pthread is present Jakub Jelinek
  2016-08-18 16:04                                   ` [PATCH] Set -fprofile-update=atomic when -pthread is present Richard Biener
  2 siblings, 1 reply; 95+ messages in thread
From: Jeff Law @ 2016-08-18 15:53 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Martin Liška, Nathan Sidwell, gcc-patches, jh

On 08/18/2016 09:51 AM, Andi Kleen wrote:
>> I'd prefer to make updates atomic in multi-threaded applications.
>> The best proxy we have for that is -pthread.
>>
>> Is it slower, most definitely, but odds are we're giving folks
>> garbage data otherwise, which in many ways is even worse.
>
> It will likely be catastrophically slower in some cases.
>
> Catastrophically as in too slow to be usable.
>
> An atomic instruction is a lot more expensive than a single increment. Also
> they sometimes are really slow depending on the state of the machine.
And for those cases there's a way to override.

The default should be set for correctness.

jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-18 15:51                                 ` Andi Kleen
  2016-08-18 15:53                                   ` Jeff Law
@ 2016-08-18 15:54                                   ` Jakub Jelinek
  2016-08-18 16:06                                     ` Richard Biener
  2016-08-18 16:04                                   ` [PATCH] Set -fprofile-update=atomic when -pthread is present Richard Biener
  2 siblings, 1 reply; 95+ messages in thread
From: Jakub Jelinek @ 2016-08-18 15:54 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jeff Law, Martin Liška, Nathan Sidwell, gcc-patches, jh

On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
> > I'd prefer to make updates atomic in multi-threaded applications.
> > The best proxy we have for that is -pthread.
> > 
> > Is it slower, most definitely, but odds are we're giving folks
> > garbage data otherwise, which in many ways is even worse.
> 
> It will likely be catastrophically slower in some cases. 
> 
> Catastrophically as in too slow to be usable.
> 
> An atomic instruction is a lot more expensive than a single increment. Also
> they sometimes are really slow depending on the state of the machine.

Can't we just have thread-local copies of all the counters (perhaps using
__thread pointer as base) and just atomically merge at thread termination?

	Jakub
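
A minimal sketch of the idea above, assuming the GCC __thread extension and
the __atomic builtins.  All names (demo_shared, demo_local, demo_count,
demo_merge_thread_counters, demo_read) and the fixed counter count are
hypothetical.  How the merge gets invoked at thread termination (for example
from a thread-exit hook) is left out.

```c
#include <assert.h>

#define DEMO_N_COUNTERS 4

/* Counters visible to every thread; these are what would get dumped.  */
static long long demo_shared[DEMO_N_COUNTERS];

/* Per-thread copies; hot-path updates touch only these.  */
static __thread long long demo_local[DEMO_N_COUNTERS];

/* Hot path: a plain increment of the calling thread's private copy.  */
void
demo_count (unsigned idx)
{
  assert (idx < DEMO_N_COUNTERS);
  demo_local[idx]++;
}

/* Cold path: fold the calling thread's counters into the shared ones,
   one atomic add per counter, then clear the private copies.  */
void
demo_merge_thread_counters (void)
{
  for (unsigned i = 0; i < DEMO_N_COUNTERS; i++)
    {
      __atomic_fetch_add (&demo_shared[i], demo_local[i], __ATOMIC_RELAXED);
      demo_local[i] = 0;
    }
}

/* Read a shared counter.  */
long long
demo_read (unsigned idx)
{
  assert (idx < DEMO_N_COUNTERS);
  return __atomic_load_n (&demo_shared[idx], __ATOMIC_RELAXED);
}
```

The trade-off in this sketch is that the shared counters only reflect a
thread's work after that thread has merged, so a dump taken while threads are
still running would miss their pending counts.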

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-18 15:51                                 ` Andi Kleen
  2016-08-18 15:53                                   ` Jeff Law
  2016-08-18 15:54                                   ` [PATCH] Set -fprofile-update=atomic when -pthread is present Jakub Jelinek
@ 2016-08-18 16:04                                   ` Richard Biener
  2 siblings, 0 replies; 95+ messages in thread
From: Richard Biener @ 2016-08-18 16:04 UTC (permalink / raw)
  To: Andi Kleen, Jeff Law; +Cc: Martin Liška, Nathan Sidwell, gcc-patches, jh

On August 18, 2016 5:51:31 PM GMT+02:00, Andi Kleen <andi@firstfloor.org> wrote:
>> I'd prefer to make updates atomic in multi-threaded applications.
>> The best proxy we have for that is -pthread.
>> 
>> Is it slower, most definitely, but odds are we're giving folks
>> garbage data otherwise, which in many ways is even worse.
>
>It will likely be catastrophically slower in some cases. 
>
>Catastrophically as in too slow to be usable.
>
>An atomic instruction is a lot more expensive than a single increment.
>Also
>they sometimes are really slow depending on the state of the machine.

The important part is to optimize increments in loops, something we have special ways to do for the regular counters.  OTOH, if we can delay instrumentation until late enough, we can instrument already-optimized loops in the first place.

Richard.
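
One possible reading of the loop point above, as a minimal sketch: instead
of one atomic read-modify-write per iteration, the trip count is kept in a
local and published once after the loop.  The counter and function names
are hypothetical, and this is a hand-written illustration of the effect,
not what GCC's passes emit.

```c
#include <stdint.h>

static int64_t arc_counter;   /* hypothetical shared profile counter */

/* Naive instrumentation: one atomic read-modify-write per iteration.  */
static int64_t
sum_naive (const int *v, int n)
{
  int64_t sum = 0;
  for (int i = 0; i < n; i++)
    {
      __atomic_fetch_add (&arc_counter, 1, __ATOMIC_RELAXED);
      sum += v[i];
    }
  return sum;
}

/* Hoisted form: count in a local and publish once after the loop.  */
static int64_t
sum_hoisted (const int *v, int n)
{
  int64_t sum = 0;
  int64_t trips = 0;
  for (int i = 0; i < n; i++)
    {
      trips++;
      sum += v[i];
    }
  __atomic_fetch_add (&arc_counter, trips, __ATOMIC_RELAXED);
  return sum;
}
```

Both functions leave the same value in the counter; the hoisted one pays
for a single atomic operation regardless of the trip count.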

>-Andi


* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-18 15:54                                   ` [PATCH] Set -fprofile-update=atomic when -pthread is present Jakub Jelinek
@ 2016-08-18 16:06                                     ` Richard Biener
  2016-09-07 11:41                                       ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Richard Biener @ 2016-08-18 16:06 UTC (permalink / raw)
  To: Jakub Jelinek, Andi Kleen
  Cc: Jeff Law, Martin Liška, Nathan Sidwell, gcc-patches, jh

On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>> > I'd prefer to make updates atomic in multi-threaded applications.
>> > The best proxy we have for that is -pthread.
>> > 
>> > Is it slower, most definitely, but odds are we're giving folks
>> > garbage data otherwise, which in many ways is even worse.
>> 
>> It will likely be catastrophically slower in some cases. 
>> 
>> Catastrophically as in too slow to be usable.
>> 
>> An atomic instruction is a lot more expensive than a single
>increment. Also
>> they sometimes are really slow depending on the state of the machine.
>
>Can't we just have thread-local copies of all the counters (perhaps
>using
>__thread pointer as base) and just atomically merge at thread
>termination?

I suggested that as well, but of course it'll have its own class of issues (short-lived threads, so we need to somehow re-use counters from terminated threads; a large number of threads, and thus too much memory used for the counters).

Richard.

>	Jakub


* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-08-16 14:31                           ` Nathan Sidwell
@ 2016-09-06 10:57                             ` Martin Liška
  2016-09-06 11:17                               ` David Edelsohn
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-09-06 10:57 UTC (permalink / raw)
  To: Nathan Sidwell, gcc-patches
  Cc: jh, Andreas Schwab, David Edelsohn, Jakub Jelinek, Richard Biener

On 08/16/2016 04:30 PM, Nathan Sidwell wrote:
> On 08/16/16 08:55, Martin Liška wrote:
>> Hello.
>>
>> As reported in [1], m68k has been broken since I installed the patch. The reason is that the target
>> does not support atomic operations (add, or) for the mode of gcov_type. Because of that, we see
>> undefined symbols.
>>
>> The proper fix consists of 2 parts:
>> a) compiler emission must verify that -fprofile-update=atomic is doable for a given target; that's done
>> via a new function can_generate_atomic_builtin
>> b) libgcc must detect whether __atomic_fetch_add_x can be expanded on the target; that requires configure
>> support, and if the target is not able to expand these, we must conditionally remove all gcov_.*profiler_atomic
>> functions from libgcov.a.
> 
> I'm fine with the coverage-specific changes, but the new hooks etc. are not something I can approve.
> 
> gcc/ChangeLog:
> 
> 2016-08-12  Martin Liska  <mliska@suse.cz>
> 
>     * optabs.c (can_generate_atomic_builtin): New function.
>     * optabs.h (can_generate_atomic_builtin): Declare the function.
> Need GWM or similar review
> 
>     * tree-profile.c (tree_profiling):  Detect whether target can use
>     -fprofile-update=atomic.
> ok
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-08-12  Martin Liska  <mliska@suse.cz>
> 
>     * gcc.dg/profile-update-warning.c: New test.
> ok
> 
> libgcc/ChangeLog:
> 
> 2016-08-16  Martin Liska  <mliska@suse.cz>
> 
>     * acinclude.m4: New file.
>     * config.in: New macro defines.
>     * configure: Regenerated.
>     * configure.ac: Detect atomic operations.
> need GWM or similar review
> 
>     * libgcov-profiler.c: Detect GCOV_SUPPORTS_ATOMIC and
>     conditionally enable/disable *_atomic functions.
> OK.
> 
> nathan

Hi Nathan.

Thanks for the review. I'm CCing Jakub and Richard for review.
The patch should fix a very similar issue spotted on the AIX target by David.

Thanks,
Martin

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 10:57                             ` Martin Liška
@ 2016-09-06 11:17                               ` David Edelsohn
  2016-09-06 12:15                                 ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: David Edelsohn @ 2016-09-06 11:17 UTC (permalink / raw)
  To: Martin Liška, Nathan Sidwell, Jakub Jelinek
  Cc: GCC Patches, Jan Hubicka, Andreas Schwab, Richard Biener

On Tue, Sep 6, 2016 at 6:45 AM, Martin Liška <mliska@suse.cz> wrote:

>>> The proper fix consists of 2 parts:
>>> a) compiler emission must verify that -fprofile-update=atomic is doable for a given target; that's done
>>> via a new function can_generate_atomic_builtin
>>> b) libgcc must detect whether __atomic_fetch_add_x can be expanded on the target; that requires configure
>>> support, and if the target is not able to expand these, we must conditionally remove all gcov_.*profiler_atomic
>>> functions from libgcov.a.
>>
>> I'm fine with the coverage-specific changes, but the new hooks etc. are not something I can approve.
>>
>> gcc/ChangeLog:
>>
>> 2016-08-12  Martin Liska  <mliska@suse.cz>
>>
>>     * optabs.c (can_generate_atomic_builtin): New function.
>>     * optabs.h (can_generate_atomic_builtin): Declare the function.
>> Need GWM or similar review
>>
>>     * tree-profile.c (tree_profiling):  Detect whether target can use
>>     -fprofile-update=atomic.
>> ok
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-08-12  Martin Liska  <mliska@suse.cz>
>>
>>     * gcc.dg/profile-update-warning.c: New test.
>> ok
>>
>> libgcc/ChangeLog:
>>
>> 2016-08-16  Martin Liska  <mliska@suse.cz>
>>
>>     * acinclude.m4: New file.
>>     * config.in: New macro defines.
>>     * configure: Regenerated.
>>     * configure.ac: Detect atomic operations.
>> need GWM or similar review
>>
>>     * libgcov-profiler.c: Detect GCOV_SUPPORTS_ATOMIC and
>>     conditionally enable/disable *_atomic functions.
>> OK.
>>
>> nathan
>
> Hi Nathan.
>
> Thanks for the review. I'm CCing Jakub and Richard for review.
> The patch should fix a very similar issue spotted on the AIX target by David.

What about Jakub's comment in the PR?

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6

The proposed patch seems wrong or at least incomplete.  The recent
change is imposing a 64 bit DImode counter when producing 32 bit code.
PowerPC does support 64 bit atomic operations in 32 bit mode.

Was there a design decision that profile counters always should be 64
bits?  Either 32 bit targets won't support multi-threaded profiling or
32 bit targets can overflow the counter sooner.  Which is worse?
Which is more likely?

Thanks, David
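
To put rough numbers on the overflow side of that trade-off, a small
worked calculation.  The increment rate is an assumed figure chosen for
illustration, not a measurement, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Whole seconds until a counter of BITS bits wraps when it is
   incremented RATE times per second.  RATE must be non-zero.  */
static uint64_t
seconds_to_wrap (unsigned bits, uint64_t rate)
{
  uint64_t limit = bits >= 64 ? UINT64_MAX : (UINT64_C (1) << bits);
  return limit / rate;
}
```

At an assumed 10^9 increments per second on one hot arc,
seconds_to_wrap (32, 1000000000) is 4, while the 64-bit case comes out at
18446744073 seconds, i.e. several centuries.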

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 11:17                               ` David Edelsohn
@ 2016-09-06 12:15                                 ` Nathan Sidwell
  2016-09-06 12:39                                   ` Jakub Jelinek
  2016-09-06 12:41                                   ` David Edelsohn
  0 siblings, 2 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-09-06 12:15 UTC (permalink / raw)
  To: David Edelsohn, Martin Liška, Jakub Jelinek
  Cc: GCC Patches, Jan Hubicka, Andreas Schwab, Richard Biener

On 09/06/16 06:57, David Edelsohn wrote:

> What about Jakub's comment in the PR?
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6

This needs addressing.  Can you clarify PPC behaviour, because I may have 
misunderstood:

1) PPC currently has 64 bit counters -- but cannot support 64bit atomic ops in 
32 bit mode.

2) PPC currently has 32 bit counters anyway.

I had interpreted the comment to be implying #2, but now I'm not so sure.

> The proposed patch seems wrong or at least incomplete.  The recent
> change is imposing a 64 bit DImode counter when producing 32 bit code.
> PowerPC does support 64 bit atomic operations in 32 bit mode.

I'm presuming you've missed a 'NOT' in that sentence.

> Was there a design decision that profile counters always should be 64
> bits?  Either 32 bit targets won't support multi-threaded profiling or
> 32 bit targets can overflow the counter sooner.


>  Which is worse?
> Which is more likely?

My initial thought is that it is probably awkward to support 2 different sized 
counter types in the 'same' config.  I.e. 64-bit single-threaded counters and 
32-bit threaded counters.

nathan

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 12:15                                 ` Nathan Sidwell
@ 2016-09-06 12:39                                   ` Jakub Jelinek
  2016-09-06 12:43                                     ` David Edelsohn
  2016-09-06 12:41                                   ` David Edelsohn
  1 sibling, 1 reply; 95+ messages in thread
From: Jakub Jelinek @ 2016-09-06 12:39 UTC (permalink / raw)
  To: Nathan Sidwell
  Cc: David Edelsohn, Martin Liška, GCC Patches, Jan Hubicka,
	Andreas Schwab, Richard Biener

On Tue, Sep 06, 2016 at 08:14:58AM -0400, Nathan Sidwell wrote:
> On 09/06/16 06:57, David Edelsohn wrote:
> 
> >What about Jakub's comment in the PR?
> >
> >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6
> 
> This needs addressing.  Can you clarify PPC behaviour, because I may have
> misunderstood:
> 
> 1) PPC currently has 64 bit counters -- but cannot support 64bit atomic ops
> in 32 bit mode.
> 
> 2) PPC currently has 32 bit counters anyway.
> 
> I had interpreted the comment to be implying #2, but now I'm not so sure.

Aren't the counters 64-bit everywhere?
Even with 32-bit atomics, as the only operation needed is addition, can't it
be implemented anyway?  Instead of:
  __atomic_fetch_add_8 (&val, 1, __ATOMIC_RELAXED);
one could, e.g. for little endian, use:
  if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
    __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);
There is a small risk that e.g. SIGTERM happens in between the two and the
counters are written into the file without the upper half being bumped, but
compared to the non-atomic updates it is much less likely (and, if you
actually never overflow the counters, it won't be an issue anyway).

	Jakub
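
A minimal, self-contained sketch of the two-halves increment described
above, under a few assumptions: the counter is declared as a union rather
than cast (to stay clear of aliasing problems), the index of the low half
is picked from GCC's predefined __BYTE_ORDER__ macro, and only an
increment by one is handled.  The type and function names are
hypothetical.

```c
#include <stdint.h>

/* A 64-bit counter viewed as two 32-bit halves.  */
typedef union
{
  uint64_t whole;
  uint32_t half[2];
} split_counter;

/* Which array element holds the low-order half depends on byte order.  */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define LOW_HALF 0
#define HIGH_HALF 1
#else
#define LOW_HALF 1
#define HIGH_HALF 0
#endif

/* Increment the 64-bit counter by one using only 32-bit atomics.  */
static inline void
split_counter_inc (split_counter *c)
{
  /* add_fetch returns the new value; 0 means the low half wrapped,
     so the carry has to be propagated into the high half.  */
  if (__atomic_add_fetch (&c->half[LOW_HALF], 1, __ATOMIC_RELAXED) == 0)
    __atomic_fetch_add (&c->half[HIGH_HALF], 1, __ATOMIC_RELAXED);
}
```

The two halves are still updated in separate steps, so a reader that
samples the counter between them can see the low half wrapped without the
carry applied, which is the same window noted above for SIGTERM.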

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 12:15                                 ` Nathan Sidwell
  2016-09-06 12:39                                   ` Jakub Jelinek
@ 2016-09-06 12:41                                   ` David Edelsohn
  2016-09-06 12:51                                     ` Martin Liška
  1 sibling, 1 reply; 95+ messages in thread
From: David Edelsohn @ 2016-09-06 12:41 UTC (permalink / raw)
  To: Nathan Sidwell
  Cc: Martin Liška, Jakub Jelinek, GCC Patches, Jan Hubicka,
	Andreas Schwab, Richard Biener

On Tue, Sep 6, 2016 at 8:14 AM, Nathan Sidwell <nathan@acm.org> wrote:
> On 09/06/16 06:57, David Edelsohn wrote:
>
>> What about Jakub's comment in the PR?
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6
>
>
> This needs addressing.  Can you clarify PPC behaviour, because I may have
> misunderstood:
>
> 1) PPC currently has 64 bit counters -- but cannot support 64bit atomic ops
> in 32 bit mode.
>
> 2) PPC currently has 32 bit counters anyway.

The rs6000 port ABIs implement 64 bit "long long" type.  The current
code uses LONG_LONG_TYPE_SIZE for the counters.  I assume that most
ports don't support native 64 bit atomic operations in 32 bit ABI --
PowerPC does not.

The previous code allowed gcov type to be overridden, but I don't
think that it was 32 bit on most targets.

>
> I had interpreted the comment to be implying #2, but now I'm not so sure.
>
>> The proposed patch seems wrong or at least incomplete.  The recent
>> change is imposing a 64 bit DImode counter when producing 32 bit code.
>> PowerPC does support 64 bit atomic operations in 32 bit mode.
>
>
> I'm presuming you've missed a 'NOT' in that sentence.

Yes, I omitted a "NOT".

PowerPC64 has 64 bit atomics, but PowerPC32 subset only provides 32
bit atomics in the ISA.

If the counters always should be 64 bit, then a poor-man's 64 bit
atomic operation proposed by Jakub seems like a better solution.

Thanks, David

>
>> Was there a design decision that profile counters always should be 64
>> bits?  Either 32 bit targets won't support multi-threaded profiling or
>> 32 bit targets can overflow the counter sooner.
>
>
>
>>  Which is worse?
>> Which is more likely?
>
>
> My initial thought is that it is probably awkward to support 2 different
> sized counter types in the 'same' config.  I.e. 64-bit single-threaded
> counters and 32-bit threaded counters.
>
> nathan

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 12:39                                   ` Jakub Jelinek
@ 2016-09-06 12:43                                     ` David Edelsohn
  0 siblings, 0 replies; 95+ messages in thread
From: David Edelsohn @ 2016-09-06 12:43 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Nathan Sidwell, Martin Liška, GCC Patches, Jan Hubicka,
	Andreas Schwab, Richard Biener

On Tue, Sep 6, 2016 at 8:26 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Sep 06, 2016 at 08:14:58AM -0400, Nathan Sidwell wrote:
>> On 09/06/16 06:57, David Edelsohn wrote:
>>
>> >What about Jakub's comment in the PR?
>> >
>> >https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6
>>
>> This needs addressing.  Can you clarify PPC behaviour, because I may have
>> misunderstood:
>>
>> 1) PPC currently has 64 bit counters -- but cannot support 64bit atomic ops
>> in 32 bit mode.
>>
>> 2) PPC currently has 32 bit counters anyway.
>>
>> I had interpreted the comment to be implying #2, but now I'm not so sure.
>
> Aren't the counters 64-bit everywhere?
> Even with 32-bit atomics, as the only operation needed is addition, can't it
> be implemented anyway?  Instead of:
>   __atomic_fetch_add_8 (&val, 1, __ATOMIC_RELAXED);
> one could, e.g. for little endian, use:
>   if (__atomic_add_fetch_4 ((unsigned int *) &val, 1, __ATOMIC_RELAXED) == 0)
>     __atomic_fetch_add_4 (((unsigned int *) &val) + 1, 1, __ATOMIC_RELAXED);

Is this correct for either endianness?

- David

> There is a small risk that e.g. SIGTERM happens in between the two and the
> counters are written into the file without the upper half being bumped, but
> compared to the non-atomic updates it is much less likely (and, if you
> actually never overflow the counters, it won't be an issue anyway).
>
>         Jakub

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 12:41                                   ` David Edelsohn
@ 2016-09-06 12:51                                     ` Martin Liška
  2016-09-06 13:13                                       ` Jakub Jelinek
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-09-06 12:51 UTC (permalink / raw)
  To: David Edelsohn, Nathan Sidwell
  Cc: Jakub Jelinek, GCC Patches, Jan Hubicka, Andreas Schwab, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 2344 bytes --]

On 09/06/2016 02:38 PM, David Edelsohn wrote:
> On Tue, Sep 6, 2016 at 8:14 AM, Nathan Sidwell <nathan@acm.org> wrote:
>> On 09/06/16 06:57, David Edelsohn wrote:
>>
>>> What about Jakub's comment in the PR?
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378#c6
>>
>>
>> This needs addressing.  Can you clarify PPC behaviour, because I may have
>> misunderstood:
>>
>> 1) PPC currently has 64 bit counters -- but cannot support 64bit atomic ops
>> in 32 bit mode.
>>
>> 2) PPC currently has 32 bit counters anyway.
> 
> The rs6000 port ABIs implement 64 bit "long long" type.  The current
> code uses LONG_LONG_TYPE_SIZE for the counters.  I assume that most
> ports don't support native 64 bit atomic operations in 32 bit ABI --
> PowerPC does not.
> 
> The previous code allowed gcov type to be overridden, but I don't
> think that it was 32 bit on most targets.
> 
>>
>> I had interpreted the comment to be implying #2, but now I'm not so sure.
>>
>>> The proposed patch seems wrong or at least incomplete.  The recent
>>> change is imposing a 64 bit DImode counter when producing 32 bit code.
>>> PowerPC does support 64 bit atomic operations in 32 bit mode.
>>
>>
>> I'm presuming you've missed a 'NOT' in that sentence.
> 
> Yes, I omitted a "NOT".
> 
> PowerPC64 has 64 bit atomics, but PowerPC32 subset only provides 32
> bit atomics in the ISA.
> 
> If the counters always should be 64 bit, then a poor-man's 64 bit
> atomic operation proposed by Jakub seems like a better solution.
> 
> Thanks, David

Hi David.

I sent the previous email before I read Jakub's comment.
I'm attaching a simplified patch (based on the comment), which works for the i386
target. I'm testing it on the m68k target.

I prefer to have a single type for one config, the same as Nathan suggested.
I like the idea of poor-man's atomics; I can make an incremental patch.

Martin

> 
>>
>>> Was there a design decision that profile counters always should be 64
>>> bits?  Either 32 bit targets won't support multi-threaded profiling or
>>> 32 bit targets can overflow the counter sooner.
>>
>>
>>
>>>  Which is worse?
>>> Which is more likely?
>>
>>
>> My initial thought is that it is probably awkward to support 2 different
>> sized counter types in the 'same' config.  I.e. 64-bit single-threaded
>> counters and 32-bit threaded counters.
>>
>> nathan


[-- Attachment #2: 0001-PATCH-Detect-whether-target-can-use-fprofile-update-.patch --]
[-- Type: text/x-patch, Size: 5586 bytes --]

From 44289abf2e3ecfb7e17c6f204b280af06bf20b0e Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 6 Sep 2016 14:35:52 +0200
Subject: [PATCH] [PATCH] Detect whether target can use -fprofile-update=atomic

libgcc/ChangeLog:

2016-09-06  Martin Liska  <mliska@suse.cz>

	* libgcov-profiler.c: Use __GCC_HAVE_SYNC_COMPARE_AND_SWAP_{4,8} to
	conditionally enable/disable *_atomic functions.

gcc/ChangeLog:

2016-09-06  Martin Liska  <mliska@suse.cz>

	* tree-profile.c (tree_profiling): Detect whether target can use
	-fprofile-update=atomic.

gcc/testsuite/ChangeLog:

2016-09-06  Martin Liska  <mliska@suse.cz>

	* gcc.dg/profile-update-warning.c: New test.
---
 gcc/testsuite/gcc.dg/profile-update-warning.c |  7 +++++++
 gcc/tree-profile.c                            | 17 +++++++++++++++++
 libgcc/libgcov-profiler.c                     | 24 +++++++++++++++++++-----
 3 files changed, 43 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/profile-update-warning.c

diff --git a/gcc/testsuite/gcc.dg/profile-update-warning.c b/gcc/testsuite/gcc.dg/profile-update-warning.c
new file mode 100644
index 0000000..0614fad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/profile-update-warning.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fprofile-update=atomic -fprofile-generate -march=i386 -m32" } */
+
+int main(int argc, char *argv[])
+{
+  return 0;
+} /* { dg-warning "target does not support atomic profile update, single mode is selected" } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 622869e..8ce35be 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -528,6 +528,13 @@ gimple_gen_ior_profiler (histogram_value value, unsigned tag, unsigned base)
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }
 
+#ifndef HAVE_sync_compare_and_swapsi
+#define HAVE_sync_compare_and_swapsi 0
+#endif
+#ifndef HAVE_atomic_compare_and_swapsi
+#define HAVE_atomic_compare_and_swapsi 0
+#endif
+
 /* Profile all functions in the callgraph.  */
 
 static unsigned int
@@ -535,6 +542,16 @@ tree_profiling (void)
 {
   struct cgraph_node *node;
 
+  /* Verify whether we can utilize atomic update operations.  */
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC
+      && !HAVE_sync_compare_and_swapsi
+      && !HAVE_atomic_compare_and_swapsi)
+    {
+      warning (0, "target does not support atomic profile update, "
+	       "single mode is selected");
+      flag_profile_update = PROFILE_UPDATE_SINGLE;
+    }
+
   /* This is a small-ipa pass that gets called only once, from
      cgraphunit.c:ipa_passes().  */
   gcc_assert (symtab->state == IPA_SSA);
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 70a821d..887041f 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -24,8 +24,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
 #include "libgcov.h"
+#include "auto-target.h"
 #if !defined(inhibit_libc)
 
+/* Detect whether target can support atomic update of profilers.  */
+#if LONG_LONG_TYPE_SIZE <= 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#if LONG_LONG_TYPE_SIZE > 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#define GCOV_SUPPORTS_ATOMIC 0
+#endif
+#endif
+
 #ifdef L_gcov_interval_profiler
 /* If VALUE is in interval <START, START + STEPS - 1>, then increases the
    corresponding counter in COUNTERS.  If the VALUE is above or below
@@ -46,7 +58,7 @@ __gcov_interval_profiler (gcov_type *counters, gcov_type value,
 }
 #endif
 
-#ifdef L_gcov_interval_profiler_atomic
+#if defined(L_gcov_interval_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* If VALUE is in interval <START, START + STEPS - 1>, then increases the
    corresponding counter in COUNTERS.  If the VALUE is above or below
    the interval, COUNTERS[STEPS] or COUNTERS[STEPS + 1] is increased
@@ -80,7 +92,7 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_pow2_profiler_atomic
+#if defined(L_gcov_pow2_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* If VALUE is a power of two, COUNTERS[1] is incremented.  Otherwise
    COUNTERS[0] is incremented.  Function is thread-safe.  */
 
@@ -134,7 +146,7 @@ __gcov_one_value_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_one_value_profiler_atomic
+#if defined(L_gcov_one_value_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 
 /* Update one value profilers (COUNTERS) for a given VALUE.
 
@@ -342,6 +354,7 @@ __gcov_time_profiler (gcov_type* counters)
     counters[0] = ++function_counter;
 }
 
+#if GCOV_SUPPORTS_ATOMIC
 /* Sets corresponding COUNTERS if there is no value.
    Function is thread-safe.  */
 
@@ -352,6 +365,7 @@ __gcov_time_profiler_atomic (gcov_type* counters)
     counters[0] = __atomic_add_fetch (&function_counter, 1, MEMMODEL_RELAXED);
 }
 #endif
+#endif
 
 
 #ifdef L_gcov_average_profiler
@@ -366,7 +380,7 @@ __gcov_average_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_average_profiler_atomic
+#if defined(L_gcov_average_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
    to saturate up.  Function is thread-safe.  */
 
@@ -388,7 +402,7 @@ __gcov_ior_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_ior_profiler_atomic
+#if defined(L_gcov_ior_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* Bitwise-OR VALUE into COUNTER.  Function is thread-safe.  */
 
 void
-- 
2.9.2


* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 12:51                                     ` Martin Liška
@ 2016-09-06 13:13                                       ` Jakub Jelinek
  2016-09-06 13:15                                         ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Jakub Jelinek @ 2016-09-06 13:13 UTC (permalink / raw)
  To: Martin Liška
  Cc: David Edelsohn, Nathan Sidwell, GCC Patches, Jan Hubicka,
	Andreas Schwab, Richard Biener

On Tue, Sep 06, 2016 at 02:45:32PM +0200, Martin Liška wrote:
> --- a/gcc/tree-profile.c
> +++ b/gcc/tree-profile.c
> @@ -528,6 +528,13 @@ gimple_gen_ior_profiler (histogram_value value, unsigned tag, unsigned base)
>    gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>  }
>  
> +#ifndef HAVE_sync_compare_and_swapsi
> +#define HAVE_sync_compare_and_swapsi 0
> +#endif
> +#ifndef HAVE_atomic_compare_and_swapsi
> +#define HAVE_atomic_compare_and_swapsi 0
> +#endif
> +
>  /* Profile all functions in the callgraph.  */
>  
>  static unsigned int
> @@ -535,6 +542,16 @@ tree_profiling (void)
>  {
>    struct cgraph_node *node;
>  
> +  /* Verify whether we can utilize atomic update operations.  */
> +  if (flag_profile_update == PROFILE_UPDATE_ATOMIC
> +      && !HAVE_sync_compare_and_swapsi
> +      && !HAVE_atomic_compare_and_swapsi)

This isn't in sync with:

> +/* Detect whether target can support atomic update of profilers.  */
> +#if LONG_LONG_TYPE_SIZE <= 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
> +#define GCOV_SUPPORTS_ATOMIC 1
> +#else
> +#if LONG_LONG_TYPE_SIZE > 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
> +#define GCOV_SUPPORTS_ATOMIC 1
> +#else
> +#define GCOV_SUPPORTS_ATOMIC 0
> +#endif
> +#endif

this.  Either you implement the poor man's 64-bit atomics with 32-bit cas
and adjust the latter, or the former needs to look at the target's gcov type
(long long always?) and depending on its size either test the HAVE_*si or
HAVE_*di macros.

	Jakub
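
The point about keeping both checks keyed to the counter size can be
sketched as follows.  This is an illustration only, not the patch's
method: the patch tests the HAVE_* target macros inside the compiler,
whereas this sketch uses the GCC builtin __atomic_always_lock_free, which
answers for whatever target the sketch itself is compiled for.  The type,
enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t counter_type;   /* hypothetical stand-in for gcov_type */

enum update_mode { UPDATE_SINGLE, UPDATE_ATOMIC };

/* True when the target has a lock-free atomic of the counter's size.
   The same size-based predicate is meant to gate both the code that
   emits atomic updates and the library functions that perform them.  */
static bool
counter_atomic_supported (void)
{
  return __atomic_always_lock_free (sizeof (counter_type), 0);
}

/* Downgrade a request for atomic updates when they are unavailable,
   mirroring the warn-and-fall-back behaviour of the patch.  */
static enum update_mode
select_update_mode (enum update_mode requested)
{
  if (requested == UPDATE_ATOMIC && !counter_atomic_supported ())
    return UPDATE_SINGLE;
  return requested;
}
```

Deriving the answer from sizeof (counter_type) in one place avoids the
situation where one side checks the 4-byte operation and the other the
8-byte one.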

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 13:13                                       ` Jakub Jelinek
@ 2016-09-06 13:15                                         ` Martin Liška
  2016-09-06 13:45                                           ` Jakub Jelinek
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-09-06 13:15 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: David Edelsohn, Nathan Sidwell, GCC Patches, Jan Hubicka,
	Andreas Schwab, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1735 bytes --]

On 09/06/2016 02:51 PM, Jakub Jelinek wrote:
> On Tue, Sep 06, 2016 at 02:45:32PM +0200, Martin Liška wrote:
>> --- a/gcc/tree-profile.c
>> +++ b/gcc/tree-profile.c
>> @@ -528,6 +528,13 @@ gimple_gen_ior_profiler (histogram_value value, unsigned tag, unsigned base)
>>    gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>>  }
>>  
>> +#ifndef HAVE_sync_compare_and_swapsi
>> +#define HAVE_sync_compare_and_swapsi 0
>> +#endif
>> +#ifndef HAVE_atomic_compare_and_swapsi
>> +#define HAVE_atomic_compare_and_swapsi 0
>> +#endif
>> +
>>  /* Profile all functions in the callgraph.  */
>>  
>>  static unsigned int
>> @@ -535,6 +542,16 @@ tree_profiling (void)
>>  {
>>    struct cgraph_node *node;
>>  
>> +  /* Verify whether we can utilize atomic update operations.  */
>> +  if (flag_profile_update == PROFILE_UPDATE_ATOMIC
>> +      && !HAVE_sync_compare_and_swapsi
>> +      && !HAVE_atomic_compare_and_swapsi)
> 
> This isn't in sync with:
> 
>> +/* Detect whether target can support atomic update of profilers.  */
>> +#if LONG_LONG_TYPE_SIZE <= 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
>> +#define GCOV_SUPPORTS_ATOMIC 1
>> +#else
>> +#if LONG_LONG_TYPE_SIZE > 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
>> +#define GCOV_SUPPORTS_ATOMIC 1
>> +#else
>> +#define GCOV_SUPPORTS_ATOMIC 0
>> +#endif
>> +#endif
> 
> this.  Either you implement the poor man's 64-bit atomics with 32-bit cas
> and adjust the latter, or the former needs to look at the target's gcov type
> (long long always?) and depending on its size either test the HAVE_*si or
> HAVE_*di macros.
> 
> 	Jakub
> 

Ok, thanks, this should be the proper patch, where I distinguish on sizeof(gcov_type) and
use the appropriate HAVE_*{s,d}i macros.

Ready for trunk?
Thanks,
Martin

[-- Attachment #2: 0001-PATCH-Detect-whether-target-can-use-fprofile-update-.patch --]
[-- Type: text/x-patch, Size: 6027 bytes --]

From 41bef1e975042071c973c3cb733a0e0d9a59fec6 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 6 Sep 2016 14:35:52 +0200
Subject: [PATCH] [PATCH] Detect whether target can use -fprofile-update=atomic

libgcc/ChangeLog:

2016-09-06  Martin Liska  <mliska@suse.cz>

	* libgcov-profiler.c: Use __GCC_HAVE_SYNC_COMPARE_AND_SWAP_{4,8} to
	conditionally enable/disable *_atomic functions.

gcc/ChangeLog:

2016-09-06  Martin Liska  <mliska@suse.cz>

	* tree-profile.c (tree_profiling): Detect whether target can use
	-fprofile-update=atomic.

gcc/testsuite/ChangeLog:

2016-09-06  Martin Liska  <mliska@suse.cz>

	* gcc.dg/profile-update-warning.c: New test.
---
 gcc/testsuite/gcc.dg/profile-update-warning.c |  7 ++++++
 gcc/tree-profile.c                            | 35 +++++++++++++++++++++++++++
 libgcc/libgcov-profiler.c                     | 24 ++++++++++++++----
 3 files changed, 61 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/profile-update-warning.c

diff --git a/gcc/testsuite/gcc.dg/profile-update-warning.c b/gcc/testsuite/gcc.dg/profile-update-warning.c
new file mode 100644
index 0000000..0614fad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/profile-update-warning.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fprofile-update=atomic -fprofile-generate -march=i386 -m32" } */
+
+int main(int argc, char *argv[])
+{
+  return 0;
+} /* { dg-warning "target does not support atomic profile update, single mode is selected" } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 622869e..a3e6dca 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -528,6 +528,20 @@ gimple_gen_ior_profiler (histogram_value value, unsigned tag, unsigned base)
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }
 
+#ifndef HAVE_sync_compare_and_swapsi
+#define HAVE_sync_compare_and_swapsi 0
+#endif
+#ifndef HAVE_atomic_compare_and_swapsi
+#define HAVE_atomic_compare_and_swapsi 0
+#endif
+
+#ifndef HAVE_sync_compare_and_swapdi
+#define HAVE_sync_compare_and_swapdi 0
+#endif
+#ifndef HAVE_atomic_compare_and_swapdi
+#define HAVE_atomic_compare_and_swapdi 0
+#endif
+
 /* Profile all functions in the callgraph.  */
 
 static unsigned int
@@ -535,6 +549,27 @@ tree_profiling (void)
 {
   struct cgraph_node *node;
 
+  /* Verify whether we can utilize atomic update operations.  */
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+    {
+      bool can_support = false;
+      if (sizeof (gcov_type) == 4)
+	can_support
+	  = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
+      else if (sizeof (gcov_type) == 8)
+	can_support
+	  = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+      else
+	gcc_unreachable ();
+
+      if (!can_support)
+      {
+	warning (0, "target does not support atomic profile update, "
+		 "single mode is selected");
+	flag_profile_update = PROFILE_UPDATE_SINGLE;
+      }
+    }
+
   /* This is a small-ipa pass that gets called only once, from
      cgraphunit.c:ipa_passes().  */
   gcc_assert (symtab->state == IPA_SSA);
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 70a821d..887041f 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -24,8 +24,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
 #include "libgcov.h"
+#include "auto-target.h"
 #if !defined(inhibit_libc)
 
+/* Detect whether target can support atomic update of profilers.  */
+#if LONG_LONG_TYPE_SIZE <= 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#if LONG_LONG_TYPE_SIZE > 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#define GCOV_SUPPORTS_ATOMIC 0
+#endif
+#endif
+
 #ifdef L_gcov_interval_profiler
 /* If VALUE is in interval <START, START + STEPS - 1>, then increases the
    corresponding counter in COUNTERS.  If the VALUE is above or below
@@ -46,7 +58,7 @@ __gcov_interval_profiler (gcov_type *counters, gcov_type value,
 }
 #endif
 
-#ifdef L_gcov_interval_profiler_atomic
+#if defined(L_gcov_interval_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* If VALUE is in interval <START, START + STEPS - 1>, then increases the
    corresponding counter in COUNTERS.  If the VALUE is above or below
    the interval, COUNTERS[STEPS] or COUNTERS[STEPS + 1] is increased
@@ -80,7 +92,7 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_pow2_profiler_atomic
+#if defined(L_gcov_pow2_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* If VALUE is a power of two, COUNTERS[1] is incremented.  Otherwise
    COUNTERS[0] is incremented.  Function is thread-safe.  */
 
@@ -134,7 +146,7 @@ __gcov_one_value_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_one_value_profiler_atomic
+#if defined(L_gcov_one_value_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 
 /* Update one value profilers (COUNTERS) for a given VALUE.
 
@@ -342,6 +354,7 @@ __gcov_time_profiler (gcov_type* counters)
     counters[0] = ++function_counter;
 }
 
+#if GCOV_SUPPORTS_ATOMIC
 /* Sets corresponding COUNTERS if there is no value.
    Function is thread-safe.  */
 
@@ -352,6 +365,7 @@ __gcov_time_profiler_atomic (gcov_type* counters)
     counters[0] = __atomic_add_fetch (&function_counter, 1, MEMMODEL_RELAXED);
 }
 #endif
+#endif
 
 
 #ifdef L_gcov_average_profiler
@@ -366,7 +380,7 @@ __gcov_average_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_average_profiler_atomic
+#if defined(L_gcov_average_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
    to saturate up.  Function is thread-safe.  */
 
@@ -388,7 +402,7 @@ __gcov_ior_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_ior_profiler_atomic
+#if defined(L_gcov_ior_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* Bitwise-OR VALUE into COUNTER.  Function is thread-safe.  */
 
 void
-- 
2.9.2


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 13:15                                         ` Martin Liška
@ 2016-09-06 13:45                                           ` Jakub Jelinek
  2016-09-06 13:50                                             ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Jakub Jelinek @ 2016-09-06 13:45 UTC (permalink / raw)
  To: Martin Liška
  Cc: David Edelsohn, Nathan Sidwell, GCC Patches, Jan Hubicka,
	Andreas Schwab, Richard Biener

On Tue, Sep 06, 2016 at 03:13:09PM +0200, Martin Liška wrote:
> @@ -535,6 +549,27 @@ tree_profiling (void)
>  {
>    struct cgraph_node *node;
>  
> +  /* Verify whether we can utilize atomic update operations.  */
> +  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
> +    {
> +      bool can_support = false;
> +      if (sizeof (gcov_type) == 4)
> +	can_support
> +	  = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
> +      else if (sizeof (gcov_type) == 8)
> +	can_support
> +	  = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
> +      else
> +	gcc_unreachable ();

sizeof (gcov_type) talks about the host gcov type, you want instead the
target gcov type.  So
TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
== 4 vs. 8).
As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
And I wouldn't add gcc_unreachable, just warn for weirdo arches always.
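
To illustrate the shape of that check, here is a minimal standalone sketch.
It is a model, not GCC code: struct target_caps and
atomic_update_supported are hypothetical names, and the fields stand in for
the target counter size (TYPE_SIZE_UNIT of the gcov type) and for the
HAVE_*_compare_and_swap{si,di} tests.  An unexpected size simply reports
"unsupported", so the caller can warn instead of hitting gcc_unreachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the target, as the compiler would know it.
   None of these names exist in GCC.  */
struct target_caps
{
  size_t counter_bytes;	/* size of the profile counter type on the target */
  bool have_cas_4;	/* 4-byte compare-and-swap available */
  bool have_cas_8;	/* 8-byte compare-and-swap available */
};

/* Return true if counters of the target's size can be updated atomically.
   Any other counter size falls through to false, so the caller can warn
   and select single mode instead of aborting.  */
static bool
atomic_update_supported (const struct target_caps *t)
{
  assert (t != NULL);
  if (t->counter_bytes == 4)
    return t->have_cas_4;
  if (t->counter_bytes == 8)
    return t->have_cas_8;
  return false;
}
```

With an 8-byte counter and only a 4-byte compare-and-swap (the situation the
new test provokes with -march=i386 -m32, assuming a 64-bit gcov type there),
the function returns false.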

	Jakub

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 13:45                                           ` Jakub Jelinek
@ 2016-09-06 13:50                                             ` Martin Liška
  2016-09-06 14:06                                               ` Jakub Jelinek
                                                                 ` (2 more replies)
  0 siblings, 3 replies; 95+ messages in thread
From: Martin Liška @ 2016-09-06 13:50 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: David Edelsohn, Nathan Sidwell, GCC Patches, Jan Hubicka,
	Andreas Schwab, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 553 bytes --]

On 09/06/2016 03:31 PM, Jakub Jelinek wrote:
> sizeof (gcov_type) talks about the host gcov type, you want instead the
> target gcov type.  So
> TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
> == 4 vs. 8).
> As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
> TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
> And I wouldn't add gcc_unreachable, just warn for weirdo arches always.
> 
> 	Jakub

Thank you, Jakub, for helping me with that. I've used the TYPE_SIZE_UNIT macro.

Ready for trunk?
Martin

[-- Attachment #2: 0001-PATCH-Detect-whether-target-can-use-fprofile-update-.patch --]
[-- Type: text/x-patch, Size: 6084 bytes --]

From 744d1688fee0359314d87d948323f58fbca6172e Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 6 Sep 2016 14:35:52 +0200
Subject: [PATCH] [PATCH] Detect whether target can use -fprofile-update=atomic

libgcc/ChangeLog:

2016-09-06  Martin Liska  <mliska@suse.cz>

	* libgcov-profiler.c: Use __GCC_HAVE_SYNC_COMPARE_AND_SWAP_{4,8} to
	conditionally enable/disable *_atomic functions.

gcc/ChangeLog:

2016-09-06  Martin Liska  <mliska@suse.cz>

	* tree-profile.c (tree_profiling): Detect whether target can use
	-fprofile-update=atomic.

gcc/testsuite/ChangeLog:

2016-09-06  Martin Liska  <mliska@suse.cz>

	* gcc.dg/profile-update-warning.c: New test.
---
 gcc/testsuite/gcc.dg/profile-update-warning.c |  7 ++++++
 gcc/tree-profile.c                            | 35 +++++++++++++++++++++++++++
 libgcc/libgcov-profiler.c                     | 24 ++++++++++++++----
 3 files changed, 61 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/profile-update-warning.c

diff --git a/gcc/testsuite/gcc.dg/profile-update-warning.c b/gcc/testsuite/gcc.dg/profile-update-warning.c
new file mode 100644
index 0000000..0614fad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/profile-update-warning.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-fprofile-update=atomic -fprofile-generate -march=i386 -m32" } */
+
+int main(int argc, char *argv[])
+{
+  return 0;
+} /* { dg-warning "target does not support atomic profile update, single mode is selected" } */
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 622869e..69b48e5 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -528,6 +528,20 @@ gimple_gen_ior_profiler (histogram_value value, unsigned tag, unsigned base)
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }
 
+#ifndef HAVE_sync_compare_and_swapsi
+#define HAVE_sync_compare_and_swapsi 0
+#endif
+#ifndef HAVE_atomic_compare_and_swapsi
+#define HAVE_atomic_compare_and_swapsi 0
+#endif
+
+#ifndef HAVE_sync_compare_and_swapdi
+#define HAVE_sync_compare_and_swapdi 0
+#endif
+#ifndef HAVE_atomic_compare_and_swapdi
+#define HAVE_atomic_compare_and_swapdi 0
+#endif
+
 /* Profile all functions in the callgraph.  */
 
 static unsigned int
@@ -535,6 +549,27 @@ tree_profiling (void)
 {
   struct cgraph_node *node;
 
+  /* Verify whether we can utilize atomic update operations.  */
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+    {
+      bool can_support = false;
+      unsigned HOST_WIDE_INT gcov_type_size
+	= tree_to_uhwi (TYPE_SIZE_UNIT (get_gcov_type ()));
+      if (gcov_type_size == 4)
+	can_support
+	  = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
+      else if (gcov_type_size == 8)
+	can_support
+	  = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+
+      if (!can_support)
+      {
+	warning (0, "target does not support atomic profile update, "
+		 "single mode is selected");
+	flag_profile_update = PROFILE_UPDATE_SINGLE;
+      }
+    }
+
   /* This is a small-ipa pass that gets called only once, from
      cgraphunit.c:ipa_passes().  */
   gcc_assert (symtab->state == IPA_SSA);
diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 70a821d..887041f 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -24,8 +24,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
 #include "libgcov.h"
+#include "auto-target.h"
 #if !defined(inhibit_libc)
 
+/* Detect whether target can support atomic update of profilers.  */
+#if LONG_LONG_TYPE_SIZE <= 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#if LONG_LONG_TYPE_SIZE > 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#define GCOV_SUPPORTS_ATOMIC 0
+#endif
+#endif
+
 #ifdef L_gcov_interval_profiler
 /* If VALUE is in interval <START, START + STEPS - 1>, then increases the
    corresponding counter in COUNTERS.  If the VALUE is above or below
@@ -46,7 +58,7 @@ __gcov_interval_profiler (gcov_type *counters, gcov_type value,
 }
 #endif
 
-#ifdef L_gcov_interval_profiler_atomic
+#if defined(L_gcov_interval_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* If VALUE is in interval <START, START + STEPS - 1>, then increases the
    corresponding counter in COUNTERS.  If the VALUE is above or below
    the interval, COUNTERS[STEPS] or COUNTERS[STEPS + 1] is increased
@@ -80,7 +92,7 @@ __gcov_pow2_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_pow2_profiler_atomic
+#if defined(L_gcov_pow2_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* If VALUE is a power of two, COUNTERS[1] is incremented.  Otherwise
    COUNTERS[0] is incremented.  Function is thread-safe.  */
 
@@ -134,7 +146,7 @@ __gcov_one_value_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_one_value_profiler_atomic
+#if defined(L_gcov_one_value_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 
 /* Update one value profilers (COUNTERS) for a given VALUE.
 
@@ -342,6 +354,7 @@ __gcov_time_profiler (gcov_type* counters)
     counters[0] = ++function_counter;
 }
 
+#if GCOV_SUPPORTS_ATOMIC
 /* Sets corresponding COUNTERS if there is no value.
    Function is thread-safe.  */
 
@@ -352,6 +365,7 @@ __gcov_time_profiler_atomic (gcov_type* counters)
     counters[0] = __atomic_add_fetch (&function_counter, 1, MEMMODEL_RELAXED);
 }
 #endif
+#endif
 
 
 #ifdef L_gcov_average_profiler
@@ -366,7 +380,7 @@ __gcov_average_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_average_profiler_atomic
+#if defined(L_gcov_average_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* Increase corresponding COUNTER by VALUE.  FIXME: Perhaps we want
    to saturate up.  Function is thread-safe.  */
 
@@ -388,7 +402,7 @@ __gcov_ior_profiler (gcov_type *counters, gcov_type value)
 }
 #endif
 
-#ifdef L_gcov_ior_profiler_atomic
+#if defined(L_gcov_ior_profiler_atomic) && GCOV_SUPPORTS_ATOMIC
 /* Bitwise-OR VALUE into COUNTER.  Function is thread-safe.  */
 
 void
-- 
2.9.2


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 13:50                                             ` Martin Liška
@ 2016-09-06 14:06                                               ` Jakub Jelinek
  2016-09-07  7:52                                               ` Christophe Lyon
  2016-09-29  8:31                                               ` Rainer Orth
  2 siblings, 0 replies; 95+ messages in thread
From: Jakub Jelinek @ 2016-09-06 14:06 UTC (permalink / raw)
  To: Martin Liška
  Cc: David Edelsohn, Nathan Sidwell, GCC Patches, Jan Hubicka,
	Andreas Schwab, Richard Biener

On Tue, Sep 06, 2016 at 03:45:09PM +0200, Martin Liška wrote:
> --- a/libgcc/libgcov-profiler.c
> +++ b/libgcc/libgcov-profiler.c
> @@ -24,8 +24,20 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  <http://www.gnu.org/licenses/>.  */
>  
>  #include "libgcov.h"
> +#include "auto-target.h"
>  #if !defined(inhibit_libc)
>  
> +/* Detect whether target can support atomic update of profilers.  */
> +#if LONG_LONG_TYPE_SIZE <= 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
> +#define GCOV_SUPPORTS_ATOMIC 1
> +#else
> +#if LONG_LONG_TYPE_SIZE > 32 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
> +#define GCOV_SUPPORTS_ATOMIC 1
> +#else
> +#define GCOV_SUPPORTS_ATOMIC 0
> +#endif
> +#endif

One more thing.  This is always compiled by gcc, so I think you don't want
to include auto-target.h; use __SIZEOF_LONG_LONG__ == 4 and
__SIZEOF_LONG_LONG__ == 8 tests instead of LONG_LONG_TYPE_SIZE <= 32
or LONG_LONG_TYPE_SIZE > 32.  Ok with that change.
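
A sketch of what that could look like, relying only on macros the compiler
predefines.  DEMO_SUPPORTS_ATOMIC and demo_supports_atomic are illustrative
names, not the ones used in libgcov, and the defined () form is a stylistic
choice of this sketch.

```c
#include <assert.h>

/* Decide at preprocessing time whether a counter as wide as long long can
   be updated with a compare-and-swap.  Both macro families are predefined
   by GCC, so no configure-generated header is needed.  */
#if __SIZEOF_LONG_LONG__ == 4 && defined (__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
#define DEMO_SUPPORTS_ATOMIC 1
#elif __SIZEOF_LONG_LONG__ == 8 && defined (__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8)
#define DEMO_SUPPORTS_ATOMIC 1
#else
#define DEMO_SUPPORTS_ATOMIC 0
#endif

/* Expose the compile-time decision as a value.  */
static int
demo_supports_atomic (void)
{
  /* __SIZEOF_LONG_LONG__ is measured in bytes, like sizeof.  */
  assert (sizeof (long long) == __SIZEOF_LONG_LONG__);
  return DEMO_SUPPORTS_ATOMIC;
}
```

Note that __SIZEOF_LONG_LONG__ counts bytes, whereas LONG_LONG_TYPE_SIZE
counts bits, hence the 4 and 8 in place of the 32.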

	Jakub

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 13:50                                             ` Martin Liška
  2016-09-06 14:06                                               ` Jakub Jelinek
@ 2016-09-07  7:52                                               ` Christophe Lyon
  2016-09-07  9:35                                                 ` Martin Liška
  2016-09-29  8:31                                               ` Rainer Orth
  2 siblings, 1 reply; 95+ messages in thread
From: Christophe Lyon @ 2016-09-07  7:52 UTC (permalink / raw)
  To: Martin Liška
  Cc: Jakub Jelinek, David Edelsohn, Nathan Sidwell, GCC Patches,
	Jan Hubicka, Andreas Schwab, Richard Biener

On 6 September 2016 at 15:45, Martin Liška <mliska@suse.cz> wrote:
> On 09/06/2016 03:31 PM, Jakub Jelinek wrote:
>> sizeof (gcov_type) talks about the host gcov type, you want instead the
>> target gcov type.  So
>> TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
>> == 4 vs. 8).
>> As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
>> TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
>> And I wouldn't add gcc_unreachable, just warn for weirdo arches always.
>>
>>       Jakub
>
> Thank you Jakub for helping me with that. I've used TYPE_SIZE_UNIT macro.
>
> Ready for trunk?
> Martin

Hi Martin,

On targets which do not support atomic profile update, your patch generates a
warning on gcc.dg/tree-prof/val-profiler-threads-1.c, making it fail.

Do we need a new effective-target ?

Christophe

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-07  7:52                                               ` Christophe Lyon
@ 2016-09-07  9:35                                                 ` Martin Liška
  2016-09-07 16:06                                                   ` Christophe Lyon
  2016-09-12 20:20                                                   ` Jeff Law
  0 siblings, 2 replies; 95+ messages in thread
From: Martin Liška @ 2016-09-07  9:35 UTC (permalink / raw)
  To: Christophe Lyon
  Cc: Jakub Jelinek, David Edelsohn, Nathan Sidwell, GCC Patches,
	Jan Hubicka, Andreas Schwab, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1047 bytes --]

On 09/07/2016 09:45 AM, Christophe Lyon wrote:
> On 6 September 2016 at 15:45, Martin Liška <mliska@suse.cz> wrote:
>> On 09/06/2016 03:31 PM, Jakub Jelinek wrote:
>>> sizeof (gcov_type) talks about the host gcov type, you want instead the
>>> target gcov type.  So
>>> TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
>>> == 4 vs. 8).
>>> As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
>>> TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
>>> And I wouldn't add gcc_unreachable, just warn for weirdo arches always.
>>>
>>>       Jakub
>>
>> Thank you Jakub for helping me with that. I've used TYPE_SIZE_UNIT macro.
>>
>> Ready for trunk?
>> Martin
> 
> Hi Martin,
> 
> On targets which do not support atomic profile update, your patch generates a
> warning on gcc.dg/tree-prof/val-profiler-threads-1.c, making it fail.
> 
> Do we need a new effective-target ?
> 
> Christophe
> 

Hi.

Thanks for the observation; I'm sending a patch that adds such an effective target.
Can you please test it?

Thanks,
Martin

[-- Attachment #2: 0001-Add-new-effective-target-profile_update_atomic.patch --]
[-- Type: text/x-patch, Size: 2261 bytes --]

From 9a68f2fbf2b5cb547aee7860926c846d5f15d398 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 7 Sep 2016 11:28:13 +0200
Subject: [PATCH] Add new effective target: profile_update_atomic

gcc/testsuite/ChangeLog:

2016-09-07  Martin Liska  <mliska@suse.cz>

	* g++.dg/gcov/gcov-threads-1.C: Use profile_update_atomic
	effective target.
	* gcc.dg/tree-prof/val-profiler-threads-1.c: Likewise.
	* lib/target-supports.exp: Define the new target.
---
 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C              | 1 +
 gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c | 2 ++
 gcc/testsuite/lib/target-supports.exp                   | 7 +++++++
 3 files changed, 10 insertions(+)

diff --git a/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
index a4a6f0a..cc9266a 100644
--- a/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
+++ b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
@@ -1,5 +1,6 @@
 /* { dg-options "-fprofile-arcs -ftest-coverage -pthread -fprofile-update=atomic" } */
 /* { dg-do run { target native } } */
+/* { dg-require-effective-target profile_update_atomic } */
 
 #include <stdint.h>
 #include <pthread.h>
diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c b/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
index e9b04a0..95d6ee3 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
@@ -1,4 +1,6 @@
 /* { dg-options "-O0 -pthread -fprofile-update=atomic" } */
+/* { dg-require-effective-target profile_update_atomic } */
+
 #include <pthread.h>
 
 #define NUM_THREADS	8
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 545b3dc..6724a7f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7699,3 +7699,10 @@ proc check_effective_target_offload_hsa { } {
 	int main () {return 0;}
     } "-foffload=hsa" ]
 }
+
+# Return 1 if the target supports -fprofile-update=atomic.
+proc check_effective_target_profile_update_atomic {} {
+    return [check_no_compiler_messages profile_update_atomic assembly {
+	int main (void) { return 0; }
+    } "-fprofile-update=atomic -fprofile-generate"]
+}
-- 
2.9.2


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-18 16:06                                     ` Richard Biener
@ 2016-09-07 11:41                                       ` Martin Liška
       [not found]                                         ` <CAFiYyc0UaSzXhZmyG9QRkHGT4JFowxBfE2yb-NvXE=hR1xafdA@mail.gmail.com>
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-09-07 11:41 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek, Andi Kleen
  Cc: Jeff Law, Nathan Sidwell, gcc-patches, jh

On 08/18/2016 06:06 PM, Richard Biener wrote:
> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>> The best proxy we have for that is -pthread.
>>>>
>>>> Is it slower, most definitely, but odds are we're giving folks
>>>> garbage data otherwise, which in many ways is even worse.
>>>
>>> It will likely be catastrophically slower in some cases. 
>>>
>>> Catastrophically as in too slow to be usable.
>>>
>>> An atomic instruction is a lot more expensive than a single
>> increment. Also
>>> they sometimes are really slow depending on the state of the machine.
>>
>> Can't we just have thread-local copies of all the counters (perhaps
>> using
>> __thread pointer as base) and just atomically merge at thread
>> termination?
> 
> I suggested that as well but of course it'll have its own class of issues (short lived threads, so we need to somehow re-use counters from terminated threads, large number of threads and thus using too much memory for the counters)
> 
> Richard.

Hello.

I've put that approach on my TODO list; let's see whether it is doable in a reasonable amount of time.

I've just finished some measurements to illustrate the slow-down of the -fprofile-update=atomic approach.
All numbers are run times for: no profile, -fprofile-generate, -fprofile-generate -fprofile-update=atomic
c-ray benchmark (utilizing 8 threads, -O3): 1.7, 15.5, 38.1s
unrar (utilizing 8 threads, -O3): 3.6, 11.6, 38s
tramp3d (1 thread, -O3): 18.0, 46.6, 168s

So atomic updates take roughly 2.5x to 3.6x as long as plain -fprofile-generate. I don't have much experience with
selecting default options, but these numbers can probably help.
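
For reference, the ratios behind that summary can be recomputed from the
three rows above with a trivial helper (slowdown is a hypothetical name used
only for this illustration):

```c
#include <assert.h>

/* Ratio of a measured run time to a baseline run time, both in seconds.  */
static double
slowdown (double baseline_s, double measured_s)
{
  assert (baseline_s > 0.0);
  return measured_s / baseline_s;
}
```

Taking -fprofile-generate as the baseline, slowdown (15.5, 38.1) is about
2.46 for c-ray, slowdown (11.6, 38.0) about 3.28 for unrar, and
slowdown (46.6, 168.0) about 3.61 for tramp3d.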

Thoughts?
Martin

> 
>> 	Jakub
> 
> 

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-07  9:35                                                 ` Martin Liška
@ 2016-09-07 16:06                                                   ` Christophe Lyon
  2016-09-12 20:20                                                   ` Jeff Law
  1 sibling, 0 replies; 95+ messages in thread
From: Christophe Lyon @ 2016-09-07 16:06 UTC (permalink / raw)
  To: Martin Liška
  Cc: Jakub Jelinek, David Edelsohn, Nathan Sidwell, GCC Patches,
	Jan Hubicka, Andreas Schwab, Richard Biener

On 7 September 2016 at 11:34, Martin Liška <mliska@suse.cz> wrote:
> On 09/07/2016 09:45 AM, Christophe Lyon wrote:
>> On 6 September 2016 at 15:45, Martin Liška <mliska@suse.cz> wrote:
>>> On 09/06/2016 03:31 PM, Jakub Jelinek wrote:
>>>> sizeof (gcov_type) talks about the host gcov type, you want instead the
>>>> target gcov type.  So
>>>> TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
>>>> == 4 vs. 8).
>>>> As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
>>>> TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
>>>> And I wouldn't add gcc_unreachable, just warn for weirdo arches always.
>>>>
>>>>       Jakub
>>>
>>> Thank you Jakub for helping me with that. I've used TYPE_SIZE_UNIT macro.
>>>
>>> Ready for trunk?
>>> Martin
>>
>> Hi Martin,
>>
>> On targets which do not support atomic profile update, your patch generates a
>> warning on gcc.dg/tree-prof/val-profiler-threads-1.c, making it fail.
>>
>> Do we need a new effective-target ?
>>
>> Christophe
>>
>
> Hi.
>
> Thanks for observation, I'm sending a patch that does that.
> Can you please test it?
>
It does work indeed, thanks.
(tested on arm* targets)

Christophe

> Thanks,
> Martin

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-07  9:35                                                 ` Martin Liška
  2016-09-07 16:06                                                   ` Christophe Lyon
@ 2016-09-12 20:20                                                   ` Jeff Law
  1 sibling, 0 replies; 95+ messages in thread
From: Jeff Law @ 2016-09-12 20:20 UTC (permalink / raw)
  To: Martin Liška, Christophe Lyon
  Cc: Jakub Jelinek, David Edelsohn, Nathan Sidwell, GCC Patches,
	Jan Hubicka, Andreas Schwab, Richard Biener

On 09/07/2016 03:34 AM, Martin Liška wrote:
> On 09/07/2016 09:45 AM, Christophe Lyon wrote:
>> > On 6 September 2016 at 15:45, Martin Liška <mliska@suse.cz> wrote:
>>> >> On 09/06/2016 03:31 PM, Jakub Jelinek wrote:
>>>> >>> sizeof (gcov_type) talks about the host gcov type, you want instead the
>>>> >>> target gcov type.  So
>>>> >>> TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
>>>> >>> == 4 vs. 8).
>>>> >>> As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
>>>> >>> TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
>>>> >>> And I wouldn't add gcc_unreachable, just warn for weirdo arches always.
>>>> >>>
>>>> >>>       Jakub
>>> >>
>>> >> Thank you Jakub for helping me with that. I've used TYPE_SIZE_UNIT macro.
>>> >>
>>> >> Ready for trunk?
>>> >> Martin
>> >
>> > Hi Martin,
>> >
>> > On targets which do not support atomic profile update, your patch generates a
>> > warning on gcc.dg/tree-prof/val-profiler-threads-1.c, making it fail.
>> >
>> > Do we need a new effective-target ?
>> >
>> > Christophe
>> >
> Hi.
>
> Thanks for observation, I'm sending a patch that does that.
> Can you please test it?
>
> Thanks,
> Martin
>
>
> 0001-Add-new-effective-target-profile_update_atomic.patch
>
>
> From 9a68f2fbf2b5cb547aee7860926c846d5f15d398 Mon Sep 17 00:00:00 2001
> From: marxin <mliska@suse.cz>
> Date: Wed, 7 Sep 2016 11:28:13 +0200
> Subject: [PATCH] Add new effective target: profile_update_atomic
>
> gcc/testsuite/ChangeLog:
>
> 2016-09-07  Martin Liska  <mliska@suse.cz>
>
> 	* g++.dg/gcov/gcov-threads-1.C: Use profile_update_atomic
> 	effective target.
> 	* gcc.dg/tree-prof/val-profiler-threads-1.c: Likewise.
> 	* lib/target-supports.exp: Define the new target.
OK.
jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [RFC] Speed-up -fprofile-update=atomic
       [not found]                                         ` <CAFiYyc0UaSzXhZmyG9QRkHGT4JFowxBfE2yb-NvXE=hR1xafdA@mail.gmail.com>
@ 2016-09-15 10:18                                           ` Martin Liška
  2016-10-04  9:45                                             ` Richard Biener
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-09-15 10:18 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, Andi Kleen, Jeff Law, Nathan Sidwell, GCC Patches,
	Hubicha, Jan

[-- Attachment #1: Type: text/plain, Size: 3361 bytes --]

On 09/07/2016 02:09 PM, Richard Biener wrote:
> On Wed, Sep 7, 2016 at 1:37 PM, Martin Liška <mliska@suse.cz> wrote:
>> On 08/18/2016 06:06 PM, Richard Biener wrote:
>>> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>>>> On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>>>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>>>> The best proxy we have for that is -pthread.
>>>>>>
>>>>>> Is it slower, most definitely, but odds are we're giving folks
>>>>>> garbage data otherwise, which in many ways is even worse.
>>>>>
>>>>> It will likely be catastrophically slower in some cases.
>>>>>
>>>>> Catastrophically as in too slow to be usable.
>>>>>
>>>>> An atomic instruction is a lot more expensive than a single
>>>> increment. Also
>>>>> they sometimes are really slow depending on the state of the machine.
>>>>
>>>> Can't we just have thread-local copies of all the counters (perhaps
>>>> using
>>>> __thread pointer as base) and just atomically merge at thread
>>>> termination?
>>>
>>> I suggested that as well but of course it'll have its own class of issues (short lived threads, so we need to somehow re-use counters from terminated threads, large number of threads and thus using too much memory for the counters)
>>>
>>> Richard.
>>
>> Hello.
>>
>> I've got written the approach on my TODO list, let's see whether it would be doable in a reasonable amount of time.
>>
>> I've just finished some measurements to illustrate slow-down of -fprofile-update=atomic approach.
>> All numbers are: no profile, -fprofile-generate, -fprofile-generate -fprofile-update=atomic
>> c-ray benchmark (utilizing 8 threads, -O3): 1.7, 15.5., 38.1s
>> unrar (utilizing 8 threads, -O3): 3.6, 11.6, 38s
>> tramp3d (1 thread, -O3): 18.0, 46.6, 168s
>>
>> So the slow-down is roughly 300% compared to -fprofile-generate. I'm not having much experience with default option
>> selection, but these numbers can probably help.
>>
>> Thoughts?
> 
> Look at the generated code for an instrumented simple loop and see that for
> the non-atomic updates we happily apply store-motion to the counter update
> and thus we only get one counter update per loop exit rather than one per
> loop iteration.  Now see what happens for the atomic case (I suspect you
> get one per iteration).
> 
> I'll bet this accounts for most of the slowdown.
> 
> Back in time ICC which had atomic counter updates (but using function
> calls - ugh!) had a > 1000% overhead with FDO for tramp3d (they also
> didn't have early inlining -- removing abstraction helps reducing the number
> of counters significantly).
> 
> Richard.

Hi.

During Cauldron I discussed with Richi how to speed up ARCS
profile counter updates. My first attempt utilizes TLS storage, where
every function accumulates its arcs counters. These are eventually added
(using atomic operations) to the global ones at the very end of the function.
Currently I rely on target support for TLS. It is questionable whether
-fprofile-update=atomic should have such a requirement, or whether to add a new option value
like -fprofile-update=atomic-tls.

Running the patch on tramp3d takes 88s to finish, compared to 168s in the previous numbers.
That is roughly half (about 52%) of the run time of the current implementation.
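
To make the idea concrete, here is a hand-written model of what the
instrumented code does conceptually.  It is only a sketch under my own
assumptions: global_arcs, tls_arcs, flush_tls_arcs and instrumented_loop are
made-up names, the counters are fixed at 64 bits, and the TLS counters are
cleared after the flush, which may differ from where the attached patch
zeroes them.

```c
#include <assert.h>
#include <stdint.h>

#define N_ARCS 2

/* Shared counters, as they would be written to the profile data.  */
static int64_t global_arcs[N_ARCS];

/* Per-thread scratch counters; plain increments are safe here because no
   other thread can see them.  */
static __thread int64_t tls_arcs[N_ARCS];

/* Add the thread-private counts to the shared counters, one atomic
   operation per counter, then clear the private copies.  */
static void
flush_tls_arcs (void)
{
  for (int i = 0; i < N_ARCS; i++)
    if (tls_arcs[i] != 0)
      {
	__atomic_fetch_add (&global_arcs[i], tls_arcs[i], __ATOMIC_RELAXED);
	tls_arcs[i] = 0;
      }
}

/* Model of an instrumented function: arc 0 is the loop back edge,
   arc 1 is the loop exit.  */
static void
instrumented_loop (int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    tls_arcs[0]++;		/* non-atomic, once per iteration */
  tls_arcs[1]++;
  flush_tls_arcs ();		/* atomic, once per counter per call */
}
```

The number of atomic operations then depends on the number of counters per
call rather than on the number of loop iterations, which is where the saving
is expected to come from.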

Thoughts?
Martin

> 
>> Martin
>>
>>>
>>>>      Jakub
>>>
>>>
>>


[-- Attachment #2: 0001-Improve-implementation-of-fprofile-update-atomic.patch --]
[-- Type: text/x-patch, Size: 10419 bytes --]

From 91b5342c422950b32d1ba7d616bda418c7993a84 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Thu, 15 Sep 2016 09:49:41 +0200
Subject: [PATCH] Improve implementation of -fprofile-update=atomic

---
 gcc/coverage.c     | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 gcc/coverage.h     |  6 ++--
 gcc/profile.c      |  2 ++
 gcc/tree-profile.c | 45 +++++++++----------------
 4 files changed, 114 insertions(+), 38 deletions(-)

diff --git a/gcc/coverage.c b/gcc/coverage.c
index a6a888a..1e0052f 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -48,6 +48,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "params.h"
 #include "auto-profile.h"
+#include "varasm.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "tree-vrp.h"
+#include "tree-ssanames.h"
 
 #include "gcov-io.c"
 
@@ -93,6 +98,9 @@ static GTY(()) tree fn_v_ctrs[GCOV_COUNTERS];   /* counter variables.  */
 static unsigned fn_n_ctrs[GCOV_COUNTERS]; /* Counters allocated.  */
 static unsigned fn_b_ctrs[GCOV_COUNTERS]; /* Allocation base.  */
 
+/* Thread-local storage variable of ARCS counters.  */
+static GTY(()) tree arcs_tls_ctr;
+
 /* Coverage info VAR_DECL and function info type nodes.  */
 static GTY(()) tree gcov_info_var;
 static GTY(()) tree gcov_fn_info_type;
@@ -127,7 +135,8 @@ static const char *const ctr_names[GCOV_COUNTERS] = {
 
 /* Forward declarations.  */
 static void read_counts_file (void);
-static tree build_var (tree, tree, int);
+static tree build_var (tree fn_decl, tree type, int counter,
+		       bool is_tls = false);
 static void build_fn_info_type (tree, unsigned, tree);
 static void build_info_type (tree, tree);
 static tree build_fn_info (const struct coverage_data *, tree, tree);
@@ -442,6 +451,10 @@ coverage_counter_alloc (unsigned counter, unsigned num)
 
       fn_v_ctrs[counter]
 	= build_var (current_function_decl, array_type, counter);
+
+      if (counter == GCOV_COUNTER_ARCS)
+	arcs_tls_ctr = build_var (current_function_decl, array_type, counter,
+				  true);
     }
 
   fn_b_ctrs[counter] = fn_n_ctrs[counter];
@@ -454,7 +467,8 @@ coverage_counter_alloc (unsigned counter, unsigned num)
 /* Generate a tree to access COUNTER NO.  */
 
 tree
-tree_coverage_counter (unsigned counter, unsigned no, coverage_usage_type type)
+tree_coverage_counter (unsigned counter, unsigned no, coverage_usage_type type,
+		       bool is_tls)
 {
   tree gcov_type_node = get_gcov_type ();
 
@@ -463,7 +477,8 @@ tree_coverage_counter (unsigned counter, unsigned no, coverage_usage_type type)
   no += fn_b_ctrs[counter];
   
   /* "no" here is an array index, scaled to bytes later.  */
-  tree v = build4 (ARRAY_REF, gcov_type_node, fn_v_ctrs[counter],
+  tree array = is_tls ? arcs_tls_ctr : fn_v_ctrs[counter];
+  tree v = build4 (ARRAY_REF, gcov_type_node, array,
 		   build_int_cst (integer_type_node, no), NULL, NULL);
 
   if (type == COVERAGE_ADDR)
@@ -471,6 +486,59 @@ tree_coverage_counter (unsigned counter, unsigned no, coverage_usage_type type)
 
   return v;
 }
+
+/* Generate profile counter update code emission at the end of a function.  */
+
+void
+generate_arcs_tls_update (void)
+{
+  edge_iterator ei;
+  edge e;
+
+  if (flag_profile_update != PROFILE_UPDATE_ATOMIC)
+    return;
+
+  unsigned counter_count = fn_n_ctrs[GCOV_COUNTER_ARCS];
+  size_t mode_size = LONG_LONG_TYPE_SIZE > 32 ? 8 : 4;
+
+  tree atomic_add_fn = builtin_decl_explicit (mode_size == 8
+					      ? BUILT_IN_ATOMIC_FETCH_ADD_8:
+					      BUILT_IN_ATOMIC_FETCH_ADD_4);
+  /* Zero the TLS ARCS counters.  */
+  basic_block entry_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+  FOR_EACH_EDGE (e, ei, entry_bb->succs)
+    {
+      size_t size = counter_count * mode_size;
+      tree fn = builtin_decl_explicit (BUILT_IN_BZERO);
+      tree addr = tree_coverage_counter (GCOV_COUNTER_ARCS, 0, COVERAGE_ADDR);
+      gcall *call = gimple_build_call (fn, 2, addr,
+				       build_int_cst  (integer_type_node, size));
+      gsi_insert_on_edge (e, call);
+    }
+
+  /* Update global counters with the values stored in TLS.  */
+  basic_block exit_bb = EXIT_BLOCK_PTR_FOR_FN (cfun);
+  FOR_EACH_EDGE (e, ei, exit_bb->preds)
+    {
+      for (unsigned i = 0; i < counter_count; i++)
+	{
+	  tree ref = tree_coverage_counter (GCOV_COUNTER_ARCS, i, COVERAGE_REF,
+					    true);
+	  tree gcov_tmp_var = make_temp_ssa_name (get_gcov_type (),
+						  NULL, "PROF_edge_counter");
+	  gassign *stmt = gimple_build_assign (gcov_tmp_var, ref);
+	  gsi_insert_on_edge (e, stmt);
+	  tree addr = tree_coverage_counter (GCOV_COUNTER_ARCS, i,
+					     COVERAGE_ADDR);
+
+	  /* __atomic_fetch_add (&counter, gcov_tmp_var, MEMMODEL_RELAXED); */
+	  gcall *call = gimple_build_call (atomic_add_fn, 3, addr, gcov_tmp_var,
+					   build_int_cst (integer_type_node,
+							  MEMMODEL_RELAXED));
+	  gsi_insert_on_edge (e, call);
+	}
+    }
+}
 \f
 
 /* Generate a checksum for a string.  CHKSUM is the current
@@ -706,6 +774,14 @@ coverage_end_function (unsigned lineno_checksum, unsigned cfg_checksum)
 	      DECL_SIZE (var) = TYPE_SIZE (array_type);
 	      DECL_SIZE_UNIT (var) = TYPE_SIZE_UNIT (array_type);
 	      varpool_node::finalize_decl (var);
+
+	      if (i == GCOV_COUNTER_ARCS)
+		{
+		  TREE_TYPE (arcs_tls_ctr) = array_type;
+		  DECL_SIZE (arcs_tls_ctr) = TYPE_SIZE (array_type);
+		  DECL_SIZE_UNIT (arcs_tls_ctr) = TYPE_SIZE_UNIT (array_type);
+		  varpool_node::finalize_decl (arcs_tls_ctr);
+		}
 	    }
 	  
 	  fn_b_ctrs[i] = fn_n_ctrs[i] = 0;
@@ -717,10 +793,12 @@ coverage_end_function (unsigned lineno_checksum, unsigned cfg_checksum)
 }
 
 /* Build a coverage variable of TYPE for function FN_DECL.  If COUNTER
-   >= 0 it is a counter array, otherwise it is the function structure.  */
+   >= 0 it is a counter array, otherwise it is the function structure.
+   If IS_TLS is true, the newly added variable will live in thread-local
+   storage.  */
 
 static tree
-build_var (tree fn_decl, tree type, int counter)
+build_var (tree fn_decl, tree type, int counter, bool is_tls)
 {
   tree var = build_decl (BUILTINS_LOCATION, VAR_DECL, NULL_TREE, type);
   const char *fn_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fn_decl));
@@ -729,13 +807,18 @@ build_var (tree fn_decl, tree type, int counter)
 
   fn_name = targetm.strip_name_encoding (fn_name);
   fn_name_len = strlen (fn_name);
-  buf = XALLOCAVEC (char, fn_name_len + 8 + sizeof (int) * 3);
+  buf = XALLOCAVEC (char, fn_name_len + 13 + sizeof (int) * 3);
 
   if (counter < 0)
     strcpy (buf, "__gcov__");
   else
     sprintf (buf, "__gcov%u_", counter);
+
+  len = strlen (buf);
+  if (is_tls)
+    strcpy (buf + len, "tls_");
   len = strlen (buf);
+
   buf[len - 1] = symbol_table::symbol_suffix_separator ();
   memcpy (buf + len, fn_name, fn_name_len + 1);
   DECL_NAME (var) = get_identifier (buf);
@@ -744,6 +827,10 @@ build_var (tree fn_decl, tree type, int counter)
   DECL_NONALIASED (var) = 1;
   SET_DECL_ALIGN (var, TYPE_ALIGN (type));
 
+  // TODO
+  if (is_tls && targetm.have_tls)
+    set_decl_tls_model (var, decl_default_tls_model (var));
+
   return var;
 }
 
diff --git a/gcc/coverage.h b/gcc/coverage.h
index 253aca4..49a332a 100644
--- a/gcc/coverage.h
+++ b/gcc/coverage.h
@@ -53,8 +53,10 @@ enum coverage_usage_type
 };
 
 /* Use a counter from the most recent allocation.  */
-extern tree tree_coverage_counter (unsigned, unsigned, coverage_usage_type);
-
+extern tree tree_coverage_counter (unsigned counter, unsigned no,
+				   coverage_usage_type type,
+				   bool is_tls = false);
+extern void generate_arcs_tls_update (void);
 /* Get all the counters for the current function.  */
 extern gcov_type *get_coverage_counts (unsigned /*counter*/,
 				       unsigned /*expected*/,
diff --git a/gcc/profile.c b/gcc/profile.c
index 4519e7d..de20293 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -143,6 +143,8 @@ instrument_edges (struct edge_list *el)
 	}
     }
 
+  generate_arcs_tls_update ();
+
   total_num_blocks_created += num_edges;
   if (dump_file)
     fprintf (dump_file, "%d edges instrumented\n", num_instr_edges);
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 7b69338..a43a117 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -255,36 +255,21 @@ gimple_gen_edge_profiler (int edgeno, edge e)
 
   one = build_int_cst (gcov_type_node, 1);
 
-  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
-    {
-      /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
-      tree addr = tree_coverage_counter (GCOV_COUNTER_ARCS, edgeno,
-					 COVERAGE_ADDR);
-      tree f = builtin_decl_explicit (LONG_LONG_TYPE_SIZE > 32
-				      ? BUILT_IN_ATOMIC_FETCH_ADD_8:
-				      BUILT_IN_ATOMIC_FETCH_ADD_4);
-      gcall *stmt = gimple_build_call (f, 3, addr, one,
-				       build_int_cst (integer_type_node,
-						      MEMMODEL_RELAXED));
-      gsi_insert_on_edge (e, stmt);
-    }
-  else
-    {
-      tree ref = tree_coverage_counter (GCOV_COUNTER_ARCS, edgeno,
-					COVERAGE_REF);
-      tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-						   NULL, "PROF_edge_counter");
-      gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
-      gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-					      NULL, "PROF_edge_counter");
-      gassign *stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
-					    gimple_assign_lhs (stmt1), one);
-      gassign *stmt3 = gimple_build_assign (unshare_expr (ref),
-					    gimple_assign_lhs (stmt2));
-      gsi_insert_on_edge (e, stmt1);
-      gsi_insert_on_edge (e, stmt2);
-      gsi_insert_on_edge (e, stmt3);
-    }
+  tree ref
+    = tree_coverage_counter (GCOV_COUNTER_ARCS, edgeno, COVERAGE_REF,
+			     flag_profile_update == PROFILE_UPDATE_ATOMIC);
+  tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
+					       NULL, "PROF_edge_counter");
+  gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
+  gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
+					  NULL, "PROF_edge_counter");
+  gassign *stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
+					gimple_assign_lhs (stmt1), one);
+  gassign *stmt3 = gimple_build_assign (unshare_expr (ref),
+					gimple_assign_lhs (stmt2));
+  gsi_insert_on_edge (e, stmt1);
+  gsi_insert_on_edge (e, stmt2);
+  gsi_insert_on_edge (e, stmt3);
 }
 
 /* Emits code to get VALUE to instrument at GSI, and returns the
-- 
2.9.2



* Re: [PATCH] Detect whether target can use -fprofile-update=atomic
  2016-09-06 13:50                                             ` Martin Liška
  2016-09-06 14:06                                               ` Jakub Jelinek
  2016-09-07  7:52                                               ` Christophe Lyon
@ 2016-09-29  8:31                                               ` Rainer Orth
  2 siblings, 0 replies; 95+ messages in thread
From: Rainer Orth @ 2016-09-29  8:31 UTC (permalink / raw)
  To: Martin Liška
  Cc: Jakub Jelinek, David Edelsohn, Nathan Sidwell, GCC Patches,
	Jan Hubicka, Andreas Schwab, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1597 bytes --]

Hi Martin,

> 2016-09-06  Martin Liska  <mliska@suse.cz>
>
> 	* gcc.dg/profile-update-warning.c: New test.
[...]
> diff --git a/gcc/testsuite/gcc.dg/profile-update-warning.c b/gcc/testsuite/gcc.dg/profile-update-warning.c
> new file mode 100644
> index 0000000..0614fad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/profile-update-warning.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
> +/* { dg-options "-fprofile-update=atomic -fprofile-generate -march=i386 -m32" } */
> +
> +int main(int argc, char *argv[])
> +{
> +  return 0;
> +} /* { dg-warning "target does not support atomic profile update, single mode is selected" } */

this test FAILs on 32-bit-default x86 configurations like
i386-pc-solaris2.* and i686-pc-linux-gnu for the 64-bit multilib:

FAIL: gcc.dg/profile-update-warning.c  (test for warnings, line 7)
FAIL: gcc.dg/profile-update-warning.c (test for excess errors)

Excess errors:
cc1: error: CPU you selected does not support x86-64 instruction set
cc1: error: CPU you selected does not support x86-64 instruction set

What happens here is that -m64 is added after -march=i386 -m32, causing
the error above.  This doesn't happen for 64-bit-default targets: in the
64-bit case there's just the -m32 from the testcase, for the 32-bit
multilib just another -m32 is added, so the 32-bit case is tested
twice.

Fixed like this, tested on i386-pc-solaris2.12 and x86_64-pc-linux-gnu,
installed.

	Rainer


2016-09-28  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

	* gcc.dg/profile-update-warning.c: Restrict to ia32.
	(dg-options): Remove -m32.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: testsuite-profile-update-warning.patch --]
[-- Type: text/x-patch, Size: 677 bytes --]

# HG changeset patch
# Parent  b421ce4362a3675d9fc4aa98f23e3bfc85b5c4c6
Fix 64-bit gcc.dg/profile-update-warning.c

diff --git a/gcc/testsuite/gcc.dg/profile-update-warning.c b/gcc/testsuite/gcc.dg/profile-update-warning.c
--- a/gcc/testsuite/gcc.dg/profile-update-warning.c
+++ b/gcc/testsuite/gcc.dg/profile-update-warning.c
@@ -1,5 +1,5 @@
-/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
-/* { dg-options "-fprofile-update=atomic -fprofile-generate -march=i386 -m32" } */
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
+/* { dg-options "-fprofile-update=atomic -fprofile-generate -march=i386" } */
 
 int main(int argc, char *argv[])
 {

[-- Attachment #3: Type: text/plain, Size: 143 bytes --]


-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University


* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-08-18 15:53                                   ` Jeff Law
@ 2016-10-03 12:13                                     ` Martin Liška
  2016-10-03 12:26                                       ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-10-03 12:13 UTC (permalink / raw)
  To: Jeff Law, Andi Kleen; +Cc: Nathan Sidwell, gcc-patches, jh

[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]

On 08/18/2016 05:53 PM, Jeff Law wrote:
> On 08/18/2016 09:51 AM, Andi Kleen wrote:
>>> I'd prefer to make updates atomic in multi-threaded applications.
>>> The best proxy we have for that is -pthread.
>>>
>>> Is it slower, most definitely, but odds are we're giving folks
>>> garbage data otherwise, which in many ways is even worse.
>>
>> It will likely be catastrophically slower in some cases.
>>
>> Catastrophically as in too slow to be usable.
>>
>> An atomic instruction is a lot more expensive than a single increment. Also
>> they sometimes are really slow depending on the state of the machine.
> And for those cases there's a way to override.
> 
> The default should be set for correctness.
> 
> jeff

I would like to somehow resolve the discussion related to default value selection.
Is the prevailing consensus that we should set -fprofile-update=atomic when
-pthread is set? If so, I'll prepare a patch. I tend to do it this way.

Moreover, I also have a patch that provides a warning, which can also be useful
even if we change the default behavior:

$ ./xgcc -B. /tmp/a.c -fprofile-update=single -pthread -fprofile-generate
xgcc: warning: -profile-update=atomic should be used to generate a valid profile for a multithreaded application

Ideas?
Martin
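
The trade-off under discussion can be reproduced with a small stand-alone
sketch.  It is an illustration only: worker, run_workers and both counter
names are invented here, and the thread and iteration counts are arbitrary.
One counter is bumped with a plain increment, as -fprofile-update=single
instrumentation does, the other with a relaxed __atomic_fetch_add, as
-fprofile-update=atomic does.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define N_THREADS 4
#define N_ITERS 100000

/* Updated with load/add/store, like -fprofile-update=single.  */
static volatile int64_t plain_counter;

/* Updated with an atomic read-modify-write, like -fprofile-update=atomic.  */
static int64_t atomic_counter;

static void *
worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < N_ITERS; i++)
    {
      /* Formally a data race; kept only to mirror non-atomic
	 instrumentation.  Concurrent updates can be lost here.  */
      plain_counter = plain_counter + 1;

      __atomic_fetch_add (&atomic_counter, 1, __ATOMIC_RELAXED);
    }
  return NULL;
}

/* Reset both counters, run N_THREADS workers and wait for them.
   Return 0 on success, -1 if a thread could not be created.  */
int
run_workers (void)
{
  pthread_t threads[N_THREADS];
  int created = 0;
  int status = 0;

  plain_counter = 0;
  atomic_counter = 0;

  for (; created < N_THREADS; created++)
    if (pthread_create (&threads[created], NULL, worker, NULL) != 0)
      {
	status = -1;
	break;
      }

  /* Join only the threads that were actually started.  */
  for (int i = 0; i < created; i++)
    pthread_join (threads[i], NULL);

  return status;
}
```

After run_workers returns 0, atomic_counter always equals
N_THREADS * N_ITERS, while plain_counter may be lower because concurrent
increments can overwrite each other.  Build with -pthread.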


[-- Attachment #2: 0001-Warn-about-fprofile-update-single-and-pthread.patch --]
[-- Type: text/x-patch, Size: 3719 bytes --]

From d5a8097dd07d1a3f4263da7ccad970543d92f3e9 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 3 Oct 2016 14:02:14 +0200
Subject: [PATCH] Warn about -fprofile-update=single and -pthread

gcc/ChangeLog:

2016-10-03  Martin Liska  <mliska@suse.cz>

	* common.opt: Mark couple of flags with 'Driver' keyword.
	* gcc.c (driver_handle_option): Handle these options.
	(process_command): Generate the warning.
---
 gcc/common.opt |  8 ++++----
 gcc/gcc.c      | 31 +++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 0e01577..3af9c64 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1920,7 +1920,7 @@ Common Report Var(profile_flag)
 Enable basic program profiling code.
 
 fprofile-arcs
-Common Report Var(profile_arc_flag)
+Common Driver Report Var(profile_arc_flag)
 Insert arc-based program profiling code.
 
 fprofile-dir=
@@ -1933,7 +1933,7 @@ Common Report Var(flag_profile_correction)
 Enable correction of flow inconsistent profile data input.
 
 fprofile-update=
-Common Joined RejectNegative Enum(profile_update) Var(flag_profile_update) Init(PROFILE_UPDATE_SINGLE)
+Common Driver Joined RejectNegative Enum(profile_update) Var(flag_profile_update) Init(PROFILE_UPDATE_SINGLE)
 -fprofile-update=[single|atomic]	Set the profile update method.
 
 Enum
@@ -1946,11 +1946,11 @@ EnumValue
 Enum(profile_update) String(atomic) Value(PROFILE_UPDATE_ATOMIC)
 
 fprofile-generate
-Common
+Common Driver
 Enable common options for generating profile info for profile feedback directed optimizations.
 
 fprofile-generate=
-Common Joined RejectNegative
+Common Driver Joined RejectNegative
 Enable common options for generating profile info for profile feedback directed optimizations, and set -fprofile-dir=.
 
 fprofile-use
diff --git a/gcc/gcc.c b/gcc/gcc.c
index d3e8c88..b023013 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -233,6 +233,16 @@ static int print_subprocess_help;
 /* Linker suffix passed to -fuse-ld=... */
 static const char *use_ld;
 
+/* Flag indicating whether pthread is provided as a command line option.  */
+static bool pthread_set = false;
+
+/* Flag indicating whether profiling is enabled by an option.  */
+static bool profiling_enabled = false;
+
+/* Flag indicating whether profile-update=atomic is provided as a command
+   line option.  */
+static bool profile_update_atomic = false;
+
 /* Whether we should report subprocess execution times to a file.  */
 
 FILE *report_times_to_file = NULL;
@@ -4112,6 +4122,22 @@ driver_handle_option (struct gcc_options *opts,
       handle_foffload_option (arg);
       break;
 
+    case OPT_fprofile_update_:
+      if ((profile_update)value == PROFILE_UPDATE_ATOMIC)
+	profile_update_atomic = true;
+      break;
+
+    case OPT_pthread:
+      pthread_set = true;
+      break;
+
+    case OPT_fprofile_generate:
+    case OPT_fprofile_generate_:
+    case OPT_fprofile_arcs:
+    case OPT_coverage:
+      profiling_enabled = true;
+      break;
+
     default:
       /* Various driver options need no special processing at this
 	 point, having been handled in a prescan above or being
@@ -4580,6 +4606,11 @@ process_command (unsigned int decoded_options_count,
       add_infile ("help-dummy", "c");
     }
 
+  /* Warn about multi-threaded programs not using -fprofile-update=atomic.  */
+  if (profiling_enabled && pthread_set && !profile_update_atomic)
+    warning (0, "-profile-update=atomic should be used to generate a valid"
+	     " profile for a multithreaded application");
+
   /* Decide if undefined variable references are allowed in specs.  */
 
   /* --version and --help alone or together are safe.  Note that -v would
-- 
2.9.2



* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-10-03 12:13                                     ` Martin Liška
@ 2016-10-03 12:26                                       ` Nathan Sidwell
  2016-10-03 16:46                                         ` Jeff Law
  2016-10-04 12:05                                         ` Martin Liška
  0 siblings, 2 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-10-03 12:26 UTC (permalink / raw)
  To: Martin Liška, Jeff Law, Andi Kleen; +Cc: gcc-patches, jh

On 10/03/16 08:13, Martin Liška wrote:
> On 08/18/2016 05:53 PM, Jeff Law wrote:
>> On 08/18/2016 09:51 AM, Andi Kleen wrote:
>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>> The best proxy we have for that is -pthread.
>>>>
>>>> Is it slower, most definitely, but odds are we're giving folks
>>>> garbage data otherwise, which in many ways is even worse.
>>>
>>> It will likely be catastrophically slower in some cases.
>>>
>>> Catastrophically as in too slow to be usable.
>>>
>>> An atomic instruction is a lot more expensive than a single increment. Also
>>> they sometimes are really slow depending on the state of the machine.
>> And for those cases there's a way to override.
>>
>> The default should be set for correctness.
>>
>> jeff
>
> I would to somehow resolve the discussion related to default value selection.
> Is the prevailing consensus that we should set -fprofile-update=atomic when
> -pthread is set? If so, I'll prepare a patch. I tend to do it this way.

This is my preference.

nathan


* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-10-03 12:26                                       ` Nathan Sidwell
@ 2016-10-03 16:46                                         ` Jeff Law
  2016-10-03 17:52                                           ` Andi Kleen
  2016-10-04 12:05                                         ` Martin Liška
  1 sibling, 1 reply; 95+ messages in thread
From: Jeff Law @ 2016-10-03 16:46 UTC (permalink / raw)
  To: Nathan Sidwell, Martin Liška, Andi Kleen; +Cc: gcc-patches, jh

On 10/03/2016 06:26 AM, Nathan Sidwell wrote:
> On 10/03/16 08:13, Martin Liška wrote:
>> On 08/18/2016 05:53 PM, Jeff Law wrote:
>>> On 08/18/2016 09:51 AM, Andi Kleen wrote:
>>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>>> The best proxy we have for that is -pthread.
>>>>>
>>>>> Is it slower, most definitely, but odds are we're giving folks
>>>>> garbage data otherwise, which in many ways is even worse.
>>>>
>>>> It will likely be catastrophically slower in some cases.
>>>>
>>>> Catastrophically as in too slow to be usable.
>>>>
>>>> An atomic instruction is a lot more expensive than a single
>>>> increment. Also
>>>> they sometimes are really slow depending on the state of the machine.
>>> And for those cases there's a way to override.
>>>
>>> The default should be set for correctness.
>>>
>>> jeff
>>
>> I would to somehow resolve the discussion related to default value
>> selection.
>> Is the prevailing consensus that we should set -fprofile-update=atomic
>> when
>> -pthread is set? If so, I'll prepare a patch. I tend to do it this way.
>
> This is my preference.
Likewise.
jeff


* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-10-03 16:46                                         ` Jeff Law
@ 2016-10-03 17:52                                           ` Andi Kleen
  0 siblings, 0 replies; 95+ messages in thread
From: Andi Kleen @ 2016-10-03 17:52 UTC (permalink / raw)
  To: Jeff Law; +Cc: Nathan Sidwell, Martin Liška, Andi Kleen, gcc-patches, jh

> >>I would to somehow resolve the discussion related to default value
> >>selection.
> >>Is the prevailing consensus that we should set -fprofile-update=atomic
> >>when
> >>-pthread is set? If so, I'll prepare a patch. I tend to do it this way.
> >
> >This is my preference.
> Likewise.

I still think it shouldn't be the default even with -pthread because it could
dramatically degrade performance in these cases. People likely have -pthread
in their Makefiles without realizing it. Such changes should be explicit opt-in.

Often severe performance decreases lead to incorrectness in practice
("is now too slow to finish training workload in rebuild cycle").

-Andi


* Re: [RFC] Speed-up -fprofile-update=atomic
  2016-09-15 10:18                                           ` [RFC] Speed-up -fprofile-update=atomic Martin Liška
@ 2016-10-04  9:45                                             ` Richard Biener
  2016-10-12 13:53                                               ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Richard Biener @ 2016-10-04  9:45 UTC (permalink / raw)
  To: Martin Liška
  Cc: Jakub Jelinek, Andi Kleen, Jeff Law, Nathan Sidwell, GCC Patches,
	Hubicha, Jan

On Thu, Sep 15, 2016 at 12:00 PM, Martin Liška <mliska@suse.cz> wrote:
> On 09/07/2016 02:09 PM, Richard Biener wrote:
>> On Wed, Sep 7, 2016 at 1:37 PM, Martin Liška <mliska@suse.cz> wrote:
>>> On 08/18/2016 06:06 PM, Richard Biener wrote:
>>>> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>> On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>>>>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>>>>> The best proxy we have for that is -pthread.
>>>>>>>
>>>>>>> Is it slower, most definitely, but odds are we're giving folks
>>>>>>> garbage data otherwise, which in many ways is even worse.
>>>>>>
>>>>>> It will likely be catastrophically slower in some cases.
>>>>>>
>>>>>> Catastrophically as in too slow to be usable.
>>>>>>
>>>>>> An atomic instruction is a lot more expensive than a single
>>>>> increment. Also
>>>>>> they sometimes are really slow depending on the state of the machine.
>>>>>
>>>>> Can't we just have thread-local copies of all the counters (perhaps
>>>>> using
>>>>> __thread pointer as base) and just atomically merge at thread
>>>>> termination?
>>>>
>>>> I suggested that as well but of course it'll have its own class of issues (short lived threads, so we need to somehow re-use counters from terminated threads, large number of threads and thus using too much memory for the counters)
>>>>
>>>> Richard.
>>>
>>> Hello.
>>>
>>> I've got written the approach on my TODO list, let's see whether it would be doable in a reasonable amount of time.
>>>
>>> I've just finished some measurements to illustrate slow-down of -fprofile-update=atomic approach.
>>> All numbers are: no profile, -fprofile-generate, -fprofile-generate -fprofile-update=atomic
>>> c-ray benchmark (utilizing 8 threads, -O3): 1.7, 15.5., 38.1s
>>> unrar (utilizing 8 threads, -O3): 3.6, 11.6, 38s
>>> tramp3d (1 thread, -O3): 18.0, 46.6, 168s
>>>
>>> So the slow-down is roughly 300% compared to -fprofile-generate. I'm not having much experience with default option
>>> selection, but these numbers can probably help.
>>>
>>> Thoughts?
>>
>> Look at the generated code for an instrumented simple loop and see that for
>> the non-atomic updates we happily apply store-motion to the counter update
>> and thus we only get one counter update per loop exit rather than one per
>> loop iteration.  Now see what happens for the atomic case (I suspect you
>> get one per iteration).
>>
>> I'll bet this accounts for most of the slowdown.
>>
>> Back in time ICC which had atomic counter updates (but using function
>> calls - ugh!) had a > 1000% overhead with FDO for tramp3d (they also
>> didn't have early inlining -- removing abstraction helps reducing the number
>> of counters significantly).
>>
>> Richard.
>
> Hi.
>
> During Cauldron I discussed with Richi approaches how to speed-up ARCS
> profile counter updates. My first attempt is to utilize TLS storage, where
> every function is accumulating arcs counters. These are eventually added
> (using atomic operations) to the global one at the very end of a function.
> Currently I rely on target support of TLS, which is questionable whether
> to have such a requirement for -fprofile-update=atomic, or to add a new option value
> like -fprofile-update=atomic-tls?
>
> Running the patch on tramp3d, compared to previous numbers, it takes 88s to finish.
> Time shrinks to 50%, compared to the current implementation.
>
> Thoughts?

Hmm, I thought I suggested that you can simply use automatic storage
(which effectively is TLS...) for regions that are not forked or
abnormally left (which means SESE regions that have no calls that
eventually terminate or throw externally).

So why did you end up with TLS?

Richard.
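
A rough sketch of this idea, written as ordinary C rather than as
instrumentation: the arc counter for a loop lives in an automatic variable
and is added to the shared counter once, after the region is left.  The
names loop_arc_counter and sum_array are hypothetical, and the example
assumes the region contains no call that could terminate or throw, matching
the SESE condition above.

```c
#include <stdint.h>

/* Shared counter for the loop-body arc (illustrative name).  */
static int64_t loop_arc_counter;

/* Hypothetical instrumented function: sums the N elements of V.  */
int64_t
sum_array (const int *v, int n)
{
  /* Automatic storage: private to this invocation, so no TLS variable
     and no zeroing call are needed.  */
  int64_t local_arc = 0;
  int64_t sum = 0;

  for (int i = 0; i < n; i++)
    {
      local_arc++;		/* Plain add, can stay in a register.  */
      sum += v[i];
    }

  /* Single exit of the region: publish the accumulated count.  */
  __atomic_fetch_add (&loop_arc_counter, local_arc, __ATOMIC_RELAXED);

  return sum;
}
```

Because every invocation has its own local_arc, repeated or recursive calls
do not disturb each other; each one simply adds its own count on exit.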

> Martin
>
>>
>>> Martin
>>>
>>>>
>>>>>      Jakub
>>>>
>>>>
>>>
>


* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-10-03 12:26                                       ` Nathan Sidwell
  2016-10-03 16:46                                         ` Jeff Law
@ 2016-10-04 12:05                                         ` Martin Liška
  2016-10-05 17:54                                           ` Jeff Law
  2016-10-13 15:34                                           ` [PATCH] Introduce -fprofile-update=maybe-atomic Martin Liška
  1 sibling, 2 replies; 95+ messages in thread
From: Martin Liška @ 2016-10-04 12:05 UTC (permalink / raw)
  To: Nathan Sidwell, Jeff Law, Andi Kleen; +Cc: gcc-patches, jh

[-- Attachment #1: Type: text/plain, Size: 1351 bytes --]

On 10/03/2016 02:26 PM, Nathan Sidwell wrote:
> On 10/03/16 08:13, Martin Liška wrote:
>> On 08/18/2016 05:53 PM, Jeff Law wrote:
>>> On 08/18/2016 09:51 AM, Andi Kleen wrote:
>>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>>> The best proxy we have for that is -pthread.
>>>>>
>>>>> Is it slower, most definitely, but odds are we're giving folks
>>>>> garbage data otherwise, which in many ways is even worse.
>>>>
>>>> It will likely be catastrophically slower in some cases.
>>>>
>>>> Catastrophically as in too slow to be usable.
>>>>
>>>> An atomic instruction is a lot more expensive than a single increment. Also
>>>> they sometimes are really slow depending on the state of the machine.
>>> And for those cases there's a way to override.
>>>
>>> The default should be set for correctness.
>>>
>>> jeff
>>
>> I would to somehow resolve the discussion related to default value selection.
>> Is the prevailing consensus that we should set -fprofile-update=atomic when
>> -pthread is set? If so, I'll prepare a patch. I tend to do it this way.
> 
> This is my preference.
> 
> nathan

OK, this is the final version of the patch, which implements both the warning and
appending -fprofile-update=atomic to the command-line options.

The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
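
The option handling in the attached spec change boils down to a small
decision table, modelled below in C.  This is only a reading aid under stated
assumptions, not driver code: struct profile_opts, struct profile_action and
decide_profile_update are hypothetical names, "profiling" stands for any of
-fprofile-arcs, -fprofile-generate* or -coverage, and "update_single" stands
for an explicit -fprofile-update=single.

```c
#include <stdbool.h>

/* Options seen on the command line (hypothetical model).  */
struct profile_opts
{
  bool profiling;	/* -fprofile-arcs, -fprofile-generate* or -coverage.  */
  bool pthread;		/* -pthread.  */
  bool update_single;	/* Explicit -fprofile-update=single.  */
};

/* What the spec does in response (hypothetical model).  */
struct profile_action
{
  bool add_atomic;	/* Pass -fprofile-update=atomic to cc1.  */
  bool warn;		/* Print the "should be used" notice.  */
};

struct profile_action
decide_profile_update (struct profile_opts o)
{
  struct profile_action a = { false, false };

  /* Both branches of the spec require profiling and -pthread.  */
  if (o.profiling && o.pthread)
    {
      if (o.update_single)
	a.warn = true;		/* User opted out: only notify.  */
      else
	a.add_atomic = true;	/* Otherwise switch to atomic updates.  */
    }

  return a;
}
```

In this model a profiled -pthread build gets atomic updates unless the user
explicitly asked for single mode, in which case only the notice is printed.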

[-- Attachment #2: 0001-Add-fprofile-update-option-juggling.patch --]
[-- Type: text/x-patch, Size: 1253 bytes --]

From 343d64a3c6b515053459a8ece6f9b0ad6ce86273 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 4 Oct 2016 10:46:48 +0200
Subject: [PATCH] Add -fprofile-update option juggling

gcc/ChangeLog:

2016-10-04  Martin Liska  <mliska@suse.cz>

	* gcc.c: Set -fprofile-update=atomic when profiling is
	enabled and -pthread is set.  Warn when one combines
	-pthread and -fprofile-update=single for an app using
	profiling code.
---
 gcc/gcc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index fd2b182..5213cb0 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -1141,7 +1141,14 @@ static const char *cc1_options =
  %{-help=*:--help=%*}\
  %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %b.s}}}\
  %{fsyntax-only:-o %j} %{-param*}\
- %{coverage:-fprofile-arcs -ftest-coverage}";
+ %{coverage:-fprofile-arcs -ftest-coverage}\
+ %{fprofile-arcs|fprofile-generate*|coverage:\
+   %{!fprofile-update=single:\
+     %{pthread:-fprofile-update=atomic}}}\
+ %{fprofile-update=single:\
+   %{fprofile-arcs|fprofile-generate*|coverage:\
+     %{pthread:%n-fprofile-update=atomic should be used\
+ for a multithreaded application}}}";
 
 static const char *asm_options =
 "%{-target-help:%:print-asm-header()} "
-- 
2.9.2



* Re: [PATCH] Set -fprofile-update=atomic when -pthread is present
  2016-10-04 12:05                                         ` Martin Liška
@ 2016-10-05 17:54                                           ` Jeff Law
  2016-10-13 15:34                                           ` [PATCH] Introduce -fprofile-update=maybe-atomic Martin Liška
  1 sibling, 0 replies; 95+ messages in thread
From: Jeff Law @ 2016-10-05 17:54 UTC (permalink / raw)
  To: Martin Liška, Nathan Sidwell, Andi Kleen; +Cc: gcc-patches, jh

On 10/04/2016 06:05 AM, Martin Liška wrote:
> On 10/03/2016 02:26 PM, Nathan Sidwell wrote:
>> On 10/03/16 08:13, Martin Liška wrote:
>>> On 08/18/2016 05:53 PM, Jeff Law wrote:
>>>> On 08/18/2016 09:51 AM, Andi Kleen wrote:
>>>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>>>> The best proxy we have for that is -pthread.
>>>>>>
>>>>>> Is it slower, most definitely, but odds are we're giving folks
>>>>>> garbage data otherwise, which in many ways is even worse.
>>>>>
>>>>> It will likely be catastrophically slower in some cases.
>>>>>
>>>>> Catastrophically as in too slow to be usable.
>>>>>
>>>>> An atomic instruction is a lot more expensive than a single increment. Also
>>>>> they sometimes are really slow depending on the state of the machine.
>>>> And for those cases there's a way to override.
>>>>
>>>> The default should be set for correctness.
>>>>
>>>> jeff
>>>
>>> I would to somehow resolve the discussion related to default value selection.
>>> Is the prevailing consensus that we should set -fprofile-update=atomic when
>>> -pthread is set? If so, I'll prepare a patch. I tend to do it this way.
>>
>> This is my preference.
>>
>> nathan
> Ok, this is final version of patch which implements both the warning and appending -fprofile-update
> to a command line options.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?
> Martin
>
>
> 0001-Add-fprofile-update-option-juggling.patch
>
>
> From 343d64a3c6b515053459a8ece6f9b0ad6ce86273 Mon Sep 17 00:00:00 2001
> From: marxin <mliska@suse.cz>
> Date: Tue, 4 Oct 2016 10:46:48 +0200
> Subject: [PATCH] Add -fprofile-update option juggling
>
> gcc/ChangeLog:
>
> 2016-10-04  Martin Liska  <mliska@suse.cz>
>
> 	* gcc.c: Set -fprofile-update=atomic when profiling is
> 	enabled and -pthread is set.  Warn when one combines
> 	-pthread and -fprofile-update=single for an app using
> 	profiling code.
OK.
jeff

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC] Speed-up -fprofile-update=atomic
  2016-10-04  9:45                                             ` Richard Biener
@ 2016-10-12 13:53                                               ` Martin Liška
  2016-10-13  9:43                                                 ` Richard Biener
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-10-12 13:53 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, Andi Kleen, Jeff Law, Nathan Sidwell, GCC Patches,
	Hubicha, Jan

On 10/04/2016 11:45 AM, Richard Biener wrote:
> On Thu, Sep 15, 2016 at 12:00 PM, Martin Liška <mliska@suse.cz> wrote:
>> On 09/07/2016 02:09 PM, Richard Biener wrote:
>>> On Wed, Sep 7, 2016 at 1:37 PM, Martin Liška <mliska@suse.cz> wrote:
>>>> On 08/18/2016 06:06 PM, Richard Biener wrote:
>>>>> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>>> On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>>>>>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>>>>>> The best proxy we have for that is -pthread.
>>>>>>>>
>>>>>>>> Is it slower, most definitely, but odds are we're giving folks
>>>>>>>> garbage data otherwise, which in many ways is even worse.
>>>>>>>
>>>>>>> It will likely be catastrophically slower in some cases.
>>>>>>>
>>>>>>> Catastrophically as in too slow to be usable.
>>>>>>>
>>>>>>> An atomic instruction is a lot more expensive than a single
>>>>>> increment. Also
>>>>>>> they sometimes are really slow depending on the state of the machine.
>>>>>>
>>>>>> Can't we just have thread-local copies of all the counters (perhaps
>>>>>> using
>>>>>> __thread pointer as base) and just atomically merge at thread
>>>>>> termination?
>>>>>
>>>>> I suggested that as well but of course it'll have its own class of issues (short lived threads, so we need to somehow re-use counters from terminated threads, large number of threads and thus using too much memory for the counters)
>>>>>
>>>>> Richard.
>>>>
>>>> Hello.
>>>>
>>>> I've got written the approach on my TODO list, let's see whether it would be doable in a reasonable amount of time.
>>>>
>>>> I've just finished some measurements to illustrate slow-down of -fprofile-update=atomic approach.
>>>> All numbers are: no profile, -fprofile-generate, -fprofile-generate -fprofile-update=atomic
>>>> c-ray benchmark (utilizing 8 threads, -O3): 1.7, 15.5., 38.1s
>>>> unrar (utilizing 8 threads, -O3): 3.6, 11.6, 38s
>>>> tramp3d (1 thread, -O3): 18.0, 46.6, 168s
>>>>
>>>> So the slow-down is roughly 300% compared to -fprofile-generate. I'm not having much experience with default option
>>>> selection, but these numbers can probably help.
>>>>
>>>> Thoughts?
>>>
>>> Look at the generated code for an instrumented simple loop and see that for
>>> the non-atomic updates we happily apply store-motion to the counter update
>>> and thus we only get one counter update per loop exit rather than one per
>>> loop iteration.  Now see what happens for the atomic case (I suspect you
>>> get one per iteration).
>>>
>>> I'll bet this accounts for most of the slowdown.
>>>
>>> Back in time ICC which had atomic counter updates (but using function
>>> calls - ugh!) had a > 1000% overhead with FDO for tramp3d (they also
>>> didn't have early inlining -- removing abstraction helps reducing the number
>>> of counters significantly).
>>>
>>> Richard.
>>
>> Hi.
>>
>> During Cauldron I discussed with Richi approaches how to speed-up ARCS
>> profile counter updates. My first attempt is to utilize TLS storage, where
>> every function is accumulating arcs counters. These are eventually added
>> (using atomic operations) to the global one at the very end of a function.
>> Currently I rely on target support of TLS, which is questionable whether
>> to have such a requirement for -fprofile-update=atomic, or to add a new option value
>> like -fprofile-update=atomic-tls?
>>
>> Running the patch on tramp3d, compared to previous numbers, it takes 88s to finish.
>> Time shrinks to 50%, compared to the current implementation.
>>
>> Thoughts?
> 
> Hmm, I thought I suggested that you can simply use automatic storage
> (which effectively
> is TLS...) for regions that are not forked or abnormally left (which
> means SESE regions
> that have no calls that eventually terminate or throw externally).
> 
> So why did you end up with TLS?

Hi.

Using TLS does not make sense; stupid mistake ;)

By using SESE regions, do you mean the infrastructure that is utilized
by Graphite machinery?

Thanks,
Martin

> 
> Richard.
> 
>> Martin
>>
>>>
>>>> Martin
>>>>
>>>>>
>>>>>>      Jakub
>>>>>
>>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC] Speed-up -fprofile-update=atomic
  2016-10-12 13:53                                               ` Martin Liška
@ 2016-10-13  9:43                                                 ` Richard Biener
  2016-10-17 11:47                                                   ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Richard Biener @ 2016-10-13  9:43 UTC (permalink / raw)
  To: Martin Liška
  Cc: Jakub Jelinek, Andi Kleen, Jeff Law, Nathan Sidwell, GCC Patches,
	Hubicha, Jan

On Wed, Oct 12, 2016 at 3:52 PM, Martin Liška <mliska@suse.cz> wrote:
> On 10/04/2016 11:45 AM, Richard Biener wrote:
>> On Thu, Sep 15, 2016 at 12:00 PM, Martin Liška <mliska@suse.cz> wrote:
>>> On 09/07/2016 02:09 PM, Richard Biener wrote:
>>>> On Wed, Sep 7, 2016 at 1:37 PM, Martin Liška <mliska@suse.cz> wrote:
>>>>> On 08/18/2016 06:06 PM, Richard Biener wrote:
>>>>>> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>>>> On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>>>>>>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>>>>>>> The best proxy we have for that is -pthread.
>>>>>>>>>
>>>>>>>>> Is it slower, most definitely, but odds are we're giving folks
>>>>>>>>> garbage data otherwise, which in many ways is even worse.
>>>>>>>>
>>>>>>>> It will likely be catastrophically slower in some cases.
>>>>>>>>
>>>>>>>> Catastrophically as in too slow to be usable.
>>>>>>>>
>>>>>>>> An atomic instruction is a lot more expensive than a single
>>>>>>> increment. Also
>>>>>>>> they sometimes are really slow depending on the state of the machine.
>>>>>>>
>>>>>>> Can't we just have thread-local copies of all the counters (perhaps
>>>>>>> using
>>>>>>> __thread pointer as base) and just atomically merge at thread
>>>>>>> termination?
>>>>>>
>>>>>> I suggested that as well but of course it'll have its own class of issues (short lived threads, so we need to somehow re-use counters from terminated threads, large number of threads and thus using too much memory for the counters)
>>>>>>
>>>>>> Richard.
>>>>>
>>>>> Hello.
>>>>>
>>>>> I've got written the approach on my TODO list, let's see whether it would be doable in a reasonable amount of time.
>>>>>
>>>>> I've just finished some measurements to illustrate slow-down of -fprofile-update=atomic approach.
>>>>> All numbers are: no profile, -fprofile-generate, -fprofile-generate -fprofile-update=atomic
>>>>> c-ray benchmark (utilizing 8 threads, -O3): 1.7, 15.5., 38.1s
>>>>> unrar (utilizing 8 threads, -O3): 3.6, 11.6, 38s
>>>>> tramp3d (1 thread, -O3): 18.0, 46.6, 168s
>>>>>
>>>>> So the slow-down is roughly 300% compared to -fprofile-generate. I'm not having much experience with default option
>>>>> selection, but these numbers can probably help.
>>>>>
>>>>> Thoughts?
>>>>
>>>> Look at the generated code for an instrumented simple loop and see that for
>>>> the non-atomic updates we happily apply store-motion to the counter update
>>>> and thus we only get one counter update per loop exit rather than one per
>>>> loop iteration.  Now see what happens for the atomic case (I suspect you
>>>> get one per iteration).
>>>>
>>>> I'll bet this accounts for most of the slowdown.
>>>>
>>>> Back in time ICC which had atomic counter updates (but using function
>>>> calls - ugh!) had a > 1000% overhead with FDO for tramp3d (they also
>>>> didn't have early inlining -- removing abstraction helps reducing the number
>>>> of counters significantly).
>>>>
>>>> Richard.
>>>
>>> Hi.
>>>
>>> During Cauldron I discussed with Richi approaches how to speed-up ARCS
>>> profile counter updates. My first attempt is to utilize TLS storage, where
>>> every function is accumulating arcs counters. These are eventually added
>>> (using atomic operations) to the global one at the very end of a function.
>>> Currently I rely on target support of TLS, which is questionable whether
>>> to have such a requirement for -fprofile-update=atomic, or to add a new option value
>>> like -fprofile-update=atomic-tls?
>>>
>>> Running the patch on tramp3d, compared to previous numbers, it takes 88s to finish.
>>> Time shrinks to 50%, compared to the current implementation.
>>>
>>> Thoughts?
>>
>> Hmm, I thought I suggested that you can simply use automatic storage
>> (which effectively
>> is TLS...) for regions that are not forked or abnormally left (which
>> means SESE regions
>> that have no calls that eventually terminate or throw externally).
>>
>> So why did you end up with TLS?
>
> Hi.
>
> Usage for TLS does not makes sense, stupid mistake ;)
>
> By using SESE regions, do you mean the infrastructure that is utilized
> by Graphite machinery?

No, I just mean a "single-entry single-exit region", which makes the
placement of the zero-initializations of the internal counters and of the
updates of the actual counters "obvious".

Note that this "optimization" isn't one if the SESE region does not contain
cycle(s), unless there is a way to do an atomic update of a bunch of
counters faster than doing them separately.  This optimization will also
increase register pressure (or force the internal counters to the stack).
Thus selecting which counters to "optimize" and which ones to leave in place
might be necessary.
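
To make the SESE idea concrete, here is a minimal hand-written sketch. It is
not compiler output: `arc_counter` and both function names are hypothetical
stand-ins for whatever the instrumentation would emit. The loop is a
single-entry single-exit region with no calls, so the count can be kept in
automatic storage and flushed once at the region exit.

```c
#include <stdint.h>

/* Hypothetical stand-in for one gcov arc counter; the real counters are
   emitted by the compiler and are not named like this.  */
static int64_t arc_counter;

/* Plain atomic instrumentation: one atomic add per loop iteration.  */
long
sum_atomic_per_iteration (const int *v, int n)
{
  long sum = 0;
  for (int i = 0; i < n; i++)
    {
      __atomic_fetch_add (&arc_counter, 1, __ATOMIC_RELAXED);
      sum += v[i];
    }
  return sum;
}

/* SESE variant: the internal counter is zeroed at region entry, updated
   with ordinary increments, and added atomically once at region exit.  */
long
sum_local_then_flush (const int *v, int n)
{
  int64_t local = 0;	/* Internal counter in automatic storage.  */
  long sum = 0;
  for (int i = 0; i < n; i++)
    {
      local++;
      sum += v[i];
    }
  __atomic_fetch_add (&arc_counter, local, __ATOMIC_RELAXED);
  return sum;
}
```

Both variants leave the same value in the counter. The second one trades n
atomic operations for one, at the cost of an extra live variable, which is
the register-pressure concern mentioned above. Without the loop the second
form would save nothing.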

Richard.

> Thanks,
> Martin
>
>>
>> Richard.
>>
>>> Martin
>>>
>>>>
>>>>> Martin
>>>>>
>>>>>>
>>>>>>>      Jakub
>>>>>>
>>>>>>
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-10-04 12:05                                         ` Martin Liška
  2016-10-05 17:54                                           ` Jeff Law
@ 2016-10-13 15:34                                           ` Martin Liška
  2016-10-31  9:13                                             ` Martin Liška
  1 sibling, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-10-13 15:34 UTC (permalink / raw)
  To: Nathan Sidwell, Jeff Law, Andi Kleen; +Cc: gcc-patches, jh, David Edelsohn

[-- Attachment #1: Type: text/plain, Size: 510 bytes --]

Hello.

As it's very hard to guess from the GCC driver whether a target supports atomic updates
of GCOV counters, I decided to come up with a new option value (maybe-atomic)
that is transformed into the corresponding value (single or atomic) in tree-profile.c.
The GCC driver selects this value when -pthread is present on the command line.
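
The intended resolution can be sketched as a stand-alone function. This only
mirrors the logic described here: `resolve_profile_update`,
`target_has_atomics` and `warn` are hypothetical names for illustration, and
the enum is repeated so the snippet compiles on its own.

```c
#include <stdbool.h>

/* Same three values as the profile_update enum in the patch.  */
enum profile_update
{
  PROFILE_UPDATE_SINGLE,
  PROFILE_UPDATE_ATOMIC,
  PROFILE_UPDATE_MAYBE_ATOMIC
};

/* Hypothetical helper, not a GCC function.  TARGET_HAS_ATOMICS stands for
   the compare-and-swap availability check; *WARN is set when an explicit
   request for atomic updates has to be downgraded.  */
enum profile_update
resolve_profile_update (enum profile_update requested,
			bool target_has_atomics, bool *warn)
{
  *warn = false;
  if (requested == PROFILE_UPDATE_ATOMIC && !target_has_atomics)
    {
      /* Explicit request that cannot be honoured: warn and fall back.  */
      *warn = true;
      return PROFILE_UPDATE_SINGLE;
    }
  if (requested == PROFILE_UPDATE_MAYBE_ATOMIC)
    /* Driver-selected default: pick silently, without a warning.  */
    return target_has_atomics ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE;
  return requested;
}
```

The point of the extra value is the silent fallback: an explicit
-fprofile-update=atomic still warns on an unsupported target, while the value
chosen for -pthread does not.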

That should fix all the test failures seen on the AIX target.

The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

[-- Attachment #2: 0001-Introduce-fprofile-update-maybe-atomic.patch --]
[-- Type: text/x-patch, Size: 7389 bytes --]

From 1d00b7b4d42d080fe4d6cd51a03829b0fe525c9d Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 12 Oct 2016 15:05:49 +0200
Subject: [PATCH] Introduce -fprofile-update=maybe-atomic

gcc/ChangeLog:

2016-10-12  Martin Liska  <mliska@suse.cz>

	* common.opt: Add maybe-atomic as a new enum value for
	-fprofile-update.
	* coretypes.h: Likewise.
	* doc/invoke.texi: Document the new option value.
	* gcc.c: Replace atomic with maybe-atomic.  Remove warning.
	* tree-profile.c (tree_profiling): Select default value
	of -fprofile-update when 'maybe-atomic' is selected.

gcc/testsuite/ChangeLog:

2016-10-12  Martin Liska  <mliska@suse.cz>

	* gcc.dg/no_profile_instrument_function-attr-1.c: Update test
	to match scanned pattern.
	* gcc.dg/tree-ssa/ssa-lim-11.c: Likewise.
---
 gcc/common.opt                                     |  5 +++-
 gcc/coretypes.h                                    |  3 +-
 gcc/doc/invoke.texi                                | 11 +++++--
 gcc/gcc.c                                          |  6 +---
 .../gcc.dg/no_profile_instrument_function-attr-1.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c         |  2 +-
 gcc/tree-profile.c                                 | 35 +++++++++++-----------
 7 files changed, 35 insertions(+), 29 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 15679c5..d6c5acd 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1937,7 +1937,7 @@ Enable correction of flow inconsistent profile data input.
 
 fprofile-update=
 Common Joined RejectNegative Enum(profile_update) Var(flag_profile_update) Init(PROFILE_UPDATE_SINGLE)
--fprofile-update=[single|atomic]	Set the profile update method.
+-fprofile-update=[single|atomic|maybe-atomic]	Set the profile update method.
 
 Enum
 Name(profile_update) Type(enum profile_update) UnknownError(unknown profile update method %qs)
@@ -1948,6 +1948,9 @@ Enum(profile_update) String(single) Value(PROFILE_UPDATE_SINGLE)
 EnumValue
 Enum(profile_update) String(atomic) Value(PROFILE_UPDATE_ATOMIC)
 
+EnumValue
+Enum(profile_update) String(maybe-atomic) Value(PROFILE_UPDATE_MAYBE_ATOMIC)
+
 fprofile-generate
 Common
 Enable common options for generating profile info for profile feedback directed optimizations.
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index fe1e984..aec2a6e 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -177,7 +177,8 @@ enum offload_abi {
 /* Types of profile update methods.  */
 enum profile_update {
   PROFILE_UPDATE_SINGLE,
-  PROFILE_UPDATE_ATOMIC
+  PROFILE_UPDATE_ATOMIC,
+  PROFILE_UPDATE_MAYBE_ATOMIC
 };
 
 /* Types of unwind/exception handling info that can be generated.  */
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c11f1d5..eb6cae3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10315,13 +10315,18 @@ To optimize the program based on the collected profile information, use
 
 Alter the update method for an application instrumented for profile
 feedback based optimization.  The @var{method} argument should be one of
-@samp{single} or @samp{atomic}.  The first one is useful for single-threaded
-applications, while the second one prevents profile corruption by emitting
-thread-safe code.
+@samp{single}, @samp{atomic} or @samp{maybe-atomic}.
+The first one is useful for single-threaded applications,
+while the second one prevents profile corruption by emitting thread-safe code.
 
 @strong{Warning:} When an application does not properly join all threads
 (or creates an detached thread), a profile file can be still corrupted.
 
+Using @samp{maybe-atomic} would be transformed either to @samp{atomic},
+when supported by a target, or to @samp{single} otherwise. The GCC driver
+automatically selects @samp{maybe-atomic} when @option{-pthread}
+is present in the command line.
+
 @item -fsanitize=address
 @opindex fsanitize=address
 Enable AddressSanitizer, a fast memory error detector.
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 5213cb0..1959fc7 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -1144,11 +1144,7 @@ static const char *cc1_options =
  %{coverage:-fprofile-arcs -ftest-coverage}\
  %{fprofile-arcs|fprofile-generate*|coverage:\
    %{!fprofile-update=single:\
-     %{pthread:-fprofile-update=atomic}}}\
- %{fprofile-update=single:\
-   %{fprofile-arcs|fprofile-generate*|coverage:\
-     %{pthread:%n-fprofile-update=atomic should be used\
- for a multithreaded application}}}";
+     %{pthread:-fprofile-update=maybe-atomic}}}";
 
 static const char *asm_options =
 "%{-target-help:%:print-asm-header()} "
diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
index c93d171..29bffd90 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-1.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -fprofile-generate -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fprofile-generate -fprofile-update=single -fdump-tree-optimized" } */
 
 __attribute__ ((no_profile_instrument_function))
 int foo()
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
index e4c11aa..4c38982 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fprofile-arcs -fdump-tree-lim2-details" } */
+/* { dg-options "-O -fprofile-arcs -fprofile-update=single -fdump-tree-lim2-details" } */
 /* { dg-require-profiling "-fprofile-generate" } */
 
 struct thread_param
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index 1f45b99..fcef2e5 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -534,25 +534,26 @@ tree_profiling (void)
   struct cgraph_node *node;
 
   /* Verify whether we can utilize atomic update operations.  */
-  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+  bool can_support_atomic = false;
+  unsigned HOST_WIDE_INT gcov_type_size
+    = tree_to_uhwi (TYPE_SIZE_UNIT (get_gcov_type ()));
+  if (gcov_type_size == 4)
+    can_support_atomic
+      = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
+  else if (gcov_type_size == 8)
+    can_support_atomic
+      = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC
+      && !can_support_atomic)
     {
-      bool can_support = false;
-      unsigned HOST_WIDE_INT gcov_type_size
-	= tree_to_uhwi (TYPE_SIZE_UNIT (get_gcov_type ()));
-      if (gcov_type_size == 4)
-	can_support
-	  = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
-      else if (gcov_type_size == 8)
-	can_support
-	  = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
-
-      if (!can_support)
-      {
-	warning (0, "target does not support atomic profile update, "
-		 "single mode is selected");
-	flag_profile_update = PROFILE_UPDATE_SINGLE;
-      }
+      warning (0, "target does not support atomic profile update, "
+	       "single mode is selected");
+      flag_profile_update = PROFILE_UPDATE_SINGLE;
     }
+  else if (flag_profile_update == PROFILE_UPDATE_MAYBE_ATOMIC)
+    flag_profile_update = can_support_atomic
+      ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE;
 
   /* This is a small-ipa pass that gets called only once, from
      cgraphunit.c:ipa_passes().  */
-- 
2.9.2


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC] Speed-up -fprofile-update=atomic
  2016-10-13  9:43                                                 ` Richard Biener
@ 2016-10-17 11:47                                                   ` Martin Liška
       [not found]                                                     ` <CAFiYyc3eDT4g926PPZuktz5fEW=k-PibAcxhigx4GBcxoXNJFQ@mail.gmail.com>
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-10-17 11:47 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, Andi Kleen, Jeff Law, Nathan Sidwell, GCC Patches,
	Hubicha, Jan

On 10/13/2016 11:43 AM, Richard Biener wrote:
> On Wed, Oct 12, 2016 at 3:52 PM, Martin Liška <mliska@suse.cz> wrote:
>> On 10/04/2016 11:45 AM, Richard Biener wrote:
>>> On Thu, Sep 15, 2016 at 12:00 PM, Martin Liška <mliska@suse.cz> wrote:
>>>> On 09/07/2016 02:09 PM, Richard Biener wrote:
>>>>> On Wed, Sep 7, 2016 at 1:37 PM, Martin Liška <mliska@suse.cz> wrote:
>>>>>> On 08/18/2016 06:06 PM, Richard Biener wrote:
>>>>>>> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>>>>> On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>>>>>>>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>>>>>>>> The best proxy we have for that is -pthread.
>>>>>>>>>>
>>>>>>>>>> Is it slower, most definitely, but odds are we're giving folks
>>>>>>>>>> garbage data otherwise, which in many ways is even worse.
>>>>>>>>>
>>>>>>>>> It will likely be catastrophically slower in some cases.
>>>>>>>>>
>>>>>>>>> Catastrophically as in too slow to be usable.
>>>>>>>>>
>>>>>>>>> An atomic instruction is a lot more expensive than a single
>>>>>>>> increment. Also
>>>>>>>>> they sometimes are really slow depending on the state of the machine.
>>>>>>>>
>>>>>>>> Can't we just have thread-local copies of all the counters (perhaps
>>>>>>>> using
>>>>>>>> __thread pointer as base) and just atomically merge at thread
>>>>>>>> termination?
>>>>>>>
>>>>>>> I suggested that as well but of course it'll have its own class of issues (short lived threads, so we need to somehow re-use counters from terminated threads, large number of threads and thus using too much memory for the counters)
>>>>>>>
>>>>>>> Richard.
>>>>>>
>>>>>> Hello.
>>>>>>
>>>>>> I've got written the approach on my TODO list, let's see whether it would be doable in a reasonable amount of time.
>>>>>>
>>>>>> I've just finished some measurements to illustrate slow-down of -fprofile-update=atomic approach.
>>>>>> All numbers are: no profile, -fprofile-generate, -fprofile-generate -fprofile-update=atomic
>>>>>> c-ray benchmark (utilizing 8 threads, -O3): 1.7, 15.5., 38.1s
>>>>>> unrar (utilizing 8 threads, -O3): 3.6, 11.6, 38s
>>>>>> tramp3d (1 thread, -O3): 18.0, 46.6, 168s
>>>>>>
>>>>>> So the slow-down is roughly 300% compared to -fprofile-generate. I'm not having much experience with default option
>>>>>> selection, but these numbers can probably help.
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> Look at the generated code for an instrumented simple loop and see that for
>>>>> the non-atomic updates we happily apply store-motion to the counter update
>>>>> and thus we only get one counter update per loop exit rather than one per
>>>>> loop iteration.  Now see what happens for the atomic case (I suspect you
>>>>> get one per iteration).
>>>>>
>>>>> I'll bet this accounts for most of the slowdown.
>>>>>
>>>>> Back in time ICC which had atomic counter updates (but using function
>>>>> calls - ugh!) had a > 1000% overhead with FDO for tramp3d (they also
>>>>> didn't have early inlining -- removing abstraction helps reducing the number
>>>>> of counters significantly).
>>>>>
>>>>> Richard.
>>>>
>>>> Hi.
>>>>
>>>> During Cauldron I discussed with Richi approaches how to speed-up ARCS
>>>> profile counter updates. My first attempt is to utilize TLS storage, where
>>>> every function is accumulating arcs counters. These are eventually added
>>>> (using atomic operations) to the global one at the very end of a function.
>>>> Currently I rely on target support of TLS, which is questionable whether
>>>> to have such a requirement for -fprofile-update=atomic, or to add a new option value
>>>> like -fprofile-update=atomic-tls?
>>>>
>>>> Running the patch on tramp3d, compared to previous numbers, it takes 88s to finish.
>>>> Time shrinks to 50%, compared to the current implementation.
>>>>
>>>> Thoughts?
>>>
>>> Hmm, I thought I suggested that you can simply use automatic storage
>>> (which effectively
>>> is TLS...) for regions that are not forked or abnormally left (which
>>> means SESE regions
>>> that have no calls that eventually terminate or throw externally).
>>>
>>> So why did you end up with TLS?
>>
>> Hi.
>>
>> Usage for TLS does not makes sense, stupid mistake ;)
>>
>> By using SESE regions, do you mean the infrastructure that is utilized
>> by Graphite machinery?
> 
> No, just as "single-entry single-exit region" which means placing of
> initializations of the internal counters to zero and the updates of the
> actual counters is "obvious".
> 
> Note that this "optimization" isn't one if the SESE region does not contain
> cycle(s).  Unless there is a way to do an atomic update of a bunch of
> counters faster than doing them separately.  This optimization will also
> increase register pressure (or force the internal counters to the stack).
> Thus selecting which counters to "optimize" and which ones to leave in place
> might be necessary.

OK, I must admit that selecting which counters to optimize is crucial. The current implementation
(atomic increments at the places where a BB is reached) has the advantage that it does not increase
register pressure and does not update a global arcs counter when the BB is not visited.
On the other hand, having local counters that are flushed at function exit (my current implementation)
may update counters for BBs that were never executed, and it creates a heavily contended memory-locking spot
if multiple threads call a function very often. This is the perf report for the c-ray benchmark:

       │             if((d = SQ(b) - 4.0 * a * c) < 0.0) return 0;
  0.01 │3c0:   xor    %ecx,%ecx
  0.00 │3c2:   mov    0x110(%rsp),%rdx
  0.00 │       lock   add    %rdx,__gcov0.ray_sphere
  7.96 │       mov    0x118(%rsp),%rdx
       │       lock   add    %rdx,__gcov0.ray_sphere+0x8
 11.39 │       mov    0x120(%rsp),%rdx
       │       lock   add    %rdx,__gcov0.ray_sphere+0x10
 11.09 │       mov    0x128(%rsp),%rdx
  0.00 │       lock   add    %rdx,__gcov0.ray_sphere+0x18
 11.27 │       mov    0x130(%rsp),%rdx
       │       lock   add    %rdx,__gcov0.ray_sphere+0x20
 11.02 │       mov    0x138(%rsp),%rdx
       │       lock   add    %rdx,__gcov0.ray_sphere+0x28
 11.46 │       mov    0x140(%rsp),%rdx
  0.00 │       lock   add    %rdx,__gcov0.ray_sphere+0x30
 11.84 │       mov    0x148(%rsp),%rdx
  0.00 │       lock   add    %rdx,__gcov0.ray_sphere+0x38
 11.57 │       mov    0x150(%rsp),%rdx
       │       lock   add    %rdx,__gcov0.ray_sphere+0x40
  6.86 │       mov    0x158(%rsp),%rdx
       │       lock   add    %rdx,__gcov0.ray_sphere+0x48

My current approach does an atomic increment when maybe_hot_bb_p returns false
and uses local counters otherwise. The question is how to find a better place
to initialize and store the local counter values.
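
As a rough hand-written illustration of that hybrid scheme (not the actual
patch output): `counters`, `count_negatives` and the choice of which block
counts as hot are all assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical global arc counters: [0] for the hot loop body,
   [1] for a branch assumed to be cold.  */
static int64_t counters[2];

int
count_negatives (const double *v, int n)
{
  int64_t local_hot = 0;	/* Local counter for the hot block.  */
  int negatives = 0;

  for (int i = 0; i < n; i++)
    {
      local_hot++;		/* Hot block: plain increment.  */
      if (v[i] < 0.0)
	{
	  /* Cold block: atomic increment in place, no local copy.  */
	  __atomic_fetch_add (&counters[1], 1, __ATOMIC_RELAXED);
	  negatives++;
	}
    }

  /* Function exit: flush the local counter with one atomic add.  */
  __atomic_fetch_add (&counters[0], local_hot, __ATOMIC_RELAXED);
  return negatives;
}
```

Every local counter adds one more lock-prefixed add at function exit, which is
the pattern visible in the perf report above. The sketch therefore keeps only
a single hot counter local.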

Ideas welcome.
Thanks,
Martin

> 
> Richard.
> 
>> Thanks,
>> Martin
>>
>>>
>>> Richard.
>>>
>>>> Martin
>>>>
>>>>>
>>>>>> Martin
>>>>>>
>>>>>>>
>>>>>>>>      Jakub
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC] Speed-up -fprofile-update=atomic
       [not found]                                                     ` <CAFiYyc3eDT4g926PPZuktz5fEW=k-PibAcxhigx4GBcxoXNJFQ@mail.gmail.com>
@ 2016-10-24 12:09                                                       ` Martin Liška
       [not found]                                                         ` <CAFiYyc1tSdTdqqkHcMp+dgE43+8tHL6kY8E07TCHoZBeUT-ggQ@mail.gmail.com>
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-10-24 12:09 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, Andi Kleen, Jeff Law, Nathan Sidwell, GCC Patches,
	Hubicha, Jan

[-- Attachment #1: Type: text/plain, Size: 9272 bytes --]

On 10/17/2016 02:03 PM, Richard Biener wrote:
> On Mon, Oct 17, 2016 at 1:46 PM, Martin Liška <mliska@suse.cz> wrote:
>> On 10/13/2016 11:43 AM, Richard Biener wrote:
>>> On Wed, Oct 12, 2016 at 3:52 PM, Martin Liška <mliska@suse.cz> wrote:
>>>> On 10/04/2016 11:45 AM, Richard Biener wrote:
>>>>> On Thu, Sep 15, 2016 at 12:00 PM, Martin Liška <mliska@suse.cz> wrote:
>>>>>> On 09/07/2016 02:09 PM, Richard Biener wrote:
>>>>>>> On Wed, Sep 7, 2016 at 1:37 PM, Martin Liška <mliska@suse.cz> wrote:
>>>>>>>> On 08/18/2016 06:06 PM, Richard Biener wrote:
>>>>>>>>> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>>>>>>>>>> On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>>>>>>>>>>>> I'd prefer to make updates atomic in multi-threaded applications.
>>>>>>>>>>>> The best proxy we have for that is -pthread.
>>>>>>>>>>>>
>>>>>>>>>>>> Is it slower, most definitely, but odds are we're giving folks
>>>>>>>>>>>> garbage data otherwise, which in many ways is even worse.
>>>>>>>>>>>
>>>>>>>>>>> It will likely be catastrophically slower in some cases.
>>>>>>>>>>>
>>>>>>>>>>> Catastrophically as in too slow to be usable.
>>>>>>>>>>>
>>>>>>>>>>> An atomic instruction is a lot more expensive than a single
>>>>>>>>>> increment. Also
>>>>>>>>>>> they sometimes are really slow depending on the state of the machine.
>>>>>>>>>>
>>>>>>>>>> Can't we just have thread-local copies of all the counters (perhaps
>>>>>>>>>> using
>>>>>>>>>> __thread pointer as base) and just atomically merge at thread
>>>>>>>>>> termination?
>>>>>>>>>
>>>>>>>>> I suggested that as well but of course it'll have its own class of issues (short lived threads, so we need to somehow re-use counters from terminated threads, large number of threads and thus using too much memory for the counters)
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>
>>>>>>>> Hello.
>>>>>>>>
>>>>>>>> I've got written the approach on my TODO list, let's see whether it would be doable in a reasonable amount of time.
>>>>>>>>
>>>>>>>> I've just finished some measurements to illustrate slow-down of -fprofile-update=atomic approach.
>>>>>>>> All numbers are: no profile, -fprofile-generate, -fprofile-generate -fprofile-update=atomic
>>>>>>>> c-ray benchmark (utilizing 8 threads, -O3): 1.7, 15.5., 38.1s
>>>>>>>> unrar (utilizing 8 threads, -O3): 3.6, 11.6, 38s
>>>>>>>> tramp3d (1 thread, -O3): 18.0, 46.6, 168s
>>>>>>>>
>>>>>>>> So the slow-down is roughly 300% compared to -fprofile-generate. I'm not having much experience with default option
>>>>>>>> selection, but these numbers can probably help.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Look at the generated code for an instrumented simple loop and see that for
>>>>>>> the non-atomic updates we happily apply store-motion to the counter update
>>>>>>> and thus we only get one counter update per loop exit rather than one per
>>>>>>> loop iteration.  Now see what happens for the atomic case (I suspect you
>>>>>>> get one per iteration).
>>>>>>>
>>>>>>> I'll bet this accounts for most of the slowdown.
>>>>>>>
>>>>>>> Back in time ICC which had atomic counter updates (but using function
>>>>>>> calls - ugh!) had a > 1000% overhead with FDO for tramp3d (they also
>>>>>>> didn't have early inlining -- removing abstraction helps reducing the number
>>>>>>> of counters significantly).
>>>>>>>
>>>>>>> Richard.
>>>>>>
>>>>>> Hi.
>>>>>>
>>>>>> During Cauldron I discussed with Richi approaches how to speed-up ARCS
>>>>>> profile counter updates. My first attempt is to utilize TLS storage, where
>>>>>> every function is accumulating arcs counters. These are eventually added
>>>>>> (using atomic operations) to the global one at the very end of a function.
>>>>>> Currently I rely on target support of TLS, which is questionable whether
>>>>>> to have such a requirement for -fprofile-update=atomic, or to add a new option value
>>>>>> like -fprofile-update=atomic-tls?
>>>>>>
>>>>>> Running the patch on tramp3d, compared to previous numbers, it takes 88s to finish.
>>>>>> Time shrinks to 50%, compared to the current implementation.
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> Hmm, I thought I suggested that you can simply use automatic storage
>>>>> (which effectively
>>>>> is TLS...) for regions that are not forked or abnormally left (which
>>>>> means SESE regions
>>>>> that have no calls that eventually terminate or throw externally).
>>>>>
>>>>> So why did you end up with TLS?
>>>>
>>>> Hi.
>>>>
>>>> Usage of TLS does not make sense, stupid mistake ;)
>>>>
>>>> By using SESE regions, do you mean the infrastructure that is utilized
>>>> by Graphite machinery?
>>>
>>> No, just as "single-entry single-exit region" which means placing of
>>> initializations of the internal counters to zero and the updates of the
>>> actual counters is "obvious".
>>>
>>> Note that this "optimization" isn't one if the SESE region does not contain
>>> cycle(s).  Unless there is a way to do an atomic update of a bunch of
>>> counters faster than doing them separately.  This optimization will also
>>> increase register pressure (or force the internal counters to the stack).
>>> Thus selecting which counters to "optimize" and which ones to leave in place
>>> might be necessary.
>>
>> Ok, I must admit the selection of which counters to optimize is crucial. The current implementation
>> (atomic increments at places where a BB is reached) has the advantage that it does not increase
>> register pressure and it does not update a global arcs counter when the BB is not visited.
>> On the other hand, having local counters which are updated at function exit (my current implementation)
>> possibly updates counters for BBs that were not visited, and it creates a huge memory locking spot if multiple threads
>> call a function very often. This is the perf report of the c-ray benchmark:
>>
>>        │             if((d = SQ(b) - 4.0 * a * c) < 0.0) return 0;
>>   0.01 │3c0:   xor    %ecx,%ecx
>>   0.00 │3c2:   mov    0x110(%rsp),%rdx
>>   0.00 │       lock   add    %rdx,__gcov0.ray_sphere
>>   7.96 │       mov    0x118(%rsp),%rdx
>>        │       lock   add    %rdx,__gcov0.ray_sphere+0x8
>>  11.39 │       mov    0x120(%rsp),%rdx
>>        │       lock   add    %rdx,__gcov0.ray_sphere+0x10
>>  11.09 │       mov    0x128(%rsp),%rdx
>>   0.00 │       lock   add    %rdx,__gcov0.ray_sphere+0x18
>>  11.27 │       mov    0x130(%rsp),%rdx
>>        │       lock   add    %rdx,__gcov0.ray_sphere+0x20
>>  11.02 │       mov    0x138(%rsp),%rdx
>>        │       lock   add    %rdx,__gcov0.ray_sphere+0x28
>>  11.46 │       mov    0x140(%rsp),%rdx
>>   0.00 │       lock   add    %rdx,__gcov0.ray_sphere+0x30
>>  11.84 │       mov    0x148(%rsp),%rdx
>>   0.00 │       lock   add    %rdx,__gcov0.ray_sphere+0x38
>>  11.57 │       mov    0x150(%rsp),%rdx
>>        │       lock   add    %rdx,__gcov0.ray_sphere+0x40
>>   6.86 │       mov    0x158(%rsp),%rdx
>>        │       lock   add    %rdx,__gcov0.ray_sphere+0x48
>>
>> My current approach does an atomic increment when maybe_hot_bb_p returns false,
>> and local counters are used otherwise. The question is how to find a better place
>> to initialize and store the local counter values?
> 
> Given the main reason for using local counters is loops you can use
> loop information
> and put counter initializations in the loop preheader and stores on
> the loop exit edges
> (basically what loop store motion does w/o atomic updates).  Store motion knows
> to ignore any aliasing with counters and thus is _very_ aggressive
> with doing this
> (including all loop nests but of course not across possible (abnormal)
> function exits).
> 
> While profile instrumentation is quite late it is of course still
> before IPA inlining.
> Thus loop store motion w/o atomic updates possibly still sees more opportunities
> than this.  I wonder if we might want to leave the optimization to the regular
> optimizers by only lowering the actual counter kind late and start with some
> IFN_UPDATE_COVERAGE_COUNTER which passes could handle (and merge).
> 
> Anyway, I'd go for the loop trick and simply handle counter motion
> from innermost
> loops only.  The final update place would be the common post-dominator of all
> exits (if you want to handle multiple exits).  As said you need to watch out for
> function termination that is not reflected by the CFG (external throws, exits in
> callees, etc.).
> 
> Richard.

Hello Richard.

I've just finished my patch, where I came up with a new internal fn (UPDATE_COVERAGE_COUNTER).
The function is generated by the profile pass and is handled by the lim pass. I originally tried to
support the internal fn in the lim machinery, but doing a specific loop motion looks easier to work with.

With the patch applied, tramp3d runs 1.8x slower with -fprofile-update=atomic compared to -fprofile-update=single.
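
To illustrate what moving the counter update out of a loop amounts to, here is a hand-written C11 sketch of the effect. It is not output of the patch: arc_counter, sum_per_iteration and sum_hoisted are made-up names, and a plain _Atomic variable stands in for a gcov arc counter.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a single gcov arc counter.  */
static _Atomic long long arc_counter;

/* Before the motion: one relaxed atomic add per loop iteration.  */
static long long
sum_per_iteration (const int *v, int n)
{
  long long sum = 0;
  for (int i = 0; i < n; i++)
    {
      atomic_fetch_add_explicit (&arc_counter, 1, memory_order_relaxed);
      sum += v[i];
    }
  return sum;
}

/* After the motion: a local counter is zeroed before the loop,
   incremented inside it, and flushed with one atomic add at the exit.  */
static long long
sum_hoisted (const int *v, int n)
{
  long long sum = 0;
  long long local = 0;
  for (int i = 0; i < n; i++)
    {
      local++;
      sum += v[i];
    }
  atomic_fetch_add_explicit (&arc_counter, local, memory_order_relaxed);
  return sum;
}
```

Both variants leave the same total in the counter, but the second one issues a single atomic add per execution of the loop instead of one per iteration. The equivalence only holds when the loop cannot be left in a way that skips the final add, which is why abnormal and EH exits have to be excluded.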

The patch bootstraps on ppc64le-redhat-linux and survives regression tests.
Thoughts?

Thanks,
Martin
> 
>> Ideas welcomed.
>> Thanks,
>> Martin
>>
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Martin
>>>>
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Martin
>>>>>>
>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>      Jakub
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>


[-- Attachment #2: 0001-Introduce-loop-store-motion-for-UPDATE_COVERAGE_COUN.patch --]
[-- Type: text/x-patch, Size: 17878 bytes --]

From a9e76ff1bc8b69e23a50d8cdc9556270cf7b269d Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Fri, 21 Oct 2016 13:26:19 +0200
Subject: [PATCH] Introduce loop store motion for UPDATE_COVERAGE_COUNTER

gcc/testsuite/ChangeLog:

2016-10-24  Martin Liska  <mliska@suse.cz>

	* gcc.dg/tree-ssa/ssa-lim-11.c: Update scanned pattern.

gcc/ChangeLog:

2016-10-24  Martin Liska  <mliska@suse.cz>

	* Makefile.in: Add new file profile-expand.c.
	* internal-fn.c (expand_UPDATE_COVERAGE_COUNTER): New IFN.
	* internal-fn.def: Likewise.
	* passes.def: Add new pass profile_expand.
	* profile-expand.c: New file.
	* tree-pass.h (make_pass_profile_expand): Declare a new
	function.
	* tree-profile.c (gimple_gen_edge_profiler): Generate the new
	internal fn.
	* tree-ssa-loop-im.c (loop_suitable_for_sm): Move to header
	file.
	(move_coverage_counter_update): New function.
	(process_sm_for_coverage_counter): Likewise.
	(pass_lim::execute): Call invariant motion for
	UPDATE_COVERAGE_COUNTER internal functions.
	* tree-ssa-loop.h: Move the function here from
	tree-ssa-loop-im.c.
	* value-prof.h: Declare expand_coverage_counter_ifns.
---
 gcc/Makefile.in                            |   1 +
 gcc/internal-fn.c                          |   6 ++
 gcc/internal-fn.def                        |   2 +
 gcc/passes.def                             |   1 +
 gcc/profile-expand.c                       | 143 ++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c |   3 +-
 gcc/tree-pass.h                            |   1 +
 gcc/tree-profile.c                         |  37 +------
 gcc/tree-ssa-loop-im.c                     | 157 +++++++++++++++++++++++++----
 gcc/tree-ssa-loop.h                        |  18 ++++
 gcc/value-prof.h                           |   1 +
 11 files changed, 315 insertions(+), 55 deletions(-)
 create mode 100644 gcc/profile-expand.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c512cd7..9bdb406 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1404,6 +1404,7 @@ OBJS = \
 	print-rtl-function.o \
 	print-tree.o \
 	profile.o \
+	profile-expand.o \
 	real.o \
 	realmpfr.o \
 	recog.o \
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0b32d5f..557d373 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -254,6 +254,12 @@ expand_FALLTHROUGH (internal_fn, gcall *call)
 	    "invalid use of attribute %<fallthrough%>");
 }
 
+static void
+expand_UPDATE_COVERAGE_COUNTER (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Helper function for expand_addsub_overflow.  Return 1
    if ARG interpreted as signed in its precision is known to be always
    positive or 2 if ARG is known to be always negative, or 3 if ARG may
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index d4fbdb2..348fc2f 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -198,6 +198,8 @@ DEF_INTERNAL_FN (ATOMIC_COMPARE_EXCHANGE, ECF_LEAF | ECF_NOTHROW, NULL)
 /* To implement [[fallthrough]].  */
 DEF_INTERNAL_FN (FALLTHROUGH, ECF_LEAF | ECF_NOTHROW, NULL)
 
+DEF_INTERNAL_FN (UPDATE_COVERAGE_COUNTER, ECF_LEAF | ECF_NOTHROW, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/passes.def b/gcc/passes.def
index 1375254..4a22860 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -386,6 +386,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_asan_O0);
   NEXT_PASS (pass_tsan_O0);
   NEXT_PASS (pass_sanopt);
+  NEXT_PASS (pass_profile_expand);
   NEXT_PASS (pass_cleanup_eh);
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
diff --git a/gcc/profile-expand.c b/gcc/profile-expand.c
new file mode 100644
index 0000000..317fe1f
--- /dev/null
+++ b/gcc/profile-expand.c
@@ -0,0 +1,143 @@
+/* Profile expand pass.
+   Copyright (C) 2003-2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "memmodel.h"
+#include "backend.h"
+#include "target.h"
+#include "tree.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "coverage.h"
+#include "varasm.h"
+#include "tree-nested.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "gimplify-me.h"
+#include "tree-cfg.h"
+#include "tree-into-ssa.h"
+#include "value-prof.h"
+#include "profile.h"
+#include "tree-cfgcleanup.h"
+#include "params.h"
+
+void
+expand_coverage_counter_ifns (void)
+{
+  basic_block bb;
+  tree f = builtin_decl_explicit (LONG_LONG_TYPE_SIZE > 32
+				  ? BUILT_IN_ATOMIC_FETCH_ADD_8:
+				  BUILT_IN_ATOMIC_FETCH_ADD_4);
+
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  if (gimple_call_internal_p (stmt, IFN_UPDATE_COVERAGE_COUNTER))
+	    {
+	      tree addr = gimple_call_arg (stmt, 0);
+	      tree value = gimple_call_arg (stmt, 1);
+	      if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+		{
+		  gcall *stmt
+		    = gimple_build_call (f, 3, addr, value,
+					 build_int_cst (integer_type_node,
+							MEMMODEL_RELAXED));
+		  gsi_replace (&gsi, stmt, true);
+		}
+	      else
+		{
+		  gcc_assert (TREE_CODE (addr) == ADDR_EXPR);
+		  tree ref = TREE_OPERAND (addr, 0);
+		  tree gcov_type_tmp_var
+		    = make_temp_ssa_name (get_gcov_type (), NULL,
+					  "PROF_edge_counter");
+		  gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
+		  gcov_type_tmp_var
+		    = make_temp_ssa_name (get_gcov_type (), NULL,
+					  "PROF_edge_counter");
+		  gassign *stmt2
+		    = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
+					   gimple_assign_lhs (stmt1), value);
+		  gassign *stmt3
+		    = gimple_build_assign (unshare_expr (ref),
+					   gimple_assign_lhs (stmt2));
+		  gsi_insert_seq_before (&gsi, stmt1, GSI_SAME_STMT);
+		  gsi_insert_seq_before (&gsi, stmt2, GSI_SAME_STMT);
+		  gsi_replace (&gsi, stmt3, GSI_SAME_STMT);
+		}
+	    }
+	}
+    }
+}
+
+/* Profile expand pass.  */
+
+namespace {
+
+const pass_data pass_data_profile_expand =
+{
+  GIMPLE_PASS, /* type */
+  "profile_expand", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_LIM, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_profile_expand : public gimple_opt_pass
+{
+public:
+  pass_profile_expand (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_profile_expand, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  opt_pass * clone () { return new pass_profile_expand (m_ctxt); }
+  virtual bool gate (function *) { return true; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_profile_expand
+
+unsigned int
+pass_profile_expand::execute (function *)
+{
+  expand_coverage_counter_ifns ();
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_profile_expand (gcc::context *ctxt)
+{
+  return new pass_profile_expand (ctxt);
+}
+
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
index e4c11aa..4c14e24 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
@@ -22,4 +22,5 @@ void access_buf(struct thread_param* p)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Executing store motion of __gcov0.access_buf\\\[\[01\]\\\] from loop 1" 2 "lim2" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of &__gcov0.access_buf\\\[\[01\]\\\] from loop 1" 1 "lim2" } } */
+/* { dg-final { scan-tree-dump-times "Executing store motion of &__gcov0.access_buf\\\[\[01\]\\\] from loop 2" 1 "lim2" } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 5903fde..ac919f8 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -365,6 +365,7 @@ extern gimple_opt_pass *make_pass_fix_loops (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_no_loop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_profile_expand (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
diff --git a/gcc/tree-profile.c b/gcc/tree-profile.c
index abeee92..288d38c 100644
--- a/gcc/tree-profile.c
+++ b/gcc/tree-profile.c
@@ -252,40 +252,13 @@ gimple_init_edge_profiler (void)
 void
 gimple_gen_edge_profiler (int edgeno, edge e)
 {
-  tree one;
+  tree one = build_int_cst (gcov_type_node, 1);
 
-  one = build_int_cst (gcov_type_node, 1);
-
-  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
-    {
-      /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
-      tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
-      tree f = builtin_decl_explicit (LONG_LONG_TYPE_SIZE > 32
-				      ? BUILT_IN_ATOMIC_FETCH_ADD_8:
-				      BUILT_IN_ATOMIC_FETCH_ADD_4);
-      gcall *stmt = gimple_build_call (f, 3, addr, one,
-				       build_int_cst (integer_type_node,
-						      MEMMODEL_RELAXED));
-      gsi_insert_on_edge (e, stmt);
-    }
-  else
-    {
-      tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
-      tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-						   NULL, "PROF_edge_counter");
-      gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
-      gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-					      NULL, "PROF_edge_counter");
-      gassign *stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
-					    gimple_assign_lhs (stmt1), one);
-      gassign *stmt3 = gimple_build_assign (unshare_expr (ref),
-					    gimple_assign_lhs (stmt2));
-      gsi_insert_on_edge (e, stmt1);
-      gsi_insert_on_edge (e, stmt2);
-      gsi_insert_on_edge (e, stmt3);
-    }
+  tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
+  gcall *stmt = gimple_build_call_internal (IFN_UPDATE_COVERAGE_COUNTER,
+					    2, addr, one);
+  gsi_insert_on_edge (e, stmt);
 }
-
 /* Emits code to get VALUE to instrument at GSI, and returns the
    variable containing the value.  */
 
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 463db04..0e04250 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans-mem.h"
 #include "gimple-fold.h"
 #include "tree-scalar-evolution.h"
+#include "coverage.h"
 
 /* TODO:  Support for predicated code motion.  I.e.
 
@@ -2281,24 +2282,6 @@ find_refs_for_sm (struct loop *loop, bitmap sm_executed, bitmap refs_to_sm)
     }
 }
 
-/* Checks whether LOOP (with exits stored in EXITS array) is suitable
-   for a store motion optimization (i.e. whether we can insert statement
-   on its exits).  */
-
-static bool
-loop_suitable_for_sm (struct loop *loop ATTRIBUTE_UNUSED,
-		      vec<edge> exits)
-{
-  unsigned i;
-  edge ex;
-
-  FOR_EACH_VEC_ELT (exits, i, ex)
-    if (ex->flags & (EDGE_ABNORMAL | EDGE_EH))
-      return false;
-
-  return true;
-}
-
 /* Try to perform store motion for all memory references modified inside
    LOOP.  SM_EXECUTED is the bitmap of the memory references for that
    store motion was executed in one of the outer loops.  */
@@ -2556,6 +2539,132 @@ tree_ssa_lim (void)
   return todo;
 }
 
+/* Move coverage counter update internal function, pointed by GSI iterator,
+   out of a loop LOOP.  */
+
+static void
+move_coverage_counter_update (gimple_stmt_iterator *gsi, struct loop *loop)
+{
+  gimple *call = gsi_stmt (*gsi);
+  tree type = get_gcov_type ();
+
+  vec<edge> exits = get_loop_exit_edges (loop);
+  if (!loop_suitable_for_sm (loop, exits))
+    {
+      exits.release ();
+      return;
+    }
+
+  /* Verify that BB of the CALL statement post-dominates all exits.  */
+  for (unsigned i = 0; i < exits.length (); i++)
+    {
+      edge exit = exits[i];
+      if (!dominated_by_p (CDI_POST_DOMINATORS, call->bb, exit->src))
+	{
+	  exits.release ();
+	  return;
+	}
+    }
+
+  if (exits.is_empty ())
+    return;
+
+  edge preheader = loop_preheader_edge (loop);
+  if (!single_succ_p (preheader->src)
+      || preheader->dest != call->bb)
+    return;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Executing store motion of ");
+      print_generic_expr (dump_file, gimple_call_arg (call, 0), 0);
+      fprintf (dump_file, " from loop %d\n", loop->num);
+    }
+
+  tree preheader_var = make_temp_ssa_name (type, NULL, "PROF_edge_counter");
+  gimple *stmt = gimple_build_assign (preheader_var, build_int_cst (type, 0));
+  gimple_stmt_iterator it = gsi_last_bb (preheader->src);
+  gsi_insert_after (&it, stmt, GSI_NEW_STMT);
+
+  tree loop_var1 = make_temp_ssa_name (type, NULL, "PROF_edge_counter");
+  tree loop_var2 = make_temp_ssa_name (type, NULL, "PROF_edge_counter");
+
+  gphi *phi = create_phi_node (loop_var1, call->bb);
+  add_phi_arg (phi, preheader_var, preheader, UNKNOWN_LOCATION);
+
+  stmt = gimple_build_assign (loop_var2, PLUS_EXPR, loop_var1,
+			      build_int_cst (type, 1));
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, call->bb->preds)
+    if (e != preheader)
+      add_phi_arg (phi, loop_var2, e, UNKNOWN_LOCATION);
+
+  tree updated_value = make_temp_ssa_name (type, NULL, "PROF_edge_counter");
+
+  for (unsigned i = 0; i < exits.length (); i++)
+    {
+      edge exit = exits[i];
+      if (!dominated_by_p (CDI_DOMINATORS, exit->dest, exit->src))
+	{
+	  basic_block new_bb = split_edge (exit);
+	  set_immediate_dominator (CDI_DOMINATORS, new_bb, exit->src);
+	  e = single_pred_edge (new_bb);
+	}
+      else
+	e = exit;
+
+      basic_block bb = e->dest;
+      phi = create_phi_node (updated_value, e->dest);
+      add_phi_arg (phi, loop_var2, e, UNKNOWN_LOCATION);
+
+      tree addr = unshare_expr (gimple_call_arg (call, 0));
+      call = gimple_build_call_internal (IFN_UPDATE_COVERAGE_COUNTER,
+					 2, addr, gimple_phi_result (phi));
+
+      it = gsi_start_bb (bb);
+      gsi_insert_before (&it, call, GSI_NEW_STMT);
+    }
+
+  exits.release ();
+  gsi_remove (gsi, true);
+}
+
+/* Process store motion for coverage counter update internal function.  */
+
+static void
+process_sm_for_coverage_counter (void)
+{
+  bool has_dom_info = dom_info_available_p (CDI_POST_DOMINATORS);
+  if (!has_dom_info)
+    calculate_dominance_info (CDI_POST_DOMINATORS);
+
+  struct loop *loop;
+  FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+    {
+      basic_block *body = get_loop_body (loop);
+      for (unsigned i = 0; i < loop->num_nodes; i++)
+	{
+	  gimple_stmt_iterator gsi;
+	  for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);)
+	    {
+	      gimple *stmt = gsi_stmt (gsi);
+	      if (gimple_call_internal_p (stmt, IFN_UPDATE_COVERAGE_COUNTER)
+		  && integer_onep (gimple_call_arg (stmt, 1)))
+		move_coverage_counter_update (&gsi, loop);
+
+	      if (!gsi_end_p (gsi))
+		gsi_next (&gsi);
+	    }
+	}
+    }
+
+  if (!has_dom_info)
+    free_dominance_info (CDI_POST_DOMINATORS);
+}
+
 /* Loop invariant motion pass.  */
 
 namespace {
@@ -2592,11 +2701,15 @@ pass_lim::execute (function *fun)
 {
   bool in_loop_pipeline = scev_initialized_p ();
   if (!in_loop_pipeline)
-    loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS);
+    loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS
+			 | LOOPS_HAVE_PREHEADERS);
 
-  if (number_of_loops (fun) <= 1)
-    return 0;
-  unsigned int todo = tree_ssa_lim ();
+  unsigned int todo = 0;
+  if (number_of_loops (fun) > 1)
+    todo = tree_ssa_lim ();
+
+  process_sm_for_coverage_counter ();
+  todo |= TODO_update_ssa;
 
   if (!in_loop_pipeline)
     loop_optimizer_finalize ();
diff --git a/gcc/tree-ssa-loop.h b/gcc/tree-ssa-loop.h
index b2f37ab..88bcad7 100644
--- a/gcc/tree-ssa-loop.h
+++ b/gcc/tree-ssa-loop.h
@@ -79,4 +79,22 @@ loop_containing_stmt (gimple *stmt)
   return bb->loop_father;
 }
 
+/* Checks whether LOOP (with exits stored in EXITS array) is suitable
+   for a store motion optimization (i.e. whether we can insert statement
+   on its exits).  */
+
+static inline bool
+loop_suitable_for_sm (struct loop *loop ATTRIBUTE_UNUSED,
+		      vec<edge> exits)
+{
+  unsigned i;
+  edge ex;
+
+  FOR_EACH_VEC_ELT (exits, i, ex)
+    if (ex->flags & (EDGE_ABNORMAL | EDGE_EH))
+      return false;
+
+  return true;
+}
+
 #endif /* GCC_TREE_SSA_LOOP_H */
diff --git a/gcc/value-prof.h b/gcc/value-prof.h
index 07e2b3b..cb3ae1d 100644
--- a/gcc/value-prof.h
+++ b/gcc/value-prof.h
@@ -98,6 +98,7 @@ bool check_ic_target (gcall *, struct cgraph_node *);
 /* In tree-profile.c.  */
 extern void gimple_init_edge_profiler (void);
 extern void gimple_gen_edge_profiler (int, edge);
+extern void expand_coverage_counter_ifns (void);
 extern void gimple_gen_interval_profiler (histogram_value, unsigned, unsigned);
 extern void gimple_gen_pow2_profiler (histogram_value, unsigned, unsigned);
 extern void gimple_gen_one_value_profiler (histogram_value, unsigned, unsigned);
-- 
2.10.1


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC] Speed-up -fprofile-update=atomic
       [not found]                                                         ` <CAFiYyc1tSdTdqqkHcMp+dgE43+8tHL6kY8E07TCHoZBeUT-ggQ@mail.gmail.com>
@ 2016-10-25 14:32                                                           ` Martin Liška
  2016-10-26  9:29                                                             ` Richard Biener
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-10-25 14:32 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jakub Jelinek, Andi Kleen, Jeff Law, Nathan Sidwell, GCC Patches,
	Hubicha, Jan

On 10/24/2016 03:51 PM, Richard Biener wrote:

> It's quite ad-hoc :/  The IFN will also be a memory optimization
> barrier unless you add special support
> for it in the alias oracle - so the performance measurement needs to
> be taken with a grain of salt
> (same is true for all atomics of course... - I have some local patches
> to improve things here).

Good, so please ping me with the patches you have and I'll integrate them.

> 
> The way you implement process_sm_for_coverage_counter is more like a
> final value replacement.
> You could maybe use create_iv for the loop counter or even wind up
> computing the final value
> (number of iterations) only after the loop, avoiding the IV completely
> (eventually SCEV cprop
> saves you here afterwards).

Or maybe we can basically assign loop->niter as the argument of UPDATE_COVERAGE_COUNTER
function?
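
A rough C11 sketch of that idea, assuming a single-exit counted loop whose trip count can be computed from its bounds (trip_count, strided_sum and arc_counter are hypothetical names used only for illustration, not GCC internals):

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a single gcov arc counter.  */
static _Atomic long long arc_counter;

/* Number of iterations of: for (i = lo; i < hi; i += step), step > 0.  */
static long long
trip_count (long long lo, long long hi, long long step)
{
  if (hi <= lo)
    return 0;
  return (hi - lo + step - 1) / step;
}

/* The loop body carries no counter at all; the counter is bumped once
   after the loop by the computed number of iterations.  */
static long long
strided_sum (const int *v, long long lo, long long hi, long long step)
{
  long long sum = 0;
  for (long long i = lo; i < hi; i += step)
    sum += v[i];
  atomic_fetch_add_explicit (&arc_counter, trip_count (lo, hi, step),
			     memory_order_relaxed);
  return sum;
}
```

Compared to carrying an extra induction variable through the loop, this keeps the loop body free of any counter bookkeeping, at the cost of only applying to loops whose iteration count is known at the exit.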

Martin

> 
> Richard.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC] Speed-up -fprofile-update=atomic
  2016-10-25 14:32                                                           ` Martin Liška
@ 2016-10-26  9:29                                                             ` Richard Biener
  2016-10-26  9:32                                                               ` Richard Biener
  0 siblings, 1 reply; 95+ messages in thread
From: Richard Biener @ 2016-10-26  9:29 UTC (permalink / raw)
  To: Martin Liška
  Cc: Jakub Jelinek, Andi Kleen, Jeff Law, Nathan Sidwell, GCC Patches,
	Hubicha, Jan

On Tue, Oct 25, 2016 at 4:31 PM, Martin Liška <mliska@suse.cz> wrote:
> On 10/24/2016 03:51 PM, Richard Biener wrote:
>
>> It's quite ad-hoc :/  The IFN will also be a memory optimization
>> barrier unless you add special support
>> for it in the alias oracle - so the performance measurement needs to
>> be taken with a grain of salt
>> (same is true for all atomics of course... - I have some local patches
>> to improve things here).
>
> Good, so please ping me with the patches you have and I'll integrate them.
>
>>
>> The way you implement process_sm_for_coverage_counter is more like a
>> final value replacement.
>> You could maybe use create_iv for the loop counter or even wind up
>> computing the final value
>> (number of iterations) only after the loop, avoiding the IV completely
>> (eventually SCEV cprop
>> saves you here afterwards).
>
> Or maybe we can basically assign loop->niter as the argument of UPDATE_COVERAGE_COUNTER
> function?

Yes, that's what I said.

Richard.

> Martin
>
>>
>> Richard.
>

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [RFC] Speed-up -fprofile-update=atomic
  2016-10-26  9:29                                                             ` Richard Biener
@ 2016-10-26  9:32                                                               ` Richard Biener
  0 siblings, 0 replies; 95+ messages in thread
From: Richard Biener @ 2016-10-26  9:32 UTC (permalink / raw)
  To: Martin Liška
  Cc: Jakub Jelinek, Andi Kleen, Jeff Law, Nathan Sidwell, GCC Patches,
	Hubicha, Jan

[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]

On Wed, Oct 26, 2016 at 11:28 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Oct 25, 2016 at 4:31 PM, Martin Liška <mliska@suse.cz> wrote:
>> On 10/24/2016 03:51 PM, Richard Biener wrote:
>>
>>> It's quite ad-hoc :/  The IFN will also be a memory optimization
>>> barrier unless you add special support
>>> for it in the alias oracle - so the performance measurement needs to
>>> be taken with a grain of salt
>>> (same is true for all atomics of course... - I have some local patches
>>> to improve things here).
>>
>> Good, so please ping me with the patches you have and I'll integrate them.

This is what I have in my tree (apparently only points-to changes, I
suppose general
alias changes will be controversial as the builtins would lose their
"compiler memory
barrier" behavior).

Richard.

>>>
>>> The way you implement process_sm_for_coverage_counter is more like a
>>> final value replacement.
>>> You could maybe use create_iv for the loop counter or even wind up
>>> computing the final value
>>> (number of iterations) only after the loop, avoiding the IV completely
>>> (eventually SCEV cprop
>>> saves you here afterwards).
>>
>> Or maybe we can basically assign loop->niter as the argument of UPDATE_COVERAGE_COUNTER
>> function?
>
> Yes, that's what I said.
>
> Richard.
>
>> Martin
>>
>>>
>>> Richard.
>>

[-- Attachment #2: p --]
[-- Type: application/octet-stream, Size: 3541 bytes --]

Index: gcc/tree-ssa-structalias.c
===================================================================
--- gcc/tree-ssa-structalias.c	(revision 241509)
+++ gcc/tree-ssa-structalias.c	(working copy)
@@ -4561,6 +4561,102 @@ find_func_aliases_for_builtin_call (stru
 	  process_all_all_constraints (lhsc, rhsc);
 	  return true;
 	}
+#define CASE_N(X) \
+   case X ## _N: \
+   case X ## _1: \
+   case X ## _2: \
+   case X ## _4: \
+   case X ## _8: \
+   case X ## _16
+      CASE_N(BUILT_IN_SYNC_FETCH_AND_ADD):
+      CASE_N(BUILT_IN_SYNC_FETCH_AND_SUB):
+      CASE_N(BUILT_IN_SYNC_FETCH_AND_OR):
+      CASE_N(BUILT_IN_SYNC_FETCH_AND_AND):
+      CASE_N(BUILT_IN_SYNC_FETCH_AND_XOR):
+      CASE_N(BUILT_IN_SYNC_FETCH_AND_NAND):
+      CASE_N(BUILT_IN_SYNC_ADD_AND_FETCH):
+      CASE_N(BUILT_IN_SYNC_SUB_AND_FETCH):
+      CASE_N(BUILT_IN_SYNC_OR_AND_FETCH):
+      CASE_N(BUILT_IN_SYNC_AND_AND_FETCH):
+      CASE_N(BUILT_IN_SYNC_XOR_AND_FETCH):
+      CASE_N(BUILT_IN_SYNC_NAND_AND_FETCH):
+      CASE_N(BUILT_IN_ATOMIC_ADD_FETCH):
+      CASE_N(BUILT_IN_ATOMIC_SUB_FETCH):
+      CASE_N(BUILT_IN_ATOMIC_AND_FETCH):
+      CASE_N(BUILT_IN_ATOMIC_NAND_FETCH):
+      CASE_N(BUILT_IN_ATOMIC_XOR_FETCH):
+      CASE_N(BUILT_IN_ATOMIC_OR_FETCH):
+      CASE_N(BUILT_IN_ATOMIC_FETCH_ADD):
+      CASE_N(BUILT_IN_ATOMIC_FETCH_SUB):
+      CASE_N(BUILT_IN_ATOMIC_FETCH_AND):
+      CASE_N(BUILT_IN_ATOMIC_FETCH_NAND):
+      CASE_N(BUILT_IN_ATOMIC_FETCH_XOR):
+      CASE_N(BUILT_IN_ATOMIC_FETCH_OR):
+	{
+	  tree dest = gimple_call_lhs (t);
+	  tree addr = gimple_call_arg (t, 0);
+	  tree val = gimple_call_arg (t, 1);
+	  get_constraint_for (addr, &rhsc);
+	  do_deref (&rhsc);
+	  /* The result is *dest.  */
+	  if (dest)
+	    {
+	      get_constraint_for (dest, &lhsc);
+	      process_all_all_constraints (lhsc, rhsc);
+	    }
+	  /* And *dest also receives all pointers from val.  */
+	  lhsc.truncate (0);
+	  get_constraint_for (val, &lhsc);
+	  process_all_all_constraints (rhsc, lhsc);
+	  /* But nothing escapes.  */
+	  return true;
+	}
+      CASE_N(BUILT_IN_SYNC_BOOL_COMPARE_AND_SWAP):
+      CASE_N(BUILT_IN_SYNC_VAL_COMPARE_AND_SWAP):
+      CASE_N(BUILT_IN_SYNC_LOCK_TEST_AND_SET):
+      CASE_N(BUILT_IN_SYNC_LOCK_RELEASE):
+      case BUILT_IN_SYNC_SYNCHRONIZE:
+      case BUILT_IN_ATOMIC_TEST_AND_SET:
+      case BUILT_IN_ATOMIC_CLEAR:
+      case BUILT_IN_ATOMIC_EXCHANGE:
+      CASE_N(BUILT_IN_ATOMIC_EXCHANGE):
+	break;
+      case BUILT_IN_ATOMIC_LOAD:
+        break;
+      CASE_N(BUILT_IN_ATOMIC_LOAD):
+	{
+	  tree dest = gimple_call_lhs (t);
+	  if (dest)
+	    {
+	      tree addr = gimple_call_arg (t, 0);
+	      get_constraint_for (dest, &lhsc);
+	      get_constraint_for (addr, &rhsc);
+	      do_deref (&rhsc);
+	      process_all_all_constraints (lhsc, rhsc);
+	    }
+	  return true;
+	}
+      case BUILT_IN_ATOMIC_COMPARE_EXCHANGE:
+      CASE_N(BUILT_IN_ATOMIC_COMPARE_EXCHANGE):
+	break;
+      case BUILT_IN_ATOMIC_STORE:
+        break;
+      CASE_N(BUILT_IN_ATOMIC_STORE):
+	{
+	  tree dest = gimple_call_arg (t, 0);
+	  tree val = gimple_call_arg (t, 1);
+	  get_constraint_for (dest, &lhsc);
+	  get_constraint_for (val, &rhsc);
+	  do_deref (&lhsc);
+	  process_all_all_constraints (lhsc, rhsc);
+	  return true;
+	}
+      case BUILT_IN_ATOMIC_THREAD_FENCE:
+      case BUILT_IN_ATOMIC_SIGNAL_FENCE:
+      case BUILT_IN_ATOMIC_FERAISEEXCEPT:
+	break;
+#undef CASE_N
+
       /* Variadic argument handling needs to be handled in IPA
 	 mode as well.  */
       case BUILT_IN_VA_START:

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-10-13 15:34                                           ` [PATCH] Introduce -fprofile-update=maybe-atomic Martin Liška
@ 2016-10-31  9:13                                             ` Martin Liška
  2016-11-10 13:19                                               ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-10-31  9:13 UTC (permalink / raw)
  To: Nathan Sidwell, Jeff Law, Andi Kleen; +Cc: gcc-patches, jh, David Edelsohn

PING^1

On 10/13/2016 05:34 PM, Martin Liška wrote:
> Hello.
> 
> As it's very hard to guess from GCC driver whether a target supports atomic updates
> for GCOV counter or not, I decided to come up with a new option value (maybe-atomic),
> that would be transformed in a corresponding value (single or atomic) in tree-profile.c.
> The GCC driver selects the option when -pthread is present in the command line.
> 
> That should fix all tests failures seen on AIX target.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-10-31  9:13                                             ` Martin Liška
@ 2016-11-10 13:19                                               ` Martin Liška
  2016-11-10 15:43                                                 ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-11-10 13:19 UTC (permalink / raw)
  To: Nathan Sidwell, Jeff Law, Andi Kleen; +Cc: gcc-patches, jh, David Edelsohn

PING^2

On 10/31/2016 10:13 AM, Martin Liška wrote:
> PING^1
> 
> On 10/13/2016 05:34 PM, Martin Liška wrote:
>> Hello.
>>
>> As it's very hard to guess from GCC driver whether a target supports atomic updates
>> for GCOV counter or not, I decided to come up with a new option value (maybe-atomic),
>> that would be transformed in a corresponding value (single or atomic) in tree-profile.c.
>> The GCC driver selects the option when -pthread is present in the command line.
>>
>> That should fix all tests failures seen on AIX target.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
>>


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-10 13:19                                               ` Martin Liška
@ 2016-11-10 15:43                                                 ` Nathan Sidwell
  2016-11-10 15:55                                                   ` David Edelsohn
                                                                     ` (2 more replies)
  0 siblings, 3 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-11-10 15:43 UTC (permalink / raw)
  To: Martin Liška, Jeff Law, Andi Kleen; +Cc: gcc-patches, jh, David Edelsohn

On 11/10/2016 05:19 AM, Martin Liška wrote:

>> On 10/13/2016 05:34 PM, Martin Liška wrote:
>>> Hello.
>>>
>>> As it's very hard to guess from GCC driver whether a target supports atomic updates
>>> for GCOV counter or not, I decided to come up with a new option value (maybe-atomic),
>>> that would be transformed in a corresponding value (single or atomic) in tree-profile.c.
>>> The GCC driver selects the option when -pthread is present in the command line.
>>>
>>> That should fix all tests failures seen on AIX target.
>>>
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?

I dislike this.  If it's hard for gcc itself to know, how much harder
for the user must it be?  (Does gcc have another instance of an option
that behaves 'prefer-A-or-B-if-you-can't'?)

It's also not clear what problem it's solving for the user.  If the user
needs atomic update, they should get a hard error if the target doesn't
support it.  If they don't need atomic, why ask for it?

But as ever, I'm not going to veto it.

nathan

-- 
Nathan Sidwell


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-10 15:43                                                 ` Nathan Sidwell
@ 2016-11-10 15:55                                                   ` David Edelsohn
  2016-11-10 16:18                                                     ` Nathan Sidwell
  2016-11-10 15:58                                                   ` Martin Liška
  2016-11-10 16:14                                                   ` Nathan Sidwell
  2 siblings, 1 reply; 95+ messages in thread
From: David Edelsohn @ 2016-11-10 15:55 UTC (permalink / raw)
  To: Nathan Sidwell, Martin Liška
  Cc: Jeff Law, Andi Kleen, GCC Patches, Jan Hubicka

On Thu, Nov 10, 2016 at 10:43 AM, Nathan Sidwell <nathan@acm.org> wrote:
> On 11/10/2016 05:19 AM, Martin Liška wrote:
>
>>> On 10/13/2016 05:34 PM, Martin Liška wrote:
>>>>
>>>> Hello.
>>>>
>>>> As it's very hard to guess from GCC driver whether a target supports
>>>> atomic updates
>>>> for GCOV counter or not, I decided to come up with a new option value
>>>> (maybe-atomic),
>>>> that would be transformed in a corresponding value (single or atomic) in
>>>> tree-profile.c.
>>>> The GCC driver selects the option when -pthread is present in the
>>>> command line.
>>>>
>>>> That should fix all tests failures seen on AIX target.
>>>>
>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression
>>>> tests.
>>>>
>>>> Ready to be installed?
>
>
> I dislike this.  If it's hard for gcc itself to know, how much harder for
> the user must it be?   (does gcc have another instance of an option that
> behaves 'prefer-A-or-B-if-you-can't'?
>
> It's also not clear what problem it's solving for the user?  If the user
> needs atomic update, they should get a hard error if the target doesn't
> support it.  If they don't need atomic, why ask for it?
>
> But as ever, I'm not going to veto it.

Do you have a better suggestion?

gcc.c now imposes profile-update=atomic if -pthread is used, even if
the target does not support profile-update=atomic.

Either gcc.c must not impose profile-update=atomic or we need some way
of differentiating between when the request should fail because the
user really expects it and when the request should silently and gently
be ignored.

The atomic update feature is nice, but currently GCC is trying to be
too smart in guessing how important the feature is to the user.

Thanks, David


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-10 15:43                                                 ` Nathan Sidwell
  2016-11-10 15:55                                                   ` David Edelsohn
@ 2016-11-10 15:58                                                   ` Martin Liška
  2016-11-10 16:17                                                     ` David Edelsohn
  2016-11-10 16:14                                                   ` Nathan Sidwell
  2 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-11-10 15:58 UTC (permalink / raw)
  To: Nathan Sidwell, Jeff Law, Andi Kleen; +Cc: gcc-patches, jh, David Edelsohn

On 11/10/2016 04:43 PM, Nathan Sidwell wrote:
> On 11/10/2016 05:19 AM, Martin Liška wrote:
> 
>>> On 10/13/2016 05:34 PM, Martin Liška wrote:
>>>> Hello.
>>>>
>>>> As it's very hard to guess from GCC driver whether a target supports atomic updates
>>>> for GCOV counter or not, I decided to come up with a new option value (maybe-atomic),
>>>> that would be transformed in a corresponding value (single or atomic) in tree-profile.c.
>>>> The GCC driver selects the option when -pthread is present in the command line.
>>>>
>>>> That should fix all tests failures seen on AIX target.
>>>>
>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>>
>>>> Ready to be installed?
> 
> I dislike this.  If it's hard for gcc itself to know, how much harder for the user must it be?   (does gcc have another instance of an option that behaves 'prefer-A-or-B-if-you-can't'?
> 
> It's also not clear what problem it's solving for the user?  If the user needs atomic update, they should get a hard error if the target doesn't support it.  If they don't need atomic, why ask for it?

My initial motivation was to automatically select -fprofile-update=atomic when the target supports it and '-pthread' is present on the command line.
As it's very problematic to identify (from the GCC driver) whether a target supports atomic updates, a 'maybe' option is the best guess we can make.

> 
> But as ever, I'm not going to veto it.

The other option is to disable the automatic selection of -fprofile-update=atomic.

Martin

> 
> nathan
> 


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-10 15:43                                                 ` Nathan Sidwell
  2016-11-10 15:55                                                   ` David Edelsohn
  2016-11-10 15:58                                                   ` Martin Liška
@ 2016-11-10 16:14                                                   ` Nathan Sidwell
  2016-11-10 16:16                                                     ` David Edelsohn
  2 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-11-10 16:14 UTC (permalink / raw)
  To: Martin Liška, Jeff Law, Andi Kleen; +Cc: gcc-patches, jh, David Edelsohn

On 11/10/2016 07:43 AM, Nathan Sidwell wrote:
> On 11/10/2016 05:19 AM, Martin Liška wrote:
>
>>> On 10/13/2016 05:34 PM, Martin Liška wrote:
>>>> Hello.
>>>>
>>>> As it's very hard to guess from GCC driver whether a target supports
>>>> atomic updates
>>>> for GCOV counter or not, I decided to come up with a new option
>>>> value (maybe-atomic),
>>>> that would be transformed in a corresponding value (single or
>>>> atomic) in tree-profile.c.
>>>> The GCC driver selects the option when -pthread is present in the
>>>> command line.
>>>>
>>>> That should fix all tests failures seen on AIX target.
>>>>
>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression
>>>> tests.
>>>>
>>>> Ready to be installed?
>
> I dislike this.  If it's hard for gcc itself to know, how much harder
> for the user must it be?   (does gcc have another instance of an option
> that behaves 'prefer-A-or-B-if-you-can't'?

Thinking further: why isn't the right solution for
-fprofile-update=atomic, when faced with a target that cannot support it, to:
a) issue an error and bail out at the first opportunity
b) or issue a warning and fall back to single-threaded update?

For #b presumably there'll be the capability of suppressing that 
particular warning?

nathan

-- 
Nathan Sidwell


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-10 16:14                                                   ` Nathan Sidwell
@ 2016-11-10 16:16                                                     ` David Edelsohn
  0 siblings, 0 replies; 95+ messages in thread
From: David Edelsohn @ 2016-11-10 16:16 UTC (permalink / raw)
  To: Nathan Sidwell
  Cc: Martin Liška, Jeff Law, Andi Kleen, GCC Patches, Jan Hubicka

On Thu, Nov 10, 2016 at 11:14 AM, Nathan Sidwell <nathan@acm.org> wrote:
> On 11/10/2016 07:43 AM, Nathan Sidwell wrote:
>>
>> On 11/10/2016 05:19 AM, Martin Liška wrote:
>>
>>>> On 10/13/2016 05:34 PM, Martin Liška wrote:
>>>>>
>>>>> Hello.
>>>>>
>>>>> As it's very hard to guess from GCC driver whether a target supports
>>>>> atomic updates
>>>>> for GCOV counter or not, I decided to come up with a new option
>>>>> value (maybe-atomic),
>>>>> that would be transformed in a corresponding value (single or
>>>>> atomic) in tree-profile.c.
>>>>> The GCC driver selects the option when -pthread is present in the
>>>>> command line.
>>>>>
>>>>> That should fix all tests failures seen on AIX target.
>>>>>
>>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression
>>>>> tests.
>>>>>
>>>>> Ready to be installed?
>>
>>
>> I dislike this.  If it's hard for gcc itself to know, how much harder
>> for the user must it be?   (does gcc have another instance of an option
>> that behaves 'prefer-A-or-B-if-you-can't'?
>
>
> Thinking further.  why isn't the right solution for -fprofile-update=atomic
> when faced with a target that cannot support it to:
> a) issue an error and bail out at the first opportunity
> b) or issue a warning and fall back to single threaded update?
>
> For #b presumably there'll be the capability of suppressing that particular
> warning?

Because that incorrectly breaks a huge portion of the testsuite.
That's not what the user intended.

- David


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-10 15:58                                                   ` Martin Liška
@ 2016-11-10 16:17                                                     ` David Edelsohn
  2016-11-10 16:24                                                       ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: David Edelsohn @ 2016-11-10 16:17 UTC (permalink / raw)
  To: Martin Liška
  Cc: Nathan Sidwell, Jeff Law, Andi Kleen, GCC Patches, Jan Hubicka

On Thu, Nov 10, 2016 at 10:58 AM, Martin Liška <mliska@suse.cz> wrote:
> On 11/10/2016 04:43 PM, Nathan Sidwell wrote:
>> On 11/10/2016 05:19 AM, Martin Liška wrote:
>>
>>>> On 10/13/2016 05:34 PM, Martin Liška wrote:
>>>>> Hello.
>>>>>
>>>>> As it's very hard to guess from GCC driver whether a target supports atomic updates
>>>>> for GCOV counter or not, I decided to come up with a new option value (maybe-atomic),
>>>>> that would be transformed in a corresponding value (single or atomic) in tree-profile.c.
>>>>> The GCC driver selects the option when -pthread is present in the command line.
>>>>>
>>>>> That should fix all tests failures seen on AIX target.
>>>>>
>>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>>>
>>>>> Ready to be installed?
>>
>> I dislike this.  If it's hard for gcc itself to know, how much harder for the user must it be?   (does gcc have another instance of an option that behaves 'prefer-A-or-B-if-you-can't'?
>>
>> It's also not clear what problem it's solving for the user?  If the user needs atomic update, they should get a hard error if the target doesn't support it.  If they don't need atomic, why ask for it?
>
> My initial motivation was to automatically selected -fprofile-update=atomic if supported by a target and when '-pthread' is present on command line.
> As it's very problematic to identify (from GCC driver) whether a target supports or not atomic updates, 'maybe' option is the only possible we can guess.
>
>>
>> But as ever, I'm not going to veto it.
>
> Other option is to disable selection of -fprofile-update=atomic automatically.

Unfortunately, this cannot use a configure test or a manually set value
based on the target, because the same gcc.c driver is invoked with
different options that may provide atomic update in some variants and
no atomic update in others (e.g., -m64), because the profile counter is
64 bits.
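
To illustrate why the answer depends on the options of each individual compilation, here is a minimal sketch in C. The function names are hypothetical and this is not how GCC itself makes the decision; it only relies on the documented `__atomic_always_lock_free` and `__atomic_add_fetch` built-ins. The same source can give different answers when built for different variants of one target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when a 64-bit counter can always be updated
   with lock-free atomic instructions under the flags used for this
   compilation.  The built-in is folded at compile time, so the driver
   cannot know the answer in advance.  */
static bool
counter_update_is_lock_free (void)
{
  return __atomic_always_lock_free (sizeof (int64_t), 0);
}

/* Hypothetical helper: atomically increment a 64-bit counter and return
   the new value.  Where no lock-free sequence exists, the compiler may
   lower this to a library call instead.  */
static int64_t
bump_counter (int64_t *counter)
{
  assert (counter != 0);
  return __atomic_add_fetch (counter, 1, __ATOMIC_RELAXED);
}
```

Building this once per multilib variant and comparing the result of `counter_update_is_lock_free` would show whether the variants disagree.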

Maybe instead of adding "maybe", we need to change the severity of the
warning so that the warning is not emitted by default.

- David


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-10 15:55                                                   ` David Edelsohn
@ 2016-11-10 16:18                                                     ` Nathan Sidwell
  0 siblings, 0 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-11-10 16:18 UTC (permalink / raw)
  To: David Edelsohn, Martin Liška
  Cc: Jeff Law, Andi Kleen, GCC Patches, Jan Hubicka

On 11/10/2016 07:55 AM, David Edelsohn wrote:

> gcc.c now imposes profile-update=atomic if -pthread is used, even if
> the target does not support profile-update=atomic.

Ah, that's where this is coming from.

nathan

-- 
Nathan Sidwell


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-10 16:17                                                     ` David Edelsohn
@ 2016-11-10 16:24                                                       ` Martin Liška
  2016-11-10 17:31                                                         ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-11-10 16:24 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Nathan Sidwell, Jeff Law, Andi Kleen, GCC Patches, Jan Hubicka

On 11/10/2016 05:17 PM, David Edelsohn wrote:
> Maybe instead of adding "maybe", we need to change the severity of the
> warning so that the warning is not emitted by default.

Adding the warning option to -Wextra could be a solution. Is that an
acceptable approach?

Martin


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-10 16:24                                                       ` Martin Liška
@ 2016-11-10 17:31                                                         ` Nathan Sidwell
  2016-11-11 10:48                                                           ` Martin Liška
  0 siblings, 1 reply; 95+ messages in thread
From: Nathan Sidwell @ 2016-11-10 17:31 UTC (permalink / raw)
  To: Martin Liška, David Edelsohn
  Cc: Jeff Law, Andi Kleen, GCC Patches, Jan Hubicka

On 11/10/2016 08:24 AM, Martin Liška wrote:
> On 11/10/2016 05:17 PM, David Edelsohn wrote:
>> Maybe instead of adding "maybe", we need to change the severity of the
>> warning so that the warning is not emitted by default.
>
> Adding the warning option to -Wextra can be solution. Is it acceptable
> approach?

I don't think that's good.  Now I understand the -pthread thing, we
have different use cases.

1) user explicitly said -fprofile-update=FOO.  They shouldn't have to
enable something else to get a diagnostic that FOO doesn't work.

2) driver implicitly said -fprofile-update=FOO, because the user said
-pthread but the driver doesn't know if FOO is acceptable.  We want to
silently fall back to the old behaviour.
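
The two cases can be sketched in C as follows. This is only an illustration under assumed names (`update_request`, `resolve_update_mode` and the `explicit_request` flag are hypothetical, not GCC's option handling), and it leaves open whether the diagnostic in case 1 is an error or a warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; GCC's real option handling differs.  */
enum update_mode { UPDATE_SINGLE, UPDATE_ATOMIC };

struct update_request
{
  enum update_mode mode;  /* What was asked for.  */
  bool explicit_request;  /* True if the user wrote the option themselves.  */
};

/* Resolve REQ against the target's capability.  *DIAGNOSE is set when the
   user should be told that an explicit request could not be honoured.  */
static enum update_mode
resolve_update_mode (struct update_request req, bool target_has_atomics,
                     bool *diagnose)
{
  assert (diagnose != NULL);
  *diagnose = false;

  if (req.mode != UPDATE_ATOMIC || target_has_atomics)
    return req.mode;

  /* Atomic was requested but is unavailable.  Case 1: an explicit request
     gets a diagnostic.  Case 2: a driver-implied request falls back
     silently.  */
  *diagnose = req.explicit_request;
  return UPDATE_SINGLE;
}
```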

The proposed solution addresses #2 by having the driver say 
-fprofile-update=META-FOO.  My dislike is that we're exposing this to 
the user and they're going to start using it.  That strikes me as 
undesirable.

How hard is it to implement the -fprofile-update option value as a list,
i.e. '-fprofile-update=atomic,single', with semantics of 'pick the first
one you can do'?  If that's straightforward, then that seems to me a
better solution for #2.  [flyby-thought, have 'atomic,single' as an
acceptable single option value?]
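
A rough sketch of the 'pick the first one you can do' semantics, in C.  The function names and the `target_has_atomics` parameter are hypothetical stand-ins for whatever the compiler proper would actually query; this is not existing GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical predicate: can the target implement the update method
   NAME (LEN characters long)?  */
static bool
method_supported (const char *name, size_t len, bool target_has_atomics)
{
  if (len == 6 && strncmp (name, "single", len) == 0)
    return true;
  if (len == 6 && strncmp (name, "atomic", len) == 0)
    return target_has_atomics;
  return false;
}

/* Walk the comma-separated LIST and return a pointer to the first entry
   the target supports, storing its length in *LEN.  Returns NULL when no
   entry is usable.  */
static const char *
pick_first_supported (const char *list, bool target_has_atomics, size_t *len)
{
  assert (list != NULL && len != NULL);

  const char *p = list;
  while (*p != '\0')
    {
      size_t n = strcspn (p, ",");
      if (method_supported (p, n, target_has_atomics))
        {
          *len = n;
          return p;
        }
      p += n;
      if (*p == ',')
        p++;
    }
  return NULL;
}
```

With this shape, 'atomic,single' resolves to atomic where it is available and to single elsewhere, while a bare 'atomic' on an unsupported target yields no method at all, which is where a diagnostic would belong.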

Failing that, Martin's solution is probably the sanest available 
solution, but I'd like to rename 'maybe-atomic' to the more meaningful 
'prefer-atomic'.  With 'maybe-atomic', I'm left wondering if it looks at 
the phase of the moon.

nathan

-- 
Nathan Sidwell


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-10 17:31                                                         ` Nathan Sidwell
@ 2016-11-11 10:48                                                           ` Martin Liška
  2016-11-11 15:11                                                             ` Nathan Sidwell
  0 siblings, 1 reply; 95+ messages in thread
From: Martin Liška @ 2016-11-11 10:48 UTC (permalink / raw)
  To: Nathan Sidwell, David Edelsohn
  Cc: Jeff Law, Andi Kleen, GCC Patches, Jan Hubicka

On 11/10/2016 06:31 PM, Nathan Sidwell wrote:
> On 11/10/2016 08:24 AM, Martin Liška wrote:
>> On 11/10/2016 05:17 PM, David Edelsohn wrote:
>>> Maybe instead of adding "maybe", we need to change the severity of the
>>> warning so that the warning is not emitted by default.
>>
>> Adding the warning option to -Wextra can be solution. Is it acceptable
>> approach?
> 
> I don't think that's good.  Now I understand the -pthreads thing, we have different use cases.
> 
> 1) user explicitly said -fprofile-update=FOO.  They shouldn't have to enable something else to get a diagnostic that FOO doesn't work.
> 
> 2) driver implicitly said -fprofile-update=FOO, because the user said -pthreads but the driver doesn't know if FOO is acceptable.  We want to silently fallback to the old behaviour.
> 
> The proposed solution addresses #2 by having the driver say -fprofile-update=META-FOO.  My dislike is that we're exposing this to the user and they're going to start using it.  That strikes me as undesirable.
> 
> How hard is it to implement the fprofile-update option value as a list. I.e. '-fprofile-update=atomic,single', with semantics of 'pick the first one you can do'? If that's straightforwards, then that seems to me as a better solution for #2. [flyby-thought, have 'atomic,single' as an acceptable single option value?]

Hello.

We use lists, e.g. for -fsanitize=address,undefined.  However, as -fprofile-update has only 3 values (and passing 'single,atomic' does not make sense), I would prefer
to s/maybe-atomic/prefer-atomic/.  I guess handling the option list in gcc.c and doing substitutions would be very inconvenient.

Thanks,
Martin

> 
> Failing that, Martin's solution is probably the sanest available solution, but I'd like to rename 'maybe-atomic' to the more meaningful 'prefer-atomic'.  With 'maybe-atomic', I'm left wondering if it looks at the phase of the moon.
> 
> nathan
> 


* Re: [PATCH] Introduce -fprofile-update=maybe-atomic
  2016-11-11 10:48                                                           ` Martin Liška
@ 2016-11-11 15:11                                                             ` Nathan Sidwell
  0 siblings, 0 replies; 95+ messages in thread
From: Nathan Sidwell @ 2016-11-11 15:11 UTC (permalink / raw)
  To: Martin Liška, David Edelsohn
  Cc: Jeff Law, Andi Kleen, GCC Patches, Jan Hubicka

On 11/11/2016 02:47 AM, Martin Liška wrote:

> We use lists like for -fsanitize=address,undefined, however as -fprofile-update has only 3 (and passing 'single,atomic' does not make sense), I would prefer
> to s/maybe-atomic/prefer-atomic. I guess handling the option list in gcc.c and doing substitutions would be very inconvenient.

ok.


-- 
Nathan Sidwell



Thread overview: 95+ messages
2016-08-01  8:50 [PATCH 0/4] Various GCOV/PGO improvements marxin
2016-08-01  8:49 ` [PATCH 3/4] Fix typo in gcov.texi marxin
2016-08-01  8:50 ` [PATCH 2/4] Remove __gcov_indirect_call_profiler marxin
2016-08-01  8:50 ` [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch marxin
2016-08-01 12:22   ` Nathan Sidwell
2016-08-01 13:29     ` Martin Liška
2016-08-04 14:48       ` Nathan Sidwell
2016-08-04 15:34         ` Martin Liška
2016-08-04 16:43           ` Nathan Sidwell
2016-08-04 17:03             ` Nathan Sidwell
2016-08-05  8:55               ` Martin Liška
2016-08-05 12:38                 ` Nathan Sidwell
2016-08-05 12:48                   ` Martin Liška
2016-08-05 13:14                     ` Nathan Sidwell
2016-08-05 13:43                       ` Martin Liška
2016-08-08 13:59                         ` [PATCH 5/N] Add new *_atomic counter update function, (-fprofile-update=atomic) Martin Liška
2016-08-08 15:24                           ` Nathan Sidwell
2016-08-08 16:51                             ` Martin Liška
2016-08-08 17:03                               ` Martin Liška
2016-08-09 12:36                                 ` Nathan Sidwell
2016-08-08 16:56                             ` [PATCH] Fix POW2 histogram Martin Liška
2016-08-09  8:41                               ` [PATCH 2/N] Fix usage of " Martin Liška
2016-08-09 12:37                                 ` Nathan Sidwell
2016-08-09 12:34                               ` [PATCH] Fix " Nathan Sidwell
2016-08-09 11:24                         ` [PATCH] Set -fprofile-update=atomic when -pthread is present Martin Liška
2016-08-09 12:40                           ` Nathan Sidwell
2016-08-09 19:04                           ` Andi Kleen
2016-08-12 13:31                             ` Martin Liška
2016-08-18  3:16                               ` Jeff Law
2016-08-18 11:02                                 ` Nathan Sidwell
2016-08-18 15:51                                 ` Andi Kleen
2016-08-18 15:53                                   ` Jeff Law
2016-10-03 12:13                                     ` Martin Liška
2016-10-03 12:26                                       ` Nathan Sidwell
2016-10-03 16:46                                         ` Jeff Law
2016-10-03 17:52                                           ` Andi Kleen
2016-10-04 12:05                                         ` Martin Liška
2016-10-05 17:54                                           ` Jeff Law
2016-10-13 15:34                                           ` [PATCH] Introduce -fprofile-update=maybe-atomic Martin Liška
2016-10-31  9:13                                             ` Martin Liška
2016-11-10 13:19                                               ` Martin Liška
2016-11-10 15:43                                                 ` Nathan Sidwell
2016-11-10 15:55                                                   ` David Edelsohn
2016-11-10 16:18                                                     ` Nathan Sidwell
2016-11-10 15:58                                                   ` Martin Liška
2016-11-10 16:17                                                     ` David Edelsohn
2016-11-10 16:24                                                       ` Martin Liška
2016-11-10 17:31                                                         ` Nathan Sidwell
2016-11-11 10:48                                                           ` Martin Liška
2016-11-11 15:11                                                             ` Nathan Sidwell
2016-11-10 16:14                                                   ` Nathan Sidwell
2016-11-10 16:16                                                     ` David Edelsohn
2016-08-18 15:54                                   ` [PATCH] Set -fprofile-update=atomic when -pthread is present Jakub Jelinek
2016-08-18 16:06                                     ` Richard Biener
2016-09-07 11:41                                       ` Martin Liška
     [not found]                                         ` <CAFiYyc0UaSzXhZmyG9QRkHGT4JFowxBfE2yb-NvXE=hR1xafdA@mail.gmail.com>
2016-09-15 10:18                                           ` [RFC] Speed-up -fprofile-update=atomic Martin Liška
2016-10-04  9:45                                             ` Richard Biener
2016-10-12 13:53                                               ` Martin Liška
2016-10-13  9:43                                                 ` Richard Biener
2016-10-17 11:47                                                   ` Martin Liška
     [not found]                                                     ` <CAFiYyc3eDT4g926PPZuktz5fEW=k-PibAcxhigx4GBcxoXNJFQ@mail.gmail.com>
2016-10-24 12:09                                                       ` Martin Liška
     [not found]                                                         ` <CAFiYyc1tSdTdqqkHcMp+dgE43+8tHL6kY8E07TCHoZBeUT-ggQ@mail.gmail.com>
2016-10-25 14:32                                                           ` Martin Liška
2016-10-26  9:29                                                             ` Richard Biener
2016-10-26  9:32                                                               ` Richard Biener
2016-08-18 16:04                                   ` [PATCH] Set -fprofile-update=atomic when -pthread is present Richard Biener
2016-08-10 12:57                         ` [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch Nathan Sidwell
2016-08-13 12:14                         ` [BUILDROBOT] avr broken (was: [PATCH 1/4] Cherry-pick fprofile-generate-atomic from google/gcc-4_9 branch) Jan-Benedict Glaw
     [not found]                           ` <4455937b-eba7-fe66-fe1a-3172567dd1e4@suse.cz>
2016-08-16 13:36                             ` [BUILDROBOT] avr broken Nathan Sidwell
     [not found]                               ` <617e8799-b7db-fefd-b3a3-842e9a7decfd@suse.cz>
2016-08-16 14:31                                 ` Nathan Sidwell
2016-08-16 17:05                                   ` Jan-Benedict Glaw
2016-08-16 18:26                                     ` Nathan Sidwell
2016-08-17  7:21                                       ` Denis Chertykov
2016-08-17  7:22                                         ` Martin Liška
2016-08-17  8:11                                       ` Jan-Benedict Glaw
2016-08-16 12:56                         ` [PATCH] Detect whether target can use -fprofile-update=atomic Martin Liška
2016-08-16 14:31                           ` Nathan Sidwell
2016-09-06 10:57                             ` Martin Liška
2016-09-06 11:17                               ` David Edelsohn
2016-09-06 12:15                                 ` Nathan Sidwell
2016-09-06 12:39                                   ` Jakub Jelinek
2016-09-06 12:43                                     ` David Edelsohn
2016-09-06 12:41                                   ` David Edelsohn
2016-09-06 12:51                                     ` Martin Liška
2016-09-06 13:13                                       ` Jakub Jelinek
2016-09-06 13:15                                         ` Martin Liška
2016-09-06 13:45                                           ` Jakub Jelinek
2016-09-06 13:50                                             ` Martin Liška
2016-09-06 14:06                                               ` Jakub Jelinek
2016-09-07  7:52                                               ` Christophe Lyon
2016-09-07  9:35                                                 ` Martin Liška
2016-09-07 16:06                                                   ` Christophe Lyon
2016-09-12 20:20                                                   ` Jeff Law
2016-09-29  8:31                                               ` Rainer Orth
2016-08-01  8:50 ` [PATCH 4/4] Add tests for __gcov_dump and __gcov_reset marxin
2016-08-01 12:11 ` [PATCH 0/4] Various GCOV/PGO improvements Nathan Sidwell
