public inbox for gcc-cvs@sourceware.org
* [gcc r14-5579] gcov: Improve -fprofile-update=atomic
@ 2023-11-18 11:45 Sebastian Huber
From: Sebastian Huber @ 2023-11-18 11:45 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:20a3c74c347429c109bc7002285b735be83f6a0b

commit r14-5579-g20a3c74c347429c109bc7002285b735be83f6a0b
Author: Sebastian Huber <sebastian.huber@embedded-brains.de>
Date:   Tue Nov 14 21:36:51 2023 +0100

    gcov: Improve -fprofile-update=atomic
    
    The code coverage support uses counters to determine which edges in the control
    flow graph were executed.  If a counter overflows, then the code coverage
    information is invalid.  Therefore the counter type should be a 64-bit integer.
    In multi-threaded applications, it is important that the counter increments are
    atomic.  This is not the case by default.  The user can enable atomic counter
    increments through the -fprofile-update=atomic and
    -fprofile-update=prefer-atomic options.
    
    If the target supports 64-bit atomic operations, then everything is fine.  If
    not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
    counter increments will be used.  However, if the target did not support the
    required atomic operations and -fprofile-update=atomic was chosen by the user,
    then a warning was issued and a forced fallback to non-atomic operations was
    made.  This is probably not what a user wants.  There is still hardware on the
    market which does not have atomic operations and is used for multi-threaded
    applications.  A user who selects -fprofile-update=atomic wants consistent
    code coverage data, not random data.
    
    This patch removes the fallback to non-atomic operations for
    -fprofile-update=atomic if the target platform supports libatomic.  To
    mitigate potential performance issues, an optimization for systems which
    only support 32-bit atomic operations is provided.  Here, the edge
    counter increments are done like this:
    
      low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED);
      high_inc = low == 0 ? 1 : 0;
      __atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED);
    
    In gimple_gen_time_profiler() this split operation cannot be used, since the
    updated counter value is also required.  Here, a library call is emitted.  This
    is not a performance issue since the update is only done if counters[0] == 0.
    
    gcc/c-family/ChangeLog:
    
            * c-cppbuiltin.cc (c_cpp_builtins):  Define
            __LIBGCC_HAVE_LIBATOMIC for libgcov.
    
    gcc/ChangeLog:
    
            * doc/invoke.texi (-fprofile-update): Clarify default method.  Document
            the atomic method behaviour.
            * tree-profile.cc (enum counter_update_method): New.
            (counter_update): Likewise.
            (gen_counter_update): Use counter_update_method.  Split the
            atomic counter update in two 32-bit atomic operations if
            necessary.
            (tree_profiling): Select counter_update_method.
    
    libgcc/ChangeLog:
    
            * libgcov.h (GCOV_SUPPORTS_ATOMIC): Always define it.
            Set it also to 1, if __LIBGCC_HAVE_LIBATOMIC is defined.

Diff:
---
 gcc/c-family/c-cppbuiltin.cc |  2 +
 gcc/doc/invoke.texi          | 19 ++++++++-
 gcc/tree-profile.cc          | 99 ++++++++++++++++++++++++++++++++++++++++----
 libgcc/libgcov.h             | 10 ++---
 4 files changed, 114 insertions(+), 16 deletions(-)

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index 5a5309d0d09..8d2f07a96ce 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1567,6 +1567,8 @@ c_cpp_builtins (cpp_reader *pfile)
       /* For libgcov.  */
       builtin_define_with_int_value ("__LIBGCC_VTABLE_USES_DESCRIPTORS__",
 				     TARGET_VTABLE_USES_DESCRIPTORS);
+      builtin_define_with_int_value ("__LIBGCC_HAVE_LIBATOMIC",
+				     targetm.have_libatomic);
     }
 
   /* For use in assembly language.  */
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c0b571327fb..557d613a1e6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16982,7 +16982,24 @@ while the second one prevents profile corruption by emitting thread-safe code.
 Using @samp{prefer-atomic} would be transformed either to @samp{atomic},
 when supported by a target, or to @samp{single} otherwise.  The GCC driver
 automatically selects @samp{prefer-atomic} when @option{-pthread}
-is present in the command line.
+is present in the command line, otherwise the default method is @samp{single}.
+
+If @samp{atomic} is selected, then the profile information is updated using
+atomic operations on a best-effort basis.  Ideally, the profile information is
+updated through atomic operations in hardware.  If the target platform does not
+support the required atomic operations in hardware, however, @file{libatomic}
+is available, then the profile information is updated through calls to
+@file{libatomic}.  If the target platform neither supports the required atomic
+operations in hardware nor @file{libatomic}, then the profile information is
+not atomically updated and a warning is issued.  In this case, the obtained
+profiling information may be corrupt for multi-threaded applications.
+
+For performance reasons, if 64-bit counters are used for the profiling
+information and the target platform only supports 32-bit atomic operations in
+hardware, then the performance critical profiling updates are done using two
+32-bit atomic operations for each counter update.  If a signal interrupts these
+two operations updating a counter, then the profiling information may be in an
+inconsistent state.
 
 @opindex fprofile-filter-files
 @item -fprofile-filter-files=@var{regex}
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index 24805ff905c..7d3cb1ef089 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -73,6 +73,41 @@ static GTY(()) tree ic_tuple_var;
 static GTY(()) tree ic_tuple_counters_field;
 static GTY(()) tree ic_tuple_callee_field;
 
+/* Types of counter update methods.
+
+   By default, the counter updates are done for a single threaded system
+   (COUNTER_UPDATE_SINGLE_THREAD).
+
+   If the user selected atomic profile counter updates
+   (-fprofile-update=atomic), then the counter updates will be done atomically
+   on a best-effort basis.  One of three methods to do the counter updates is
+   selected according to the target capabilities.
+
+   Ideally, the counter updates are done through atomic operations in hardware
+   (COUNTER_UPDATE_ATOMIC_BUILTIN).
+
+   If the target supports only 32-bit atomic increments and gcov_type_node is a
+   64-bit integer type, then for the profile edge counters the increment is
+   performed through two separate 32-bit atomic increments
+   (COUNTER_UPDATE_ATOMIC_SPLIT or COUNTER_UPDATE_ATOMIC_PARTIAL).  If the
+   target supports libatomic (targetm.have_libatomic), then other counter
+   updates are carried out by libatomic calls (COUNTER_UPDATE_ATOMIC_SPLIT).
+   If the target does not support libatomic, then the other counter updates are
+   not done atomically (COUNTER_UPDATE_ATOMIC_PARTIAL) and a warning is
+   issued.
+
+   If the target does not support atomic operations in hardware, however,  it
+   supports libatomic, then all updates are carried out by libatomic calls
+   (COUNTER_UPDATE_ATOMIC_BUILTIN).  */
+enum counter_update_method {
+  COUNTER_UPDATE_SINGLE_THREAD,
+  COUNTER_UPDATE_ATOMIC_BUILTIN,
+  COUNTER_UPDATE_ATOMIC_SPLIT,
+  COUNTER_UPDATE_ATOMIC_PARTIAL
+};
+
+static counter_update_method counter_update = COUNTER_UPDATE_SINGLE_THREAD;
+
 /* Do initialization work for the edge profiler.  */
 
 /* Add code:
@@ -269,7 +304,8 @@ gen_counter_update (gimple_stmt_iterator *gsi, tree counter, tree result,
   tree one = build_int_cst (type, 1);
   tree relaxed = build_int_cst (integer_type_node, MEMMODEL_RELAXED);
 
-  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+  if (counter_update == COUNTER_UPDATE_ATOMIC_BUILTIN ||
+      (result && counter_update == COUNTER_UPDATE_ATOMIC_SPLIT))
     {
       /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
       tree f = builtin_decl_explicit (TYPE_PRECISION (type) > 32
@@ -278,6 +314,38 @@ gen_counter_update (gimple_stmt_iterator *gsi, tree counter, tree result,
       gcall *call = gimple_build_call (f, 3, addr, one, relaxed);
       gen_assign_counter_update (gsi, call, f, result, name);
     }
+  else if (!result && (counter_update == COUNTER_UPDATE_ATOMIC_SPLIT ||
+		       counter_update == COUNTER_UPDATE_ATOMIC_PARTIAL))
+    {
+      /* low = __atomic_add_fetch_4 (addr, 1, MEMMODEL_RELAXED);
+	 high_inc = low == 0 ? 1 : 0;
+	 __atomic_add_fetch_4 (addr_high, high_inc, MEMMODEL_RELAXED); */
+      tree zero32 = build_zero_cst (uint32_type_node);
+      tree one32 = build_one_cst (uint32_type_node);
+      tree addr_high = make_temp_ssa_name (TREE_TYPE (addr), NULL, name);
+      tree four = build_int_cst (size_type_node, 4);
+      gassign *assign1 = gimple_build_assign (addr_high, POINTER_PLUS_EXPR,
+					      addr, four);
+      gsi_insert_after (gsi, assign1, GSI_NEW_STMT);
+      if (WORDS_BIG_ENDIAN)
+	std::swap (addr, addr_high);
+      tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4);
+      gcall *call1 = gimple_build_call (f, 3, addr, one, relaxed);
+      tree low = make_temp_ssa_name (uint32_type_node, NULL, name);
+      gimple_call_set_lhs (call1, low);
+      gsi_insert_after (gsi, call1, GSI_NEW_STMT);
+      tree is_zero = make_temp_ssa_name (boolean_type_node, NULL, name);
+      gassign *assign2 = gimple_build_assign (is_zero, EQ_EXPR, low,
+					      zero32);
+      gsi_insert_after (gsi, assign2, GSI_NEW_STMT);
+      tree high_inc = make_temp_ssa_name (uint32_type_node, NULL, name);
+      gassign *assign3 = gimple_build_assign (high_inc, COND_EXPR,
+					      is_zero, one32, zero32);
+      gsi_insert_after (gsi, assign3, GSI_NEW_STMT);
+      gcall *call2 = gimple_build_call (f, 3, addr_high, high_inc,
+					relaxed);
+      gsi_insert_after (gsi, call2, GSI_NEW_STMT);
+    }
   else
     {
       tree tmp1 = make_temp_ssa_name (type, NULL, name);
@@ -689,15 +757,20 @@ tree_profiling (void)
   struct cgraph_node *node;
 
   /* Verify whether we can utilize atomic update operations.  */
-  bool can_support_atomic = false;
+  bool can_support_atomic = targetm.have_libatomic;
   unsigned HOST_WIDE_INT gcov_type_size
     = tree_to_uhwi (TYPE_SIZE_UNIT (get_gcov_type ()));
-  if (gcov_type_size == 4)
-    can_support_atomic
-      = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
-  else if (gcov_type_size == 8)
-    can_support_atomic
-      = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+  bool have_atomic_4
+    = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
+  bool have_atomic_8
+    = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+  if (!can_support_atomic)
+    {
+      if (gcov_type_size == 4)
+	can_support_atomic = have_atomic_4;
+      else if (gcov_type_size == 8)
+	can_support_atomic = have_atomic_8;
+    }
 
   if (flag_profile_update == PROFILE_UPDATE_ATOMIC
       && !can_support_atomic)
@@ -710,6 +783,16 @@ tree_profiling (void)
     flag_profile_update = can_support_atomic
       ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE;
 
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+    {
+      if (gcov_type_size == 8 && !have_atomic_8 && have_atomic_4)
+	counter_update = COUNTER_UPDATE_ATOMIC_SPLIT;
+      else
+	counter_update = COUNTER_UPDATE_ATOMIC_BUILTIN;
+    }
+  else if (gcov_type_size == 8 && have_atomic_4)
+      counter_update = COUNTER_UPDATE_ATOMIC_PARTIAL;
+
   /* This is a small-ipa pass that gets called only once, from
      cgraphunit.cc:ipa_passes().  */
   gcc_assert (symtab->state == IPA_SSA);
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 763118ea5b5..d04c070d0cf 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -95,18 +95,14 @@ typedef unsigned gcov_type_unsigned __attribute__ ((mode (QI)));
 #define GCOV_LOCKED_WITH_LOCKING 0
 #endif
 
-#ifndef GCOV_SUPPORTS_ATOMIC
 /* Detect whether target can support atomic update of profilers.  */
-#if __SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
-#define GCOV_SUPPORTS_ATOMIC 1
-#else
-#if __SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
+#if (__SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) || \
+    (__SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8) || \
+    defined (__LIBGCC_HAVE_LIBATOMIC)
 #define GCOV_SUPPORTS_ATOMIC 1
 #else
 #define GCOV_SUPPORTS_ATOMIC 0
 #endif
-#endif
-#endif
 
 /* In libgcov we need these functions to be extern, so prefix them with
    __gcov.  In libgcov they must also be hidden so that the instance in
