From: Sebastian Huber
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 4/4] gcov: Improve -fprofile-update=atomic
Date: Tue, 14 Nov 2023 23:08:25 +0100
Message-Id: <20231114220825.22074-5-sebastian.huber@embedded-brains.de>
In-Reply-To: <20231114220825.22074-1-sebastian.huber@embedded-brains.de>
References: <20231114220825.22074-1-sebastian.huber@embedded-brains.de>

The code coverage support uses counters to determine which edges in the control
flow graph were executed.  If a counter overflows, then the code coverage
information is invalid.  Therefore the counter type should be a 64-bit integer.
In multi-threaded applications, it is important that the counter increments are
atomic.  This is not the case by default.
The user can enable atomic counter increments through the
-fprofile-update=atomic and -fprofile-update=prefer-atomic options.

If the target supports 64-bit atomic operations, then everything is fine.  If
not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
counter increments will be used.  However, if the target does not support the
required atomic operations and -fprofile-update=atomic was chosen by the user,
then a warning was issued and a forced fallback to non-atomic operations was
done.  This is probably not what a user wants.  There is still hardware on the
market which does not have atomic operations and is used for multi-threaded
applications.  A user who selects -fprofile-update=atomic wants consistent
code coverage data and not random data.

This patch removes the fallback to non-atomic operations for
-fprofile-update=atomic if the target platform supports libatomic.  To
mitigate potential performance issues, an optimization for systems which
only support 32-bit atomic operations is provided.  Here, the edge counter
increments are done like this:

  low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED);
  high_inc = low == 0 ? 1 : 0;
  __atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED);

In gimple_gen_time_profiler() this split operation cannot be used, since the
updated counter value is also required.  Here, a library call is emitted.  This
is not a performance issue since the update is only done if counters[0] == 0.

gcc/c-family/ChangeLog:

	* c-cppbuiltin.cc (c_cpp_builtins): Define
	__LIBGCC_HAVE_LIBATOMIC for libgcov.

gcc/ChangeLog:

	* doc/invoke.texi (-fprofile-update): Clarify default method.  Document
	the atomic method behaviour.
	* tree-profile.cc (enum counter_update_method): New.
	(counter_update): Likewise.
	(gen_counter_update): Use counter_update_method.  Split the
	atomic counter update into two 32-bit atomic operations if necessary.
	(tree_profiling): Select counter_update_method.

libgcc/ChangeLog:

	* libgcov.h (GCOV_SUPPORTS_ATOMIC): Always define it.
	Set it also to 1, if __LIBGCC_HAVE_LIBATOMIC is defined.
---
 gcc/c-family/c-cppbuiltin.cc |  2 +
 gcc/doc/invoke.texi          | 19 ++++++-
 gcc/tree-profile.cc          | 99 +++++++++++++++++++++++++++++++++---
 libgcc/libgcov.h             | 10 ++--
 4 files changed, 114 insertions(+), 16 deletions(-)

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index cdf9850cb19e..e8576737fafb 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1538,6 +1538,8 @@ c_cpp_builtins (cpp_reader *pfile)
       /* For libgcov.  */
       builtin_define_with_int_value ("__LIBGCC_VTABLE_USES_DESCRIPTORS__",
                                      TARGET_VTABLE_USES_DESCRIPTORS);
+      builtin_define_with_int_value ("__LIBGCC_HAVE_LIBATOMIC",
+                                     TARGET_HAVE_LIBATOMIC);
     }
 
   /* For use in assembly language.  */
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index de40f62e219c..8fe3c86ad419 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16603,7 +16603,24 @@ while the second one prevents profile corruption by emitting thread-safe code.
 Using @samp{prefer-atomic} would be transformed either to @samp{atomic},
 when supported by a target, or to @samp{single} otherwise.  The GCC driver
 automatically selects @samp{prefer-atomic} when @option{-pthread}
-is present in the command line.
+is present in the command line, otherwise the default method is @samp{single}.
+
+If @samp{atomic} is selected, then the profile information is updated using
+atomic operations on a best-effort basis.  Ideally, the profile information is
+updated through atomic operations in hardware.  If the target platform does not
+support the required atomic operations in hardware, however, @file{libatomic}
+is available, then the profile information is updated through calls to
+@file{libatomic}.  If the target platform neither supports the required atomic
+operations in hardware nor @file{libatomic}, then the profile information is
+not atomically updated and a warning is issued.  In this case, the obtained
+profiling information may be corrupt for multi-threaded applications.
+
+For performance reasons, if 64-bit counters are used for the profiling
+information and the target platform only supports 32-bit atomic operations in
+hardware, then the performance critical profiling updates are done using two
+32-bit atomic operations for each counter update.  If a signal interrupts these
+two operations updating a counter, then the profiling information may be in an
+inconsistent state.
 
 @opindex fprofile-filter-files
 @item -fprofile-filter-files=@var{regex}
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index 24805ff905c7..12255f06f992 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -73,6 +73,41 @@ static GTY(()) tree ic_tuple_var;
 static GTY(()) tree ic_tuple_counters_field;
 static GTY(()) tree ic_tuple_callee_field;
 
+/* Types of counter update methods.
+
+   By default, the counter updates are done for a single threaded system
+   (COUNTER_UPDATE_SINGLE_THREAD).
+
+   If the user selected atomic profile counter updates
+   (-fprofile-update=atomic), then the counter updates will be done atomically
+   on a best-effort basis.  One of three methods to do the counter updates is
+   selected according to the target capabilities.
+
+   Ideally, the counter updates are done through atomic operations in hardware
+   (COUNTER_UPDATE_ATOMIC_BUILTIN).
+
+   If the target supports only 32-bit atomic increments and gcov_type_node is a
+   64-bit integer type, then for the profile edge counters the increment is
+   performed through two separate 32-bit atomic increments
+   (COUNTER_UPDATE_ATOMIC_SPLIT or COUNTER_UPDATE_ATOMIC_PARTIAL).  If the
+   target supports libatomic (TARGET_HAVE_LIBATOMIC), then other counter
+   updates are carried out by libatomic calls (COUNTER_UPDATE_ATOMIC_SPLIT).
+   If the target does not support libatomic, then the other counter updates are
+   not done atomically (COUNTER_UPDATE_ATOMIC_PARTIAL) and a warning is
+   issued.
+
+   If the target does not support atomic operations in hardware, however, it
+   supports libatomic, then all updates are carried out by libatomic calls
+   (COUNTER_UPDATE_ATOMIC_BUILTIN).  */
+enum counter_update_method {
+  COUNTER_UPDATE_SINGLE_THREAD,
+  COUNTER_UPDATE_ATOMIC_BUILTIN,
+  COUNTER_UPDATE_ATOMIC_SPLIT,
+  COUNTER_UPDATE_ATOMIC_PARTIAL
+};
+
+static counter_update_method counter_update = COUNTER_UPDATE_SINGLE_THREAD;
+
 /* Do initialization work for the edge profiler.  */
 
 /* Add code:
@@ -269,7 +304,8 @@ gen_counter_update (gimple_stmt_iterator *gsi, tree counter, tree result,
   tree one = build_int_cst (type, 1);
   tree relaxed = build_int_cst (integer_type_node, MEMMODEL_RELAXED);
 
-  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+  if (counter_update == COUNTER_UPDATE_ATOMIC_BUILTIN ||
+      (result && counter_update == COUNTER_UPDATE_ATOMIC_SPLIT))
     {
       /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
       tree f = builtin_decl_explicit (TYPE_PRECISION (type) > 32
@@ -278,6 +314,38 @@ gen_counter_update (gimple_stmt_iterator *gsi, tree counter, tree result,
       gcall *call = gimple_build_call (f, 3, addr, one, relaxed);
       gen_assign_counter_update (gsi, call, f, result, name);
     }
+  else if (!result && (counter_update == COUNTER_UPDATE_ATOMIC_SPLIT ||
+                       counter_update == COUNTER_UPDATE_ATOMIC_PARTIAL))
+    {
+      /* low = __atomic_add_fetch_4 (addr, 1, MEMMODEL_RELAXED);
+         high_inc = low == 0 ? 1 : 0;
+         __atomic_add_fetch_4 (addr_high, high_inc, MEMMODEL_RELAXED); */
+      tree zero32 = build_zero_cst (uint32_type_node);
+      tree one32 = build_one_cst (uint32_type_node);
+      tree addr_high = make_temp_ssa_name (TREE_TYPE (addr), NULL, name);
+      tree four = build_int_cst (size_type_node, 4);
+      gassign *assign1 = gimple_build_assign (addr_high, POINTER_PLUS_EXPR,
+                                              addr, four);
+      gsi_insert_after (gsi, assign1, GSI_NEW_STMT);
+      if (WORDS_BIG_ENDIAN)
+        std::swap (addr, addr_high);
+      tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4);
+      gcall *call1 = gimple_build_call (f, 3, addr, one, relaxed);
+      tree low = make_temp_ssa_name (uint32_type_node, NULL, name);
+      gimple_call_set_lhs (call1, low);
+      gsi_insert_after (gsi, call1, GSI_NEW_STMT);
+      tree is_zero = make_temp_ssa_name (boolean_type_node, NULL, name);
+      gassign *assign2 = gimple_build_assign (is_zero, EQ_EXPR, low,
+                                              zero32);
+      gsi_insert_after (gsi, assign2, GSI_NEW_STMT);
+      tree high_inc = make_temp_ssa_name (uint32_type_node, NULL, name);
+      gassign *assign3 = gimple_build_assign (high_inc, COND_EXPR,
+                                              is_zero, one32, zero32);
+      gsi_insert_after (gsi, assign3, GSI_NEW_STMT);
+      gcall *call2 = gimple_build_call (f, 3, addr_high, high_inc,
+                                        relaxed);
+      gsi_insert_after (gsi, call2, GSI_NEW_STMT);
+    }
   else
     {
       tree tmp1 = make_temp_ssa_name (type, NULL, name);
@@ -689,15 +757,20 @@ tree_profiling (void)
   struct cgraph_node *node;
 
   /* Verify whether we can utilize atomic update operations.  */
-  bool can_support_atomic = false;
+  bool can_support_atomic = TARGET_HAVE_LIBATOMIC;
   unsigned HOST_WIDE_INT gcov_type_size
     = tree_to_uhwi (TYPE_SIZE_UNIT (get_gcov_type ()));
-  if (gcov_type_size == 4)
-    can_support_atomic
-      = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
-  else if (gcov_type_size == 8)
-    can_support_atomic
-      = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+  bool have_atomic_4
+    = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
+  bool have_atomic_8
+    = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
+  if (!can_support_atomic)
+    {
+      if (gcov_type_size == 4)
+        can_support_atomic = have_atomic_4;
+      else if (gcov_type_size == 8)
+        can_support_atomic = have_atomic_8;
+    }
 
   if (flag_profile_update == PROFILE_UPDATE_ATOMIC
       && !can_support_atomic)
@@ -710,6 +783,16 @@ tree_profiling (void)
     flag_profile_update = can_support_atomic
       ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE;
 
+  if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
+    {
+      if (gcov_type_size == 8 && !have_atomic_8 && have_atomic_4)
+        counter_update = COUNTER_UPDATE_ATOMIC_SPLIT;
+      else
+        counter_update = COUNTER_UPDATE_ATOMIC_BUILTIN;
+    }
+  else if (gcov_type_size == 8 && have_atomic_4)
+    counter_update = COUNTER_UPDATE_ATOMIC_PARTIAL;
+
   /* This is a small-ipa pass that gets called only once, from
      cgraphunit.cc:ipa_passes().  */
   gcc_assert (symtab->state == IPA_SSA);
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 763118ea5b52..d04c070d0cfa 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -95,18 +95,14 @@ typedef unsigned gcov_type_unsigned __attribute__ ((mode (QI)));
 #define GCOV_LOCKED_WITH_LOCKING 0
 #endif
 
-#ifndef GCOV_SUPPORTS_ATOMIC
 /* Detect whether target can support atomic update of profilers.  */
-#if __SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
-#define GCOV_SUPPORTS_ATOMIC 1
-#else
-#if __SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
+#if (__SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) || \
+    (__SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8) || \
+    defined (__LIBGCC_HAVE_LIBATOMIC)
 #define GCOV_SUPPORTS_ATOMIC 1
 #else
 #define GCOV_SUPPORTS_ATOMIC 0
 #endif
-#endif
-#endif
 
 /* In libgcov we need these functions to be extern, so prefix them with
    __gcov.  In libgcov they must also be hidden so that the instance in
-- 
2.35.3
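
For readers who want to try the split update outside of GIMPLE, here is a
minimal C sketch of the two-step increment described in the commit message.
It is an illustration only and not part of the patch.  The split_counter
type and the split_counter_* helpers are hypothetical names.  The sketch uses
the generic __atomic_add_fetch builtin with __ATOMIC_RELAXED, whereas the
patch emits BUILT_IN_ATOMIC_ADD_FETCH_4 calls and selects the low and high
halves of the real 64-bit counter according to WORDS_BIG_ENDIAN.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit counter stored as two 32-bit halves.  */
typedef struct
{
  uint32_t low;
  uint32_t high;
} split_counter;

/* Increment the counter with two 32-bit atomic operations.  The high half is
   bumped only when the low half wrapped around to zero.  */
static void
split_counter_inc (split_counter *c)
{
  uint32_t low = __atomic_add_fetch (&c->low, 1, __ATOMIC_RELAXED);
  uint32_t high_inc = low == 0 ? 1 : 0;
  __atomic_add_fetch (&c->high, high_inc, __ATOMIC_RELAXED);
}

/* Combine both halves.  The two loads are not one atomic 64-bit read, so the
   result is only reliable once all updaters have finished.  */
static uint64_t
split_counter_read (split_counter *c)
{
  uint32_t high = __atomic_load_n (&c->high, __ATOMIC_RELAXED);
  uint32_t low = __atomic_load_n (&c->low, __ATOMIC_RELAXED);
  return ((uint64_t) high << 32) | low;
}

/* Start one step below the 32-bit boundary and increment twice.  */
static uint64_t
split_counter_demo (void)
{
  split_counter c = { UINT32_MAX, 0 };
  split_counter_inc (&c);	/* Low half wraps to 0, carry goes to high.  */
  assert (c.low == 0 && c.high == 1);
  split_counter_inc (&c);	/* Low half becomes 1, no carry.  */
  return split_counter_read (&c);
}
```

Between the two atomic operations the halves can be momentarily out of step,
which matches the signal-interruption caveat documented in the invoke.texi
hunk of the patch.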