From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Botcazou
To: Richard Biener
Cc: GCC Patches
Subject: Re: [PATCH] Introduce -finstrument-functions-once
Date: Mon, 13 Jun 2022 11:46:17 +0200
Message-ID: <3437346.iIbC2pHGDl@fomalhaut>
In-Reply-To:
References: <4713782.GXAFRqVoOG@fomalhaut>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="nextPart8950060.CDJkKcVGEf"
Content-Transfer-Encoding: 7Bit
X-BeenThere: gcc-patches@gcc.gnu.org
List-Id: Gcc-patches mailing list

This is a multi-part message in MIME format.

--nextPart8950060.CDJkKcVGEf
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"

> So that also applies to
>
> "... and the second profiling function is called before the exit
> +corresponding to this first entry"
>
> specifically "corresponding to this first entry"?  As if the second
> entry exits first will that call the second profiling function or will
> it really be the thread that called the first profiling function
> (what happens when that thread terminates before calling the second
> profiling function? (***)).  Consider re-wording this slightly.

The calls are always paired, i.e. if a thread calls the first function,
then it will call the second function; I can indeed state it explicitly
in the doc.

> +      /* If -finstrument-functions-once is specified, generate:
> +
> +          static volatile bool F.0 = true;
> +          bool tmp_first;
>
> is there any good reason to make F.0 volatile?  That doesn't prevent
> races.

No, it does not, but it guarantees a single read of the flag, and hence
the pairing.

> Any reason to make F.0 initialized to true rather than false (bss init?)

None, changed.
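
For illustration, the single-read pairing argument can be sketched in plain C. This is a hand-written approximation under stated assumptions, not compiler output: `work`, `profile_enter`, `profile_exit`, the two counters and `check_pairing` are hypothetical names introduced here. The real hooks are `__cyg_profile_func_enter` and `__cyg_profile_func_exit`, and the patch places the exit call in a try/finally cleanup rather than before a single return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the profiling hooks; real instrumented code
   calls __cyg_profile_func_enter and __cyg_profile_func_exit instead.  */
static int enter_calls, exit_calls;

static void
profile_enter (void *this_fn, void *call_site)
{
  (void) this_fn;
  (void) call_site;
  enter_calls++;
}

static void
profile_exit (void *this_fn, void *call_site)
{
  (void) this_fn;
  (void) call_site;
  exit_calls++;
}

/* Hand-written approximation of a function instrumented "once".  */
static int
work (int x)
{
  static volatile bool called = false;  /* plays the role of C.0 */
  bool tmp_called = called;             /* the single read of the flag */

  if (!tmp_called)
    {
      called = true;
      profile_enter ((void *) work, __builtin_return_address (0));
    }

  int result = x * 2;                   /* stands for the original body */

  /* The exit hook is guarded by the local copy, not by a second read of
     the flag, so it runs only in the invocation that ran the entry hook.  */
  if (!tmp_called)
    profile_exit ((void *) work, __builtin_return_address (0));

  return result;
}

/* Usage: several calls from one thread, each hook runs exactly once.  */
static void
check_pairing (void)
{
  for (int i = 0; i < 3; i++)
    work (i);
  assert (enter_calls == 1 && exit_calls == 1);
}
```

Because the exit guard reuses `tmp_called`, an invocation that skipped the entry hook also skips the exit hook. Like the patch, the sketch has no protection against data races on the flag, so two threads entering at the same time may both run the pair.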
> (***) looking at the implementation the second profiling function
> can end up being never called when the thread calling the first
> profiling function does not exit the function.  So I wonder if
> the "optimization"(?) not re-reading F.0 makes sense (it also
> requires to keep the value of F.0 live across the whole function)

It's for the pairing.  The value should be spilled onto the stack if need
be, so you'd get at most 2 loads, just as if you re-read the variable.

Revised patch attached.

-- 
Eric Botcazou
--nextPart8950060.CDJkKcVGEf
Content-Disposition: attachment; filename="p.diff"
Content-Transfer-Encoding: 7Bit
Content-Type: text/x-patch; charset="UTF-8"; name="p.diff"

diff --git a/gcc/common.opt b/gcc/common.opt
index 7ca0cceed82..8e961f16b0e 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1890,9 +1890,13 @@ EnumValue
 Enum(cf_protection_level) String(none) Value(CF_NONE)
 
 finstrument-functions
-Common Var(flag_instrument_function_entry_exit)
+Common Var(flag_instrument_function_entry_exit,1)
 Instrument function entry and exit with profiling calls.
 
+finstrument-functions-once
+Common Var(flag_instrument_function_entry_exit,2)
+Instrument function entry and exit with profiling calls invoked once.
+
 finstrument-functions-exclude-function-list=
 Common RejectNegative Joined
 -finstrument-functions-exclude-function-list=name,... Do not instrument listed functions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 174bc09e5cf..b6c0305f198 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -618,7 +618,7 @@ Objective-C and Objective-C++ Dialects}.
 -fno-stack-limit -fsplit-stack @gol
 -fvtable-verify=@r{[}std@r{|}preinit@r{|}none@r{]} @gol
 -fvtv-counts -fvtv-debug @gol
--finstrument-functions @gol
+-finstrument-functions -finstrument-functions-once @gol
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{}} @gol
 -fprofile-prefix-map=@var{old}=@var{new}
@@ -16395,6 +16395,22 @@ cannot safely be called (perhaps signal handlers, if the profiling
 routines generate output or allocate memory).
 @xref{Common Function Attributes}.
 
+@item -finstrument-functions-once
+@opindex -finstrument-functions-once
+This is similar to @option{-finstrument-functions}, but the profiling
+functions are called only once per instrumented function, i.e. the first
+profiling function is called after the first entry into the instrumented
+function and the second profiling function is called before the exit
+corresponding to this first entry.
+
+The definition of @code{once} for the purpose of this option is a little
+vague because the implementation is not protected against data races.
+As a result, the implementation only guarantees that the profiling
+functions are called at @emph{least} once per process and at @emph{most}
+once per thread, but the calls are always paired, that is to say, if a
+thread calls the first function, then it will call the second function,
+unless it never reaches the exit of the instrumented function.
+
 @item -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{}
 @opindex finstrument-functions-exclude-file-list
 
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index cd1796643d7..04990ad91a6 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -16586,6 +16586,51 @@ flag_instrument_functions_exclude_p (tree fndecl)
   return false;
 }
 
+/* Build a call to the instrumentation function FNCODE and add it to SEQ.
+   If COND_VAR is not NULL, it is a boolean variable guarding the call to
+   the instrumentation function.  If STMT is not NULL, it is a statement
+   to be executed just before the call to the instrumentation function.  */
+
+static void
+build_instrumentation_call (gimple_seq *seq, enum built_in_function fncode,
+                            tree cond_var, gimple *stmt)
+{
+  /* The instrumentation hooks aren't going to call the instrumented
+     function and the address they receive is expected to be matchable
+     against symbol addresses.  Make sure we don't create a trampoline,
+     in case the current function is nested.  */
+  tree this_fn_addr = build_fold_addr_expr (current_function_decl);
+  TREE_NO_TRAMPOLINE (this_fn_addr) = 1;
+
+  tree label_true, label_false;
+  if (cond_var)
+    {
+      label_true = create_artificial_label (UNKNOWN_LOCATION);
+      label_false = create_artificial_label (UNKNOWN_LOCATION);
+      gcond *cond = gimple_build_cond (EQ_EXPR, cond_var, boolean_false_node,
+                                       label_true, label_false);
+      gimplify_seq_add_stmt (seq, cond);
+      gimplify_seq_add_stmt (seq, gimple_build_label (label_true));
+      gimplify_seq_add_stmt (seq, gimple_build_predict (PRED_COLD_LABEL,
+                                                        NOT_TAKEN));
+    }
+
+  if (stmt)
+    gimplify_seq_add_stmt (seq, stmt);
+
+  tree x = builtin_decl_implicit (BUILT_IN_RETURN_ADDRESS);
+  gcall *call = gimple_build_call (x, 1, integer_zero_node);
+  tree tmp_var = create_tmp_var (ptr_type_node, "return_addr");
+  gimple_call_set_lhs (call, tmp_var);
+  gimplify_seq_add_stmt (seq, call);
+  x = builtin_decl_implicit (fncode);
+  call = gimple_build_call (x, 2, this_fn_addr, tmp_var);
+  gimplify_seq_add_stmt (seq, call);
+
+  if (cond_var)
+    gimplify_seq_add_stmt (seq, gimple_build_label (label_false));
+}
+
 /* Entry point to the gimplification pass.  FNDECL is the FUNCTION_DECL
    node for the function we want to gimplify.
 
@@ -16636,40 +16681,66 @@ gimplify_function_tree (tree fndecl)
            && DECL_DISREGARD_INLINE_LIMITS (fndecl))
       && !flag_instrument_functions_exclude_p (fndecl))
     {
-      tree x;
-      gbind *new_bind;
-      gimple *tf;
-      gimple_seq cleanup = NULL, body = NULL;
-      tree tmp_var, this_fn_addr;
-      gcall *call;
-
-      /* The instrumentation hooks aren't going to call the instrumented
-         function and the address they receive is expected to be matchable
-         against symbol addresses.  Make sure we don't create a trampoline,
-         in case the current function is nested.  */
-      this_fn_addr = build_fold_addr_expr (current_function_decl);
-      TREE_NO_TRAMPOLINE (this_fn_addr) = 1;
-
-      x = builtin_decl_implicit (BUILT_IN_RETURN_ADDRESS);
-      call = gimple_build_call (x, 1, integer_zero_node);
-      tmp_var = create_tmp_var (ptr_type_node, "return_addr");
-      gimple_call_set_lhs (call, tmp_var);
-      gimplify_seq_add_stmt (&cleanup, call);
-      x = builtin_decl_implicit (BUILT_IN_PROFILE_FUNC_EXIT);
-      call = gimple_build_call (x, 2, this_fn_addr, tmp_var);
-      gimplify_seq_add_stmt (&cleanup, call);
-      tf = gimple_build_try (seq, cleanup, GIMPLE_TRY_FINALLY);
-
-      x = builtin_decl_implicit (BUILT_IN_RETURN_ADDRESS);
-      call = gimple_build_call (x, 1, integer_zero_node);
-      tmp_var = create_tmp_var (ptr_type_node, "return_addr");
-      gimple_call_set_lhs (call, tmp_var);
-      gimplify_seq_add_stmt (&body, call);
-      x = builtin_decl_implicit (BUILT_IN_PROFILE_FUNC_ENTER);
-      call = gimple_build_call (x, 2, this_fn_addr, tmp_var);
-      gimplify_seq_add_stmt (&body, call);
+      gimple_seq body = NULL, cleanup = NULL;
+      gassign *assign;
+      tree cond_var;
+
+      /* If -finstrument-functions-once is specified, generate:
+
+           static volatile bool C.0 = false;
+           bool tmp_called;
+
+           tmp_called = C.0;
+           if (!tmp_called)
+             {
+               C.0 = true;
+               [call profiling enter function]
+             }
+
+         without specific protection for data races.  */
+      if (flag_instrument_function_entry_exit > 1)
+        {
+          tree first_var
+            = build_decl (DECL_SOURCE_LOCATION (current_function_decl),
+                          VAR_DECL,
+                          create_tmp_var_name ("C"),
+                          boolean_type_node);
+          DECL_ARTIFICIAL (first_var) = 1;
+          DECL_IGNORED_P (first_var) = 1;
+          TREE_STATIC (first_var) = 1;
+          TREE_THIS_VOLATILE (first_var) = 1;
+          TREE_USED (first_var) = 1;
+          DECL_INITIAL (first_var) = boolean_false_node;
+          varpool_node::add (first_var);
+
+          cond_var = create_tmp_var (boolean_type_node, "tmp_called");
+          assign = gimple_build_assign (cond_var, first_var);
+          gimplify_seq_add_stmt (&body, assign);
+
+          assign = gimple_build_assign (first_var, boolean_true_node);
+        }
+
+      else
+        {
+          cond_var = NULL_TREE;
+          assign = NULL;
+        }
+
+      build_instrumentation_call (&body, BUILT_IN_PROFILE_FUNC_ENTER,
+                                  cond_var, assign);
+
+      /* If -finstrument-functions-once is specified, generate:
+
+           if (!tmp_called)
+             [call profiling exit function]
+
+         without specific protection for data races.  */
+      build_instrumentation_call (&cleanup, BUILT_IN_PROFILE_FUNC_EXIT,
+                                  cond_var, NULL);
+
+      gimple *tf = gimple_build_try (seq, cleanup, GIMPLE_TRY_FINALLY);
       gimplify_seq_add_stmt (&body, tf);
-      new_bind = gimple_build_bind (NULL, body, NULL);
+      gbind *new_bind = gimple_build_bind (NULL, body, NULL);
 
       /* Replace the current function body with the body
          wrapped in the try/finally TF.  */
--nextPart8950060.CDJkKcVGEf--