From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 30093 invoked by alias); 9 Nov 2018 18:18:41 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 30081 invoked by uid 89); 9 Nov 2018 18:18:40 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY autolearn=ham version=3.3.2 spammy=ongoing, foot, recorder, stopping X-HELO: mga06.intel.com Received: from mga06.intel.com (HELO mga06.intel.com) (134.134.136.31) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 09 Nov 2018 18:18:39 +0000 Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 09 Nov 2018 10:18:37 -0800 Received: from tassilo.jf.intel.com (HELO tassilo.localdomain) ([10.7.201.126]) by fmsmga005.fm.intel.com with ESMTP; 09 Nov 2018 10:18:37 -0800 Received: by tassilo.localdomain (Postfix, from userid 1000) id 1D96130235B; Fri, 9 Nov 2018 10:18:37 -0800 (PST) Date: Fri, 09 Nov 2018 18:18:00 -0000 From: Andi Kleen To: Richard Biener Cc: Andi Kleen , GCC Patches Subject: Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation Message-ID: <20181109181837.GB6218@tassilo.jf.intel.com> References: <20181104063235.6914-1-andi@firstfloor.org> <20181104063235.6914-2-andi@firstfloor.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) X-SW-Source: 2018-11/txt/msg00762.txt.bz2 Hi Richard, On Fri, Nov 09, 2018 at 04:27:22PM +0100, Richard Biener wrote: > > Passes bootstrap and test suite on x86_64-linux, also > > bootstrapped and tested gcc itself with full -fvartrace > > and -fvartrace-locals instrumentation. > > So how is this supposed to be used? I guess in a > edit-debug cycle and not for production code? It can actually be used for production code. When processor trace is disabled the PTWRITE instructions acts as nops. So it's only increasing the code foot print. Since the instrumentation should only log values which are already computed it normally doesn't cause any other code. Even when it is enabled the primary overhead is the additional memory bandwidth, since the CPU can do the logging in parallel to other code. As long as the instrumentation is not too excessive to generate too much memory bandwidth, it might be actually quite reasonable to even keep the logging on for production code, and use it as a "flight recorder", which is dumped on failures. This would also be the model in gdb, once we have support in it. You would run the program in the debugger and it just logs the data to a memory buffer, but when stopping the value history can be examined. There's also some ongoing work to add (optional) support for PT to Linux crash dumps, so eventually that will work without having to always run the debugger. Today it can be done by running perf in the background to record the PT, however there the setup is a bit more complicated. The primary use case I was envisioning was to set the attribute on some critical functions/structures/types of interest and then have a very overhead logging option for them (generally cheaper than equivalent software instrumentation). And then they automatically gets logged without the programmer needing to add lots of instrumentation code to catch every instance. So think of it as a "hardware accelerated printf" > > What do you actually write with PTWRITE? I suppose > you need to keep a ID to something mapping somewhere > so you can make sense of the perf records? PTWRITE writes 32bit/64bit values. The CPU reports the IP of PTWRITE in the log, either explicitely, or implicitely if branch trace is enabled too. The IP can then be used to look up the DWARF scope for that IP. Then the decoder decodes the operand of PTWRITE and maps it back using the dwarf information. So it all works using existing debugger infrastructure, and a quite simple instruction decoder. I'll clarify that in the description. > > Few comments inline below, but I'm not sure if this > whole thing is interesting for GCC (as opposed to being > a instrumentation plugin) I'm biased, but I think automatic data tracing is a very exciting use case, so hopefully it can be considered for mainstream gcc. > > handle_no_profile_instrument_function_attribute, > > NULL }, > > @@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node, tree name, tree, int, > > return NULL_TREE; > > } > > > > +/* Handle "vartrace"/"no_vartrace" attributes; arguments as in > > + struct attribute_spec.handler. */ > > + > > +static tree > > +handle_vartrace_attribute (tree *node, tree, tree, int flags, > > + bool *) > > +{ > > + if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE)) > > + *node = build_variant_type_copy (*node); > > I don't think you want the attribute on types. As far as I understood your > descriptions it should only be on variables and functions. The idea was that it's possible to trace all instances of a type, especially structure members. Otherwise it will be harder for the programmer to hunt down every instance. For example if I have a structure that is used over a program, and one member gets the wrong value. I can do then in the header: struct foo { int member __attribute__(("vartrace")); }; and then recompile the program. Every instance of writing to member will then be automatically instrumented (assuming the program stays type safe) Makes sense? [BTW I considered adding an address trace too for pointer writes to hunt down the non type safe instances and possibly some other use cases. That might be possible follow on work] > > + > > #undef TARGET_GIMPLIFY_VA_ARG_EXPR > > #define TARGET_GIMPLIFY_VA_ARG_EXPR ix86_gimplify_va_arg > > > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > > index 1eca009e255..08286aa4591 100644 > > --- a/gcc/doc/extend.texi > > +++ b/gcc/doc/extend.texi > > @@ -3193,6 +3193,13 @@ the standard C library can be guaranteed not to throw an exception > > with the notable exceptions of @code{qsort} and @code{bsearch} that > > take function pointer arguments. > > > > +@item no_vartrace > > +@cindex @code{no_vartrace} function or variable attribute > > +Disable data tracing for the function or variable or structured field > > +marked with this attribute. Applies to types. Currently implemented > > +for x86 when the @option{ptwrite} target option is enabled for systems > > +that support the @code{PTWRITE} instruction. > > How does it apply to types? Same as it would apply to variables or functions. So when the whole file or the whole function is traced instances of the marked type will not be traced. > > @@ -11933,6 +11933,10 @@ Address Sanitizer shadow memory address. NULL if Address Sanitizer is not > > supported by the target. > > @end deftypefn > > > > +@deftypefn {Target Hook} tree TARGET_VARTRACE_FUNC (tree @var{type}) > > +Return a builtin to call to trace variables or NULL if not supported by the target. > > Please elaborate on the required signature of the builtin, its > arguments and semantics. > Is this really expected to be similar enough across architectures to make this a > middle-end feature rather than a target specific isntrumentation thing > in md-reorg or so? I'm not aware of any other architecture having a PTWRITE equivalent today, but I would assume if one adds one it would look similar. There are already other architectures that have a Processor Trace equivalent, like ARM and MIPS. Yes could move it into config/i386, but I assumed the concept was generic enough for the generic middle end?` > > @hook TARGET_ATOMIC_TEST_AND_SET_TRUEVAL > > diff --git a/gcc/passes.def b/gcc/passes.def > > index 24f212c8e31..518cb4ef6f7 100644 > > --- a/gcc/passes.def > > +++ b/gcc/passes.def > > @@ -179,6 +179,7 @@ along with GCC; see the file COPYING3. If not see > > NEXT_PASS (pass_oacc_device_lower); > > NEXT_PASS (pass_omp_device_lower); > > NEXT_PASS (pass_omp_target_link); > > + NEXT_PASS (pass_vartrace); > > Wow, that's early. Reasoning for the placement before post-IPA optimizations? I was hoping my instrumentation would be optimized too, and also do the instrumentation nearer the original code. But yes perhaps it should be later and that might help with the occasionally redundant PTWRITEs which are generated today. Any suggestions where it should be? Thanks for the useful comments. I'll work on that and repost. -Andi