From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-489560-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 30093 invoked by alias); 9 Nov 2018 18:18:41 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 30081 invoked by uid 89); 9 Nov 2018 18:18:40 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-25.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY autolearn=ham version=3.3.2 spammy=ongoing, foot, recorder, stopping
X-HELO: mga06.intel.com
Received: from mga06.intel.com (HELO mga06.intel.com) (134.134.136.31) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 09 Nov 2018 18:18:39 +0000
Received: from fmsmga005.fm.intel.com ([10.253.24.32])  by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 09 Nov 2018 10:18:37 -0800
Received: from tassilo.jf.intel.com (HELO tassilo.localdomain) ([10.7.201.126])  by fmsmga005.fm.intel.com with ESMTP; 09 Nov 2018 10:18:37 -0800
Received: by tassilo.localdomain (Postfix, from userid 1000)	id 1D96130235B; Fri,  9 Nov 2018 10:18:37 -0800 (PST)
Date: Fri, 09 Nov 2018 18:18:00 -0000
From: Andi Kleen <ak@linux.intel.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
Message-ID: <20181109181837.GB6218@tassilo.jf.intel.com>
References: <20181104063235.6914-1-andi@firstfloor.org> <20181104063235.6914-2-andi@firstfloor.org> <CAFiYyc34-GkFdeYQrZ+Aat6CukFJe=HwoRtXkWGaCm_iQQ24Tw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAFiYyc34-GkFdeYQrZ+Aat6CukFJe=HwoRtXkWGaCm_iQQ24Tw@mail.gmail.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-SW-Source: 2018-11/txt/msg00762.txt.bz2

Hi Richard,

On Fri, Nov 09, 2018 at 04:27:22PM +0100, Richard Biener wrote:
> > Passes bootstrap and test suite on x86_64-linux, also
> > bootstrapped and tested gcc itself with full -fvartrace
> > and -fvartrace-locals instrumentation.
> 
> So how is this supposed to be used?  I guess in a
> edit-debug cycle and not for production code?

It can actually be used for production code.

When processor trace is disabled the PTWRITE
instructions acts as nops. So it's only increasing
the code foot print. Since the instrumentation
should only log values which are already computed
it normally doesn't cause any other code.

Even when it is enabled the primary overhead is the
additional memory bandwidth, since the CPU can
do the logging in parallel to other code. As long
as the instrumentation is not too excessive to generate
too much memory bandwidth, it might be actually
quite reasonable to even keep the logging on for
production code, and use it as a "flight recorder",
which is dumped on failures. 

This would also be the model in gdb, once we have support
in it. You would run the program in the debugger
and it just logs the data to a memory buffer,
but when stopping the value history can be examined.

There's also some ongoing work to add (optional) support
for PT to Linux crash dumps, so eventually that will
work without having to always run the debugger.

Today it can be done by running perf in the background
to record the PT, however there the setup is a bit
more complicated.

The primary use case I was envisioning was to set
the attribute on some critical functions/structures/types
of interest and then have a very overhead logging
option for them (generally cheaper than
equivalent software instrumentation). And then
they automatically gets logged without the programmer
needing to add lots of instrumentation code to 
catch every instance. So think of it as a 
"hardware accelerated printf"

> 
> What do you actually write with PTWRITE?  I suppose
> you need to keep a ID to something mapping somewhere
> so you can make sense of the perf records?

PTWRITE writes 32bit/64bit values. The CPU reports the
IP of PTWRITE in the log, either explicitely, or implicitely if branch
trace is enabled too. The IP can then be used to look up
the DWARF scope for that IP. Then the decoder
decodes the operand of PTWRITE and maps it back using 
the dwarf information. So it all works using 
existing debugger infrastructure, and a quite simple
instruction decoder.

I'll clarify that in the description.

> 
> Few comments inline below, but I'm not sure if this
> whole thing is interesting for GCC (as opposed to being
> a instrumentation plugin)

I'm biased, but I think automatic data tracing is a very exciting
use case, so hopefully it can be considered for mainstream gcc.

> >                               handle_no_profile_instrument_function_attribute,
> >                               NULL },
> > @@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node, tree name, tree, int,
> >    return NULL_TREE;
> >  }
> >
> > +/* Handle "vartrace"/"no_vartrace" attributes; arguments as in
> > +   struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_vartrace_attribute (tree *node, tree, tree, int flags,
> > +                          bool *)
> > +{
> > +  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
> > +    *node = build_variant_type_copy (*node);
> 
> I don't think you want the attribute on types.  As far as I understood your
> descriptions it should only be on variables and functions.

The idea was that it's possible to trace all instances of a type,
especially structure members. Otherwise it will be harder for
the programmer to hunt down every instance.

For example if I have a structure that is used over a program,
and one member gets the wrong value.

I can do then in the header:

struct foo {
	int member __attribute__(("vartrace"));
};

and then recompile the program. Every instance of writing to
member will then be automatically instrumented (assuming
the program stays type safe)

Makes sense?

[BTW I considered adding an address trace
too for pointer writes to hunt down the non type safe
instances and possibly some other use cases. 
That might be possible follow on work]

> > +
> >  #undef TARGET_GIMPLIFY_VA_ARG_EXPR
> >  #define TARGET_GIMPLIFY_VA_ARG_EXPR ix86_gimplify_va_arg
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index 1eca009e255..08286aa4591 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -3193,6 +3193,13 @@ the standard C library can be guaranteed not to throw an exception
> >  with the notable exceptions of @code{qsort} and @code{bsearch} that
> >  take function pointer arguments.
> >
> > +@item no_vartrace
> > +@cindex @code{no_vartrace} function or variable attribute
> > +Disable data tracing for the function or variable or structured field
> > +marked with this attribute. Applies to types. Currently implemented
> > +for x86 when the @option{ptwrite} target option is enabled for systems
> > +that support the @code{PTWRITE} instruction.
> 
> How does it apply to types?

Same as it would apply to variables or functions.

So when the whole file or the whole function is traced instances
of the marked type will not be traced.

> > @@ -11933,6 +11933,10 @@ Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not
> >  supported by the target.
> >  @end deftypefn
> >
> > +@deftypefn {Target Hook} tree TARGET_VARTRACE_FUNC (tree @var{type})
> > +Return a builtin to call to trace variables or NULL if not supported by the target.
> 
> Please elaborate on the required signature of the builtin, its
> arguments and semantics.
> Is this really expected to be similar enough across architectures to make this a
> middle-end feature rather than a target specific isntrumentation thing
> in md-reorg or so?

I'm not aware of any other architecture having a PTWRITE equivalent today,
but I would assume if one adds one it would look similar. There
are already other architectures that have a Processor Trace equivalent,
like ARM and MIPS.

Yes could move it into config/i386, but I assumed the concept
was generic enough for the generic middle end?`

> >  @hook TARGET_ATOMIC_TEST_AND_SET_TRUEVAL
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 24f212c8e31..518cb4ef6f7 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -179,6 +179,7 @@ along with GCC; see the file COPYING3.  If not see
> >    NEXT_PASS (pass_oacc_device_lower);
> >    NEXT_PASS (pass_omp_device_lower);
> >    NEXT_PASS (pass_omp_target_link);
> > +  NEXT_PASS (pass_vartrace);
> 
> Wow, that's early.  Reasoning for the placement before post-IPA optimizations?

I was hoping my instrumentation would be optimized too,
and also do the instrumentation nearer the original
code.  But yes perhaps it should be later and that might
help with the occasionally redundant PTWRITEs which
are generated today.

Any suggestions where it should be?

Thanks for the useful comments. I'll work on that and repost.

-Andi