From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <systemtap-return-10007-listarch-systemtap=sources.redhat.com@sourceware.org>
Received: (qmail 3377 invoked by alias); 23 Sep 2008 02:08:16 -0000
Received: (qmail 2784 invoked by uid 22791); 23 Sep 2008 02:08:15 -0000
X-Spam-Status: No, hits=-2.3 required=5.0 	tests=AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from tomts25-srv.bellnexxia.net (HELO tomts25-srv.bellnexxia.net) (209.226.175.188)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 23 Sep 2008 02:07:24 +0000
Received: from toip6.srvr.bell.ca ([209.226.175.125])           by tomts25-srv.bellnexxia.net           (InterMail vM.5.01.06.13 201-253-122-130-113-20050324) with ESMTP           id <20080923020718.MHCP1557.tomts25-srv.bellnexxia.net@toip6.srvr.bell.ca>           for <systemtap@sources.redhat.com>;           Mon, 22 Sep 2008 22:07:18 -0400
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqEEAH3r10hMQWq+/2dsb2JhbACBXbUSgWY
Received: from bas5-montreal19-1279355582.dsl.bell.ca (HELO krystal.dyndns.org) ([76.65.106.190])   by toip6.srvr.bell.ca with ESMTP; 22 Sep 2008 22:02:30 -0400
Received: from localhost (localhost [127.0.0.1])   (uid 1000)   by krystal.dyndns.org with local; Mon, 22 Sep 2008 22:02:16 -0400   id 0017AAA0.48D84E28.000068D8
Date: Tue, 23 Sep 2008 02:08:00 -0000
From: Mathieu Desnoyers <compudj@krystal.dyndns.org>
To: Roland Dreier <rdreier@cisco.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,   Masami Hiramatsu <mhiramat@redhat.com>,   Martin Bligh <mbligh@google.com>,   Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,   Thomas Gleixner <tglx@linutronix.de>,   Steven Rostedt <rostedt@goodmis.org>, darren@dvhart.com,   "Frank Ch. Eigler" <fche@redhat.com>,   systemtap-ml <systemtap@sources.redhat.com>
Subject: Re: Unified tracing buffer
Message-ID: <20080923020216.GC24937@Krystal>
References: <33307c790809191433w246c0283l55a57c196664ce77@mail.gmail.com> <48D7F5E8.3000705@redhat.com> <33307c790809221313s3532d851g7239c212bc72fe71@mail.gmail.com> <48D81B5F.2030702@redhat.com> <33307c790809221616h5e7410f5gc37c262d83722111@mail.gmail.com> <48D832B6.3010409@redhat.com> <alpine.LFD.1.10.0809221718100.3265@nehalem.linux-foundation.org> <adaod2f649o.fsf@cisco.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <adaod2f649o.fsf@cisco.com>
X-Editor: vi
X-Info: http://krystal.dyndns.org:8080
X-Operating-System: Linux/2.6.21.3-grsec (i686)
X-Uptime: 21:39:45 up 110 days,  6:20,  7 users,  load average: 0.89, 0.45, 	0.38
User-Agent: Mutt/1.5.16 (2007-06-11)
X-IsSubscribed: yes
Mailing-List: contact systemtap-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <systemtap.sourceware.org>
List-Subscribe: <mailto:systemtap-subscribe@sourceware.org>
List-Post: <mailto:systemtap@sourceware.org>
List-Help: <mailto:systemtap-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: systemtap-owner@sourceware.org
X-SW-Source: 2008-q3/txt/msg00742.txt.bz2

* Roland Dreier (rdreier@cisco.com) wrote:
>  > Because all it tells you is the ordering of the atomic increment, not of 
>  > the caller. The atomic increment is not related to all the other ops that 
>  > the code that you trace actually does in any shape or form, and so the 
>  > ordering of the trace doesn't actually imply anything for the ordering of 
>  > the operations you are tracing!
> 
> This reminds me of a naive question that occurred to me while we were
> discussing this at KS.  Namely, what does "ordering" mean for events?
> 
> An example I'm all too familiar with is the lack of ordering of MMIO on
> big SGI systems -- if you forget an mmiowb(), then two CPUs taking a
> spinlock and doing writel() inside the spinlock and then dropping the
> spinlock (which should be enough to "order" things) might see the
> writel() reach the final device "out of order" because the write has to
> travel through a routed system fabric.
> 
> Just like Einstein said, it really seems to me that the order of things
> depends on your frame of reference.
> 
>  - R.
> 

Exactly as Linus said, event ordering comes down to this: a choice
between heavy locking around both the real operation traced and the
tracing statement itself (irq disable/spinlock), or the acknowledgement
that the ordering is only ensured across the actual tracing
_instrumentation_.

A worst-case scenario would be to get an interrupt between the "real"
operation (e.g. a memory or mmio write) and the tracing statement, and
then be scheduled out, which would let a lot of stuff happen between the
actual impact of the operation on kernel memory and the tracing
statement itself.

If we want to be _sure_ such a thing never happens, we would then have to
pay the price of heavy locking, and that would not be pretty, especially
when complex data structure modifications come into play. I don't really
think anyone with a half-sane mind would want to slow down such
critical kernel operations for the benefit of totally ordered tracing.

However, in many cases where ordering matters, e.g. to instrument
spinlocks themselves, if we put the instrumentation within the critical
section rather than outside of it, then we benefit from the existing
kernel locking (but only for events related to this specific spinlock).
This is the same for many synchronization primitives, except for atomic
operations, where we have to accept that the order will be imperfect.
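
To make the difference concrete, here is a toy user-space sketch in C. It
is not kernel code and not taken from any tracer: the step names, the
schedules and trace_matches_ops() are all made up for illustration. It
replays a fixed interleaving of two CPUs, A and B, each doing one "real"
operation and one trace statement, with a plain counter standing in for
the atomic increment that orders the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_EVENTS 8

/* Hypothetical step kinds for a two-CPU interleaving; not a kernel API. */
enum step { OP_A, TRACE_A, OP_B, TRACE_B };

/*
 * Replay a schedule and report whether the order recorded by the trace
 * matches the order in which the real operations happened.
 */
bool trace_matches_ops(const enum step *sched, size_t n)
{
	char ops[MAX_EVENTS + 1] = "";   /* order of the real operations */
	char trace[MAX_EVENTS + 1] = ""; /* order recorded in the trace */
	size_t nops = 0;
	size_t seq = 0;                  /* stands in for the atomic counter */

	assert(n <= MAX_EVENTS);

	for (size_t i = 0; i < n; i++) {
		switch (sched[i]) {
		case OP_A:    ops[nops++] = 'A';  break;
		case OP_B:    ops[nops++] = 'B';  break;
		case TRACE_A: trace[seq++] = 'A'; break;
		case TRACE_B: trace[seq++] = 'B'; break;
		}
	}
	return strcmp(ops, trace) == 0;
}

/* Trace outside any lock: A is preempted between its op and its trace. */
const enum step preempted[] = { OP_A, OP_B, TRACE_B, TRACE_A };

/* Trace inside the critical section: op and trace cannot be separated. */
const enum step in_lock[] = { OP_A, TRACE_A, OP_B, TRACE_B };
```

With the "preempted" schedule the operations happen in the order A, B but
the trace records B, A. With the "in_lock" schedule the lock keeps each
operation and its trace statement together, so the two orders agree, but
only for events protected by that one lock.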

So only in the specific case of instrumentation of things like locking,
where it is possible to ensure that instrumentation is synchronized with
the instrumented operation, does it make a difference to choose the TSC
(which implies a slight delta between the TSCs due to cache line delays
at synchronization and delay due to TSC drift caused by temperature)
over an atomic increment.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68