From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: Roland McGrath <roland@redhat.com>
To: Project Archer <archer@sourceware.org>
X-Fcc: ~/Mail/utrace
CC: Oleg Nesterov <oleg@redhat.com>
Subject: ptrace improvement ideas
Message-Id: <20110203223905.D0C77180081@magilla.sf.frob.com>
Date: Thu, 03 Feb 2011 22:39:00 -0000
X-SW-Source: 2011-q1/txt/msg00026.txt.bz2

I've been considering ideas for incremental improvements to the Linux
ptrace interface to make life better for debuggers.  This is far less
ambitious than big ideas about replacing ptrace with a good interface.
But the focus is on what is practical to get accepted by the kernel
community without months or years of wrangling and delay.

I'm trying to concentrate on things that are both of immediate help to
GDB and fairly straightforward and noninvasive in the kernel
implementation.  Some things that sound simple in the abstract from
the userland point of view are in fact substantially difficult to
implement in the kernel given the current structure of things.  So
this will not be one giant win to solve all your pain points.  It will
be a series of small, incremental improvements that address
significant pain points while also being moderately low-hanging fruit
in the kernel implementation.

I don't want to propose any specific changes to the kernel community
off the cuff.  We need to work out details with GDB hackers so that we
have things that really improve life in GDB, and that GDB hackers will
really make use of quite soon.  It's worse than nothing to add or
change things in the kernel interfaces before GDB folks are ready to
really work on using them.  Only that work can make us quite sure that
the details and corner cases are thoroughly specified, and specified in
ways that really work well for GDB in practice.

We're open to your requests, of course.  But we need to keep a tight focus
on the things that we can implement fairly quickly and simply in the
kernel as it stands today, that make a substantial difference in the
correctness, performance, or ease of maintenance of GDB for real-world
debugging cases, and that GDB will actually make use of quite soon.

I have a few ideas to start with.  Some of these are quite simple to
implement in the kernel and thus we can expect to get them in without
controversy.  Others require more investigation by me and Oleg to be sure
we can really do them well in the kernel without making too many waves.
Everything will have to be done in an incremental fashion.  That means we
will have to get the easy changes done first and have GDB really using
them and getting observable benefits, before we can propose another round
of changes in the kernel.  If something is a nice idea, and even seems
simple to do from the kernel perspective, but GDB is not really going to
start using it right away, then we won't do it.  Ideally we will have
proven our draft interfaces and their implementations with GDB work and
gotten comfortable with them in all their corners, before we try to submit
the changes to the kernel.

So, here are my first few ideas.

* PTRACE_ATTACH_NOSTOP

This is a new request that differs from PTRACE_ATTACH in two ways.

First, it does not generate a SIGSTOP.  On return from ptrace, the
tracee thread is attached for ptrace, but may be running uninterrupted,
may be stopped, or may be in whatever other state it was in before.  If
traceable events happen right afterward, then it may be in a ptrace stop
by the time you look.

Second, it uses the other arguments to the ptrace call.  One of these
is reserved for future use (unless you have an idea), meaning the call
will fail with EINVAL if it's nonzero.  The other is a set of flags as
now used with PTRACE_SETOPTIONS.  The options will be set atomically
with the attach.  So there is no window during which you are attached
but the event-reporting behavior is not yet configured exactly as you
want it.
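
To make that window concrete, here is a rough C sketch of the sequence a
debugger has to go through with the existing requests.  The helper name
attach_with_options_today is made up for illustration.  Since
PTRACE_ATTACH_NOSTOP does not exist, it appears only in a comment, and
the order of its two extra arguments there is an assumption (the text
above does not say which argument is the reserved one).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Attach with the requests that exist today.  Returns 0 on success,
   -1 with errno set on failure. */
static int attach_with_options_today(pid_t tid, long options)
{
    int status;
    pid_t got;

    /* Step 1: attach.  This makes the kernel send SIGSTOP to the tracee. */
    if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) == -1)
        return -1;

    /* Step 2: wait for a stop.  Options cannot be set before this, and
       the stop seen here is not guaranteed to be the SIGSTOP one. */
    do {
        got = waitpid(tid, &status, __WALL);
    } while (got == -1 && errno == EINTR);

    /* Step 3: only now configure event reporting.  Anything that
       happened between steps 1 and 3 used the default behavior. */
    if (got == -1
        || ptrace(PTRACE_SETOPTIONS, tid, NULL, (void *)options) == -1) {
        int saved = errno;
        ptrace(PTRACE_DETACH, tid, NULL, NULL);
        errno = saved;
        return -1;
    }
    return 0;
}

/* Under the proposal, steps 1-3 would collapse into something like:
 *
 *     ptrace(PTRACE_ATTACH_NOSTOP, tid, 0, options);
 *
 * with no SIGSTOP and no wait.  Hypothetical: no such request exists,
 * and the argument order shown is a guess. */
```

A caller would use it as attach_with_options_today(tid,
PTRACE_O_TRACEEXIT) and treat -1 as "not attached".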

* PTRACE_O_INHERIT

This is a new option bit for PTRACE_SETOPTIONS or the options argument of
PTRACE_ATTACH_NOSTOP.  Its effect is that clones of the tracee inherit the
ptrace attachedness and option settings of their parent.  This applies to
all kinds of clones, which in userland are known as thread creations,
forks, and vforks.  This has no other effects, meaning it does not cause
either the parent or the child to stop for any event.  There's no point in
using this along with all of PTRACE_O_TRACECLONE, PTRACE_O_TRACEFORK, and
PTRACE_O_TRACEVFORK, because those already imply the inheritance behavior.
The point of PTRACE_O_INHERIT would be to attach newly-created threads and
children without causing an event stop and the attendant overhead.

This means that you would have no notice that the new thread was your
tracee until you got some event report for it.  When that report comes,
it appears as a spontaneous wait result for a PID you hadn't heard of before.
To help keep track of what that's about, the siginfo_t for SIGCHLD would
be extended with a new field si_tgid.  To get this information reliably,
the debugger needs to use the waitid call instead of waitpid/wait4.  Thus,
for a new thread (i.e. CLONE_THREAD clone), you would see a new PID you
didn't know about, and the siginfo_t from that waitid would show the CLD_*
status as normal, with si_pid being the individual thread's ID and si_tgid
being the ID of the thread-group (PID in userland terms).
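
As a small illustration of the waitid side of this, the sketch below
collects one report as a siginfo_t using only fields that exist today
(si_pid, si_code, si_status).  The proposed si_tgid is not in any
siginfo_t, so it is mentioned only in a comment.  The helper names are
invented for the example, and spawn_exiting_child stands in for a real
tracee just so there is something to wait for.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block until some child or tracee reports, filling *si.
   Returns 0 on success, -1 with errno set on failure. */
static int wait_next_report(siginfo_t *si)
{
    int rc;

    do {
        rc = waitid(P_ALL, 0, si, WEXITED | WSTOPPED);
    } while (rc == -1 && errno == EINTR);

    /* On success:
     *   si->si_pid     ID of the reporting child or thread
     *   si->si_code    CLD_EXITED, CLD_KILLED, CLD_TRAPPED, ...
     *   si->si_status  exit code or signal number
     * The proposal would add si->si_tgid here (hypothetical field). */
    return rc;
}

/* Demo helper: fork a child that exits at once with `code`.
   Returns the child's PID in the parent, or -1 on failure. */
static pid_t spawn_exiting_child(int code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(code);
    return pid;
}
```

After spawn_exiting_child(7), a call to wait_next_report should yield
si_code == CLD_EXITED and si_status == 7 for that child's PID.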

Because of this spontaneous report aspect, it could be difficult to figure
out what's going on with any new thread that is a fork/vfork, or other use
of clone (oddball applications, or old linuxthreads), rather than a
CLONE_THREAD case (NPTL pthread_create).  In those cases, si_tgid and
si_pid are the same and neither matches any process you already know you
are tracing.  In general, it can be impossible to figure out whose child
this is, because its parent could exit so its ppid (as seen in
/proc/pid/status et al) becomes 1.  So perhaps it would be better to have
this be just PTRACE_O_THREAD_INHERIT, where it only applies to CLONE_THREAD
clones.
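
The three cases a debugger would have to tell apart can be sketched as
a lookup against its own table of tracees.  This is only a model of the
bookkeeping: si_tgid is the proposed field and is passed in as a plain
parameter, and the type and function names are made up for the example.

```c
#include <stddef.h>
#include <sys/types.h>

enum report_origin {
    ORIGIN_KNOWN_THREAD,    /* thread ID is already in our table */
    ORIGIN_NEW_THREAD,      /* new thread ID in a process we trace */
    ORIGIN_UNKNOWN_PROCESS  /* fork, vfork, or non-CLONE_THREAD clone */
};

struct traced_thread {
    pid_t tid;   /* individual thread ID */
    pid_t tgid;  /* thread-group ID (PID in userland terms) */
};

/* Classify a wait report given the (hypothetical) si_tgid value. */
static enum report_origin
classify_report(const struct traced_thread *table, size_t n,
                pid_t si_pid, pid_t si_tgid)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (table[i].tid == si_pid)
            return ORIGIN_KNOWN_THREAD;

    for (i = 0; i < n; i++)
        if (table[i].tgid == si_tgid)
            return ORIGIN_NEW_THREAD;

    /* si_pid == si_tgid and neither is known: nothing in the report
       says whose child this is. */
    return ORIGIN_UNKNOWN_PROCESS;
}
```

Only the last case is the troublesome one, which is the argument for
restricting the option to CLONE_THREAD clones.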

* PTRACE_O_NO_ZOMBIE_THREAD

This is another new option.  It applies to the behavior on the death of a
CLONE_THREAD clone (i.e. an NPTL pthread_create thread).  The thread would
neither become a zombie nor cause any wait result.  Instead, it would just
die and disappear silently, as threads do when not traced.  If you want to
notice individual threads dying, you can already use PTRACE_O_TRACEEXIT
for that instead.
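
For reference, recognizing the existing PTRACE_O_TRACEEXIT report in a
waitpid-style status word looks roughly like this.  The encoding is the
one documented in ptrace(2); the function name is just for illustration.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* True if `status` (as returned by waitpid) is the stop that
   PTRACE_O_TRACEEXIT produces just before a tracee exits. */
static bool is_exit_event(int status)
{
    return WIFSTOPPED(status)
        && (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXIT << 8));
}
```

A debugger that wants per-thread death notices would keep checking for
this event, whether or not the new option is set.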

Another subtle issue is when the initial thread (the one whose thread ID
matches the tgid) exits while other threads are live.  It's already the
case (I'm pretty sure, anyway) that you don't get a wait result when this
happens.  That's because that wait result is reserved for when the whole
group exits, i.e. the entire process is dead and there are no threads left
in it at all.  That would remain so with this option, but now it would be
the obvious and consistent thing to see rather than being a subtle
difference between this particular thread dying and any other thread dying.

One nonobvious issue here is that of PID reuse.  It's unlikely, but
possible, that the individual thread ID of a dead thread is reused for a
different new thread.  With this option, there would be no notification to
the debugger that the old thread using this ID has died.  If you are also
using PTRACE_O_INHERIT, and the same thread ID is reused for a new thread
in the same process (same tgid), then all you would see is some new event
for a thread ID you already knew about.  It would appear that the same
thread remained alive and something new happened, when in fact what
happened was that the old thread died and a new thread came along and
happened to get the same ID.  So without other new feature aspects, you
would have to assume this could happen, and be sure not to be confused by
it or to tell the user something misleading.
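
One way a debugger might guard against this with what exists today,
offered as my own assumption rather than as part of the proposal, is to
remember each thread's start time and compare it when a surprising event
arrives.  The start time is field 22 (starttime) of the stat file under
/proc/<pid>/task/<tid>/, per proc(5).  The sketch below only parses text
already read from that file.  All names are hypothetical, and the check
narrows the problem rather than closing it, since the thread can still
die and be replaced between the event and the read.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Extract starttime (field 22) from the contents of a stat file.
   Returns 0 on success, -1 if the text is malformed. */
static int parse_starttime(const char *stat_text, unsigned long long *out)
{
    /* Field 2 (comm) is parenthesized and may itself contain spaces
       or ')', so resume parsing after the LAST ')'. */
    const char *p = strrchr(stat_text, ')');
    int field;

    if (p == NULL)
        return -1;
    p++;

    /* Skip fields 3 through 21. */
    for (field = 3; field < 22; field++) {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            return -1;
        while (*p != ' ' && *p != '\0')
            p++;
    }
    while (*p == ' ')
        p++;
    if (*p < '0' || *p > '9')
        return -1;

    *out = strtoull(p, NULL, 10);
    return 0;
}

struct known_thread {
    pid_t tid;
    unsigned long long starttime;  /* recorded when first seen */
};

/* False means the ID now belongs to a different thread. */
static bool same_incarnation(const struct known_thread *k,
                             unsigned long long starttime_now)
{
    return k->starttime == starttime_now;
}
```

The cost is an extra /proc read per check, so this would only make
sense on events that look inconsistent with the thread's last known
state.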

* PTRACE_INTERRUPT

This is a new request, with an attendant new PTRACE_EVENT_* type.
I have not thought out all the details of this yet.  I think it is
viable to implement it without too much trouble, but it is certainly
more involved than the first three ideas above.

This request asks to make a given tracee thread stop and give a
PTRACE_EVENT_INTERRUPT wait result.  Unlike other ptrace requests, you can
make this request on a tracee that is not already stopped.  It is similar
to sending a signal with tkill, but it does not interfere with any real
signals, is not affected by the blocked signal mask, etc.

If the tracee is already stopped for a ptrace stop, then this would return
EALREADY.  If the tracee is already stopped for job control, then it would
morph that into a ptrace stop (so that SIGCONT cannot resume it), and
likewise return EALREADY.

One major use for this would be to clean up the cold-attach procedure to
avoid the races and bad side effects it has now.  First, the debugger
would use PTRACE_ATTACH_NOSTOP to establish tracing but not perturb the
thread at all.  As soon as that returns success, the debugger can use
PTRACE_INTERRUPT on it.  This will yield EALREADY if the tracee is already
stopped, telling you that you can safely inspect it with other ptrace
requests (or read /proc/pid/mem, or whatever).  If it instead returns 0,
that means that the tracee will stop soon, telling you that you can safely
do a blocking wait* call on it and not worry about any races or long blocks.
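
The debugger's decision after the interrupt request can be sketched as
a small function of the call's result.  This models the proposal as
described above, not any existing kernel interface: the enum and the
function name are invented, and the ptrace calls themselves appear only
in the comment because the requests are hypothetical.

```c
#include <errno.h>

enum attach_next_step {
    INSPECT_NOW,    /* tracee already stopped: safe to examine it */
    WAIT_FOR_STOP,  /* stop is on its way: a blocking wait is safe */
    ATTACH_FAILED   /* any other error */
};

/* Proposed cold-attach flow (hypothetical requests):
 *
 *     ptrace(PTRACE_ATTACH_NOSTOP, tid, 0, options);
 *     rc = ptrace(PTRACE_INTERRUPT, tid, 0, 0);
 *     switch (after_interrupt(rc, errno)) ...
 *
 * `rc` and `err` are the return value and errno of the second call. */
static enum attach_next_step after_interrupt(long rc, int err)
{
    if (rc == 0)
        return WAIT_FOR_STOP;
    if (rc == -1 && err == EALREADY)
        return INSPECT_NOW;
    return ATTACH_FAILED;
}
```

In the WAIT_FOR_STOP case the debugger would block in wait* until it
sees the PTRACE_EVENT_INTERRUPT report for that thread.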


These are a few ideas to get the discussion started.  We will need to hash
everything out in detail before we commit to feature proposals for the
kernel.  These are certainly not the only things we can do with the
constraints I described above.  But these are examples of some things that
are fairly easy to do in the kernel.


Thanks,
Roland