From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <archer-return-2210-listarch-archer=sourceware.org@sourceware.org>
Received: (qmail 15167 invoked by alias); 15 Sep 2010 08:01:02 -0000
Mailing-List: contact archer-help@sourceware.org; run by ezmlm
Sender: <archer@sourceware.org>
Precedence: bulk
List-Post: <mailto:archer@sourceware.org>
List-Help: <mailto:archer-help@sourceware.org>
List-Subscribe: <mailto:archer-subscribe@sourceware.org>
List-Id: <archer.sourceware.org>
Received: (qmail 15146 invoked by uid 22791); 15 Sep 2010 08:01:01 -0000
X-SWARE-Spam-Status: No, hits=-3.2 required=5.0
	tests=AWL,BAYES_50,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: Roland McGrath <roland@redhat.com>
To: Project Archer <archer@sourceware.org>
Subject: PTRACE_GETREGSET
Message-Id: <20100915080050.2C8B2403E6@magilla.sf.frob.com>
Date: Wed, 15 Sep 2010 08:01:00 -0000
X-SW-Source: 2010-q3/txt/msg00193.txt.bz2

Starting with Linux 2.6.25, the Linux kernel has been regularizing
its internal support for accessing the arch-specific register data.
The organizing principle is that each arch exports some number of
blobs of arch-specific data for each thread that can be seen in
userland, and we call each of these a regset.  There is a single
canonical layout and meaning for each regset; these are the
established layouts used on each arch in the ELF core file formats,
the payloads of the "CORE" (or "LINUX") notes.  We identify the
regsets by the n_type (NT_*) code used in the ELF core note format.
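
To make the "regset = note payload" idea concrete, here is a minimal
sketch of locating one regset inside the raw bytes of a core file's
PT_NOTE segment.  The function name find_regset_note and its
signature are made up for illustration; it assumes the usual Linux
note layout (a three-word header, then the owner name and the payload,
each padded to 4 bytes) and that every note, including the last one,
is padded in full.

```c
#include <elf.h>      /* Elf64_Nhdr, NT_* codes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a note field size up to the 4-byte padding used in core notes. */
static size_t note_align(size_t n) { return (n + 3u) & ~(size_t)3u; }

/*
 * Hypothetical helper: search the raw contents of a PT_NOTE segment for
 * a note with the given owner name ("CORE" or "LINUX") and n_type.
 * Returns a pointer to the payload (the regset data) and stores its
 * unpadded size in *desc_len, or returns NULL if there is no such note
 * or the buffer is truncated.
 */
static const void *find_regset_note(const void *seg, size_t seg_len,
                                    const char *owner, uint32_t n_type,
                                    size_t *desc_len)
{
    const unsigned char *p = seg;
    size_t off = 0;
    size_t owner_sz = strlen(owner) + 1;   /* n_namesz counts the NUL */

    while (seg_len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;
        memcpy(&nh, p + off, sizeof nh);   /* avoid unaligned loads */
        off += sizeof nh;

        /* Reject sizes that cannot fit before padding them. */
        if (nh.n_namesz > seg_len - off || nh.n_descsz > seg_len - off)
            return NULL;
        size_t name_sz = note_align(nh.n_namesz);
        size_t data_sz = note_align(nh.n_descsz);
        if (name_sz > seg_len - off || data_sz > seg_len - off - name_sz)
            return NULL;                   /* truncated note */

        const unsigned char *name = p + off;
        const unsigned char *desc = name + name_sz;
        if (nh.n_type == n_type && nh.n_namesz == owner_sz &&
            memcmp(name, owner, owner_sz) == 0) {
            *desc_len = nh.n_descsz;
            return desc;
        }
        off += name_sz + data_sz;          /* step to the next note */
    }
    return NULL;
}
```

For example, find_regset_note(seg, len, "CORE", NT_PRSTATUS, &n) would
return the prstatus payload of the first thread's note, if one exists.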

In historical practice on Linux, the formats of the register data
accessible via ptrace (and details of ptrace calls to get it) vary
widely by arch.  On some machines such as x86, the PTRACE_GETREGS et
al formats have always matched the formats used in ELF core files.
On other machines, the historical ptrace ABIs are more arcane.

This combination of NT_* code and regset layout/meaning will be the
single(*) preferred user<->kernel ABI for dealing with each kind of
register data in all future kernel facilities.  The n_type codes
and note payload formats used in ELF core files were the one
existing well-organized user<->kernel ABI for register data, so the
kernel has settled on that as the one standard format.  GDB already
has target support code, used in core file reading, that knows all
these exact layouts for each arch--so there is nothing new here
that's arch-specific.

Since Linux 2.6.34 (and also backported in RHEL6-beta), new ptrace
requests PTRACE_GETREGSET and PTRACE_SETREGSET let the debugger use
these canonical terms to access all thread register data.  In the
(very, very) long run, the old ptrace requests like PTRACE_GETREGS,
which are specific to each arch, will be deprecated.  For both new
arch ports and new kinds of register data, these generic ones will
be the only means to access the register data.  Already on x86,
this is the only way to get the NT_X86_XSTATE data (e.g. %ymmN
register high halves).
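
As a rough sketch of what a call looks like from the debugger side:
per the ptrace(2) man page, the addr argument carries the NT_* code
and the data argument points to a struct iovec describing the buffer,
and the kernel rewrites iov_len to the number of bytes it actually
filled in.  The helper names fetch_regset and
regset_size_of_stopped_child are invented for this example, and the
8192-byte buffer is just an assumed "big enough" size.

```c
#include <elf.h>          /* NT_PRSTATUS and friends */
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>      /* struct iovec */
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fetch one regset from a ptrace-stopped tracee.  Returns the number of
 * bytes the kernel wrote into buf, or -1 if the request failed.
 */
static long fetch_regset(pid_t pid, int n_type, void *buf, size_t cap)
{
    struct iovec iov = { .iov_base = buf, .iov_len = cap };

    if (ptrace(PTRACE_GETREGSET, pid, (void *)(long)n_type, &iov) == -1)
        return -1;
    return (long)iov.iov_len;   /* updated by the kernel */
}

/*
 * Demo helper: fork a child that asks to be traced and stops itself,
 * read one regset from it, then kill it.  Returns the regset size in
 * bytes, or -1 if tracing or the request was not possible.
 */
static long regset_size_of_stopped_child(int n_type)
{
    static unsigned char buf[8192];     /* assumed large enough */
    int status;
    long n = -1;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);                   /* tracing not permitted */
        raise(SIGSTOP);                 /* stop so the parent can look */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFSTOPPED(status)) {
        n = fetch_regset(pid, n_type, buf, sizeof buf);
        kill(pid, SIGKILL);             /* done with the child */
        waitpid(pid, &status, 0);
    }
    return n;
}
```

Note that nothing in either function depends on the arch; only the
interpretation of the returned bytes does.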

The operation of these requests is entirely arch-independent.  Only
what NT_* code you ask for and what layout that data block has is
arch-dependent (and already specified by the core file formats).
Actual availability of the requests on each arch depends on the
arch-specific internal support code being wired up in the kernel
(9 arches are wired up as of 2.6.35, with more to come as their arch
maintainers get round tuits).

One caveat about these requests applies to bi-arch systems.  Unlike other
ptrace requests, these access the native formats of the tracee
process, rather than the native formats of the debugger process.
So, a 64-bit debugger process using PTRACE_GETREGSET on a 32-bit
tracee process will see the 32-bit layouts (i.e. what would appear
in an ELF core file if that process dumped one).
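
One practical consequence, sketched below for x86 only: a 64-bit
debugger can hand PTRACE_GETREGSET a buffer big enough for either
layout and use the size the kernel reports back to tell which kind of
tracee it has.  The enum and function names are hypothetical; the two
sizes assume the usual x86 general-register layouts (17 32-bit slots
for i386, 27 64-bit slots for x86-64).

```c
#include <stddef.h>

/* Hypothetical classification of an x86 tracee by NT_PRSTATUS size. */
enum tracee_abi { TRACEE_UNKNOWN, TRACEE_I386, TRACEE_X86_64 };

/* Assumed sizes of the general-register regset on x86. */
#define I386_GREGS_SIZE   (17 * 4)   /* 68 bytes  */
#define X86_64_GREGS_SIZE (27 * 8)   /* 216 bytes */

/*
 * Map the byte count reported for an NT_PRSTATUS request to the
 * tracee's register layout.  Any other size is reported as unknown
 * rather than guessed at.
 */
static enum tracee_abi classify_x86_tracee(size_t prstatus_len)
{
    if (prstatus_len == I386_GREGS_SIZE)
        return TRACEE_I386;
    if (prstatus_len == X86_64_GREGS_SIZE)
        return TRACEE_X86_64;
    return TRACEE_UNKNOWN;
}
```

The input would be the iov_len value left behind by a successful
PTRACE_GETREGSET call with NT_PRSTATUS.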

This internal organization is the way that all kernel-based things
have to access user register data, e.g. ugdb.  Hence, those things
too can use entirely arch-independent code to the extent that their
needs for register data are expressed in these terms.

A remote protocol extension to access register data by n_type code
rather than gdb-specific layouts would let gdbserver and ugdb work
without needing any arch-specific code for direct register access.
Of course, gdbserver already has arch-specific code for that, and
will always keep it to work on older kernels.  But, it wouldn't
need any new arch-specific code to support new kinds of register
data (just pass through what gdb knows to ask for).  And, of
course, my main motivation is that ugdb could be done without ever
having any arch-specific code at all (it really needs none for
anything else except the protocol requests that use gdb register
numbers or other gdb-specific, arch-specific layouts).

(*) There is a single format for each regset and that's in theory
the canonical and sole representation of that register data.  But
some machines (x86, and I hope no others) have two or three regset
flavors to represent the same data, because successive generations
of hardware (FPU, etc.) extended the register data.  On x86-64,
when NT_X86_XSTATE is available, that's a superset of all the data
in NT_PRFPREG.  On x86-32, NT_PRXFPREG (when a machine supports it)
is a superset of all the data in NT_PRFPREG in a related but
different machine format, and NT_X86_XSTATE (when available) is a
superset of both in yet another format (the only slight relief is
that NT_X86_XSTATE format is actually the same for 32 and 64).  So
you can't quite say that any one of these is the single canonical
format for that register data.  You'll usually get both (or all
three, on x86-32) in a core file, but only the oldest/smallest one
(subset) is guaranteed to be available on any given machine.  But that
weird overlapping tower of regsets is wacky that way because each
of these is really a layout that some natural machine instructions
use (there are different register names in different instructions
that overlap correspondingly too!).  But the user<->kernel ABI
principle stands that each "natural" chunk of register data has one
true format, and any alternative formats, such as old arch-specific
ptrace requests not matching the core file formats, are deprecated
and not to be instituted in future ABIs.
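
For a consumer, the x86 tower above boils down to "ask for the
largest superset first and fall back".  Here is a minimal sketch of
that policy.  The probe callback stands in for whatever actually
fetches a regset (a PTRACE_GETREGSET call, a core file lookup); all
of the names are hypothetical, and the sizes returned by stub_probe
are illustrative placeholders rather than values to rely on.

```c
#include <elf.h>      /* NT_PRFPREG, NT_PRXFPREG */
#include <stddef.h>

#ifndef NT_X86_XSTATE
#define NT_X86_XSTATE 0x202     /* in case of an older <elf.h> */
#endif

/* Returns the regset size in bytes if available, else <= 0. */
typedef long (*regset_probe_fn)(int n_type, void *ctx);

/*
 * Try the x86 FP regsets from largest superset to smallest subset.
 * Returns the NT_* code of the first one the probe reports available
 * and stores its size in *size_out, or returns -1 if none is.
 */
static int pick_x86_fp_regset(regset_probe_fn probe, void *ctx,
                              long *size_out)
{
    static const int preference[] = {
        NT_X86_XSTATE, NT_PRXFPREG, NT_PRFPREG
    };
    size_t count = sizeof preference / sizeof preference[0];

    for (size_t i = 0; i < count; i++) {
        long n = probe(preference[i], ctx);
        if (n > 0) {
            *size_out = n;
            return preference[i];
        }
    }
    return -1;
}

/* Stub machine description, for experimenting without a tracee. */
struct stub_caps { int has_xstate; int has_fxsr; };

/* Stub probe: pretends to be an x86-32 machine with the given caps. */
static long stub_probe(int n_type, void *ctx)
{
    const struct stub_caps *caps = ctx;

    if (n_type == NT_X86_XSTATE)
        return caps->has_xstate ? 832 : -1;   /* placeholder size */
    if (n_type == NT_PRXFPREG)
        return caps->has_fxsr ? 512 : -1;     /* placeholder size */
    if (n_type == NT_PRFPREG)
        return 108;                           /* always present here */
    return -1;
}
```

With a real tracee, the probe would simply issue the request for the
given n_type and report the size it got back, so the fallback order
also covers the 64-bit case where NT_PRXFPREG is never offered.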


Thanks,
Roland