From: Roland McGrath
To: Project Archer
Subject: PTRACE_GETREGSET
Date: Wed, 15 Sep 2010 08:01:00 -0000

Starting with Linux 2.6.25, the Linux kernel has been regularizing its internal support for accessing the arch-specific register data. The organizing principle is that each arch exports some number of blobs of arch-specific data for each thread that can be seen in userland, and we call each of these a regset. There is a single canonical layout and meaning for each regset; these are the established layouts used on each arch in the ELF core file formats, the payloads of the "CORE" (or "LINUX") notes. We identify the regsets by the n_type (NT_*) code used in the ELF core note format.

In historical practice on Linux, the formats of the register data accessible via ptrace (and the details of the ptrace calls to get it) vary widely by arch. On some machines, such as x86, the PTRACE_GETREGS et al. formats have always matched the formats used in ELF core files. On other machines, the historical ptrace ABIs are more arcane.

This combination of NT_* code and regset layout/meaning will be the single(*) preferred user<->kernel ABI for dealing with each kind of register data in all future kernel facilities.
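To make "a regset is identified by its NT_* code in the core note format" concrete, here is a minimal C sketch (not GDB's or the kernel's code) that walks an in-memory buffer of 64-bit ELF notes, such as a core file's PT_NOTE segment, and returns the payload of the note with a given n_type. It assumes the 4-byte padding of note name and descriptor fields that Linux core files use; find_note and note_pad4 are hypothetical names. For a regset such as NT_PRFPREG, the returned payload is the register data in its canonical layout.

```c
#include <elf.h>     /* Elf64_Nhdr and the NT_* codes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Note name and descriptor fields are padded to 4-byte boundaries
   (assumption: Linux ELF core file convention). */
static size_t note_pad4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Hypothetical helper: scan a buffer of ELF64 notes and return a pointer
   to the payload of the first note whose n_type equals `type`, storing
   its length in *desc_len.  Returns NULL if no such note exists or the
   data is truncated. */
static const unsigned char *find_note(const unsigned char *buf, size_t len,
                                      uint32_t type, size_t *desc_len)
{
    size_t off = 0;

    while (off < len && len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;
        memcpy(&nh, buf + off, sizeof nh);   /* avoids unaligned access */

        /* The payload follows the header and the padded note name. */
        size_t desc_off = off + sizeof nh + note_pad4(nh.n_namesz);
        if (desc_off > len || nh.n_descsz > len - desc_off)
            return NULL;                     /* truncated note */

        if (nh.n_type == type) {
            *desc_len = nh.n_descsz;
            return buf + desc_off;           /* the regset payload */
        }
        off = desc_off + note_pad4(nh.n_descsz);
    }
    return NULL;
}
```

Nothing in the loop depends on the arch: only the NT_* code requested and the interpretation of the returned bytes do.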
The n_type codes and note payload formats used in ELF core files were the one existing well-organized user<->kernel ABI for register data, so the kernel has settled on that as the one standard format. To wit, GDB already has target support code somewhere that knows all these exact layouts for each arch, used in core file reading, so there is nothing new here that's arch-specific.

Since Linux 2.6.34 (and also backported in RHEL6-beta), the new ptrace requests PTRACE_GETREGSET and PTRACE_SETREGSET let the debugger use these canonical terms to access all thread register data. In the (very, very) long run, the old arch-specific ptrace requests such as PTRACE_GETREGS will be deprecated. For both new arch ports and new kinds of register data, these generic requests will be the only means of access. On x86, this is already the only way to get the NT_X86_XSTATE data (e.g. the high halves of the %ymmN registers).

The operation of these requests is entirely arch-independent. Only which NT_* code you ask for, and what layout that data block has, is arch-dependent (and that is already specified by the core file formats). Actual availability of the requests on each arch depends on the arch-specific internal support code being wired up in the kernel (nine arches have it as of 2.6.35, and more will follow as their arch maintainers get round tuits).

One caveat about these requests applies to bi-arch systems. Unlike other ptrace requests, they access the native formats of the tracee process rather than those of the debugger process. So a 64-bit debugger process using PTRACE_GETREGSET on a 32-bit tracee process will see the 32-bit layouts (i.e. what would appear in an ELF core file if that process dumped one).

This internal organization is the way that all kernel-based things, e.g. ugdb, have to access user register data. Hence those things, too, can use entirely arch-independent code to the extent that their needs for register data are expressed in these terms.
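The arch-independent request pattern can be sketched in C roughly as follows. This is a minimal sketch, not GDB's or gdbserver's code: get_regset and set_regset are hypothetical helper names, and it assumes a glibc whose <sys/ptrace.h> defines PTRACE_GETREGSET and PTRACE_SETREGSET, plus a kernel and arch that support them. The NT_* code goes in the addr argument and a struct iovec describing the buffer goes in data; on success the kernel sets iov_len to the number of bytes it actually transferred.

```c
#include <elf.h>        /* NT_PRSTATUS, NT_PRFPREG, ... */
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>    /* struct iovec */

/* Hypothetical helper: read regset `type` (an NT_* code) of the stopped
   tracee `pid` into buf.  Returns the number of bytes the kernel filled
   in, or -1 with errno set. */
static long get_regset(pid_t pid, unsigned int type, void *buf, size_t size)
{
    struct iovec iov = { .iov_base = buf, .iov_len = size };

    if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)type, &iov) == -1)
        return -1;
    return (long)iov.iov_len;   /* size in the tracee's native layout */
}

/* Hypothetical helper: write regset `type` of tracee `pid` from buf.
   Returns 0 on success, -1 with errno set on failure. */
static long set_regset(pid_t pid, unsigned int type, const void *buf,
                       size_t size)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = size };

    return ptrace(PTRACE_SETREGSET, pid, (void *)(uintptr_t)type, &iov);
}
```

Usage: once the tracee is stopped, call get_regset(pid, NT_PRSTATUS, buf, sizeof buf); the same code works on every wired-up arch. Because of the bi-arch caveat, the returned length reflects the tracee's layout: on x86-64, for example, a result shorter than sizeof(struct user_regs_struct) means buf holds the 32-bit register layout.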
A remote protocol extension to access register data by n_type code, rather than by gdb-specific layouts, would let gdbserver and ugdb work without needing any arch-specific code for direct register access. Of course, gdbserver already has arch-specific code for that, and will always keep it to work on older kernels. But it wouldn't need any new arch-specific code to support new kinds of register data (it would just pass through whatever gdb knows to ask for). And, of course, my main motivation is that ugdb could be done without ever having any arch-specific code at all (it really needs none for anything else except the protocol requests that use gdb register numbers or other gdb-specific, arch-specific layouts).

(*) There is a single format for each regset, and that is in theory the canonical and sole representation of that register data. But some machines (x86, and I hope no others) have two or three regset flavors to represent the same data, because successive generations of hardware (FPU, etc.) extended the register data. On x86-64, when NT_X86_XSTATE is available, it is a superset of all the data in NT_PRFPREG. On x86-32, NT_PRXFPREG (when the machine supports it) is a superset of all the data in NT_PRFPREG in a related but different machine format, and NT_X86_XSTATE (when available) is a superset of both in yet another format (the only slight relief is that the NT_X86_XSTATE format is actually the same for 32-bit and 64-bit). So you can't quite say that any one of these is the single canonical format for that register data. You'll usually get both (or all three, on x86-32) in a core file, but only the oldest/smallest one (the subset) is guaranteed to be available on every machine. That weird overlapping tower of regsets is wacky that way because each of these is really a layout that some natural machine instructions use (there are different register names in different instructions that overlap correspondingly, too!).
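The footnote's rule of thumb (use the richest flavor present, knowing only the oldest is guaranteed) might be sketched like this for the note types found in an x86 core file. best_fp_regset is a hypothetical helper; the preference order simply follows the superset relationships described above, including the point that NT_PRXFPREG only matters for x86-32. For a live process, one would instead probe with PTRACE_GETREGSET in the same order and fall back on failure.

```c
#include <elf.h>        /* NT_PRFPREG, NT_PRXFPREG, NT_X86_XSTATE */
#include <stddef.h>

/* Return nonzero if `type` occurs in present[0..n). */
static int has_type(const unsigned int *present, size_t n, unsigned int type)
{
    for (size_t i = 0; i < n; i++)
        if (present[i] == type)
            return 1;
    return 0;
}

/* Hypothetical helper: given the NT_* codes actually present (e.g. the
   note types in a core file), return the richest FP-related x86 regset,
   or 0 if none is present.  Order: XSTATE is a superset of the others;
   PRXFPREG exists only for 32-bit; PRFPREG is the guaranteed fallback. */
static unsigned int best_fp_regset(const unsigned int *present, size_t n,
                                   int is_32bit)
{
    static const unsigned int order64[] = { NT_X86_XSTATE, NT_PRFPREG };
    static const unsigned int order32[] = { NT_X86_XSTATE, NT_PRXFPREG,
                                            NT_PRFPREG };
    const unsigned int *order = is_32bit ? order32 : order64;
    size_t count = is_32bit ? sizeof order32 / sizeof order32[0]
                            : sizeof order64 / sizeof order64[0];

    for (size_t i = 0; i < count; i++)
        if (has_type(present, n, order[i]))
            return order[i];
    return 0;
}
```

Because each richer flavor is a superset, a consumer that picks this way only has to decode one FP layout per process, even when the core file carries all of them.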
Still, the user<->kernel ABI principle stands: each "natural" chunk of register data has one true format, and any alternative formats, such as old arch-specific ptrace requests not matching the core file formats, are deprecated and are not to be instituted in future ABIs.

Thanks,
Roland