From: Michael Elizabeth Chastain
To: gdb@sources.redhat.com
Date: Sat, 02 Aug 2003 15:18:00 -0000
Subject: backtrace through 'sleep', (1255 and 1253)

Here's what I've learned so far.

This is the code for 'sleep' in /lib/i686/libc.so.6:

  push %ebp
  xor %ecx, %ecx
  mov %esp, %ebp
  push %edi
  xor %edx, %edx
  ...
  call __i686.get_pc_thunk.bx
  add $0x7bfab, %ebx
  sub $0x1cc, %esp
  ...

This is on a Red Hat Linux 8 system, native i686-pc-linux-gnu.

This is C code, not hand-coded assembler! gcc has scheduled the "xor"
instructions into the prologue; they just set some variables to zero.
The call to __i686.get_pc_thunk.bx comes from gcc -fpic.
Here is the code in i386_frame_cache:

  frame_unwind_register (next_frame, I386_EBP_REGNUM, buf);
  cache->base = extract_unsigned_integer (buf, 4);
  if (cache->base == 0)
    return cache;

  cache->save_regs[I386_EIP_REGNUM] = 4;

  cache->pc = frame_func_unwind (next_frame);
  if (cache->pc != 0)
    i386_analyze_prologue (cache->pc, frame_pc_unwind (next_frame), cache);

  if (cache->locals < 0)
    {
      /* We didn't find a valid frame, which means that CACHE->base
         currently holds the frame pointer for our calling frame.  If
         we're at the start of a function, or somewhere half-way its
         prologue, the function's frame probably hasn't been fully
         setup yet.  Try to reconstruct the base address for the stack
         frame by looking at the stack pointer.  For truly "frameless"
         functions this might work too.  */

      frame_unwind_register (next_frame, I386_ESP_REGNUM, buf);
      cache->base = extract_unsigned_integer (buf, 4) + cache->sp_offset;
    }

The etiology is:

  The prologue analyzer fails on this function because of the
  'xor %ecx, %ecx' between 'push %ebp' and 'mov %esp, %ebp'.

  So cache->locals == -1.

  /* We didn't find a valid frame ... */

  So the code behaves as if it's in a frameless function. It grabs the
  stack pointer, adds an offset to it, and uses that as the frame base.

  Whereas, in reality, the pc is in the middle of 'sleep' (well past
  the prologue), and there is a perfectly good frame.

In fact, if I undo the bogus re-assignment to cache->base in this case,
the stack trace works fine.

Now, what to do about it ...

Red Hat Linux 8 has an rpm for a debug version of glibc. The glibc-debug
rpm installs libraries in /usr/lib/debug, rather than overwriting
/lib/i686. I installed glibc-debug and set LD_LIBRARY_PATH to
/usr/lib/debug, and it worked! The test cases in gdb/1253 and gdb/1255
both backtraced just fine!

Also, static-linking with glibc works, because the static version of
'sleep' has different code (no -fpic) with a prologue that gdb can
digest.

So we can either:
  . Document the problem and tell people to use a debugging glibc or
    static-link their program. Also send a message to vendors that they
    may want to make the debugging glibc the default glibc. Vendors may
    even want to patch their gcc to not mix other instructions into the
    prologue, because gdb is a lot more sensitive to un-analyzable
    prologues now.

  . Ask the gcc guys directly to not schedule any instructions between
    'push %ebp' and 'mov %esp, %ebp'.

  . Change gdb so that the prologue reader is more powerful. It doesn't
    take much to get through the 'xor %ecx, %ecx' instruction. The
    trouble is that there could be a billion different instructions in
    there ('mov any-register, immediate'). The advantage is that this
    would work without any changes to external software.

  . Do nothing and let the users suffer.

  . Something else?

Michael C