From: Pedro Alves
To: "Burkhardt, Glenn"
Cc: gdb@sourceware.org
Subject: Re: fail to attach to process on Solaris
Date: Wed, 21 Sep 2011 14:26:00 -0000
Message-Id: <201109211526.15713.pedro@codesourcery.com>
References: <201109022224.29804.pedro@codesourcery.com>

On Wednesday 21 September 2011 00:22:21, Burkhardt, Glenn wrote:
> The problem appears that thread debug library has callback for register
> get operation that's connected to "sol-thread.c:ps_lgetregs()".  In the
> case that fails, the thread exists, but the calling sequence tries to
> lookup registers for a LWP with the same ID as the thread.

This is Solaris 9, with the default 1:1 model thread library, right?

> #0  find_procinfo_or_die (pid=12276, tid=67) at procfs.c:489
> #1  0x000a1cd0 in procfs_fetch_registers (ops=0x7293d8, regcache=0x71b1d0,
>     regnum=-1) at procfs.c:3483
> #2  0x0012feec in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b1d0,
>     regnum=-1) at sol-thread.c:457
> #3  0x00231af0 in target_fetch_registers (regcache=0x71b1d0, regno=-1)
>     at target.c:3417
> #4  0x00130e48 in ps_lgetregs (ph=0x700998, lwpid=67, gregset=0xffbfe37c)
>     at sol-thread.c:923
> #5  0xff0735dc in td_thr_getgregs () from /usr/lib/libthread_db.so.1
> #6  0x0012fff8 in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b3b0,
>     regnum=68) at sol-thread.c:473

But what is the rest of the stack trace?  IOW, where's this being
called from?

>
> For this stack trace of 'gdb', 'sol_thread_fetch_registers()' is passed
>
> (gdb) frame
> #6  0x0012fff8 in sol_thread_fetch_registers (ops=0x718a70, regcache=0x71b3b0,
>     regnum=68) at sol-thread.c:473
> 473         val = p_td_thr_getgregs (&thandle, gregset);
> (gdb) p *regcache
> $24 = {descr = 0x84fc40, aspace = 0x7aa258, registers = 0x846c48 "",
>   register_status = 0x14f37c0 "", readonly_p = 0, ptid = {pid = 12276,
>     lwp = 0, tid = 67}}
>
> So it's looking for registers from a thread that's not associated with
> an LWP.  But the function 'ps_lgetregs()' is always looking for the
> registers on the LWP list.
>
> I can't see how the callback 'ps_lgetregs()' is connected to the thread
> debug library.  In fact, the documentation for the thread debug library
> seems sparse.  I've only been able to find out about it in the man pages
> and comments section of sol-thread.c  So any pointers to documentation
> would be helpful.

That's about all there is...  Luckily or not, glibc copied the
same interface out of Solaris, so people who understand the Linux
version can understand the Solaris one with ease.

Older Solaris versions supported an M:N thread model, where multiple
user space threads would be mapped to the same kernel thread (LWP),
and sometimes even to no LWP at all (when they're idle).
libthread_db.so is a library the system provides that debuggers load
into their own address space; it serves as a bridge between user
threads and however they're mapped underneath.

So in this case, GDB wants to fetch the registers of some thread.
It asks libthread_db.so for its registers.  libthread_db.so
internally knows that that thread is mapped onto LWP 67, and to serve
GDB's initial request, it needs to fetch the registers of LWP 67.
libthread_db.so can't read registers off of an LWP itself, but the
debugger client can.  So libthread_db.so calls back into the debugger
through the `ps_lgetregs' function of the proc_service interface
(see man ps_lgetregs).  ps_lgetregs ends up recursing into
sol_thread_fetch_registers, but this time, inferior_ptid points
directly at an LWP, so we just pass the request directly to the LWP
support layer in procfs.c.  It's at this point that things are
failing for some reason.

So, the next step would be understanding whether LWP 67 really still
exists or not at the failure point.  Can you find that out by peeking
at /proc/... from the command line?  Maybe the LWP had just exited
while GDB was attaching to the process, but GDB hadn't processed the
exit event yet?  Or has GDB failed in the thread->lwp id mappings
somewhere?

--
Pedro Alves
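
The round trip described above can be sketched as a toy model.  This
is not GDB or libthread_db code: every name below (ToyPtid,
LIVE_LWPS, THREAD_TO_LWP, and the four functions) is made up for
illustration, the register values are dummies, and the thread-to-LWP
table simply assumes the 1:1 mapping from the trace.  Each function
only stands in for the role named in its docstring.

```python
from typing import NamedTuple


class ToyPtid(NamedTuple):
    """Toy stand-in for a ptid: tid != 0 means a user thread, else a bare LWP."""
    pid: int
    lwp: int
    tid: int


# Hypothetical state, for illustration only.
LIVE_LWPS = {1, 67}             # LWPs the "kernel" currently knows about
THREAD_TO_LWP = {1: 1, 67: 67}  # mapping kept inside the "thread library"


def lwp_fetch_registers(pid, lwp):
    """Role of the LWP layer (procfs.c): fails if the LWP is not there."""
    if lwp not in LIVE_LWPS:
        raise LookupError(f"no LWP {lwp} in process {pid}")
    return {"pc": 0x1000 + lwp, "sp": 0x2000 + lwp}  # dummy register values


def ps_lgetregs_cb(pid, lwpid):
    """Role of ps_lgetregs: the thread library calling back into the debugger."""
    # Re-enter the debugger's fetch routine, this time naming an LWP directly.
    return fetch_registers(ToyPtid(pid, lwp=lwpid, tid=0))


def thr_getgregs(pid, tid):
    """Role of td_thr_getgregs: map the thread to its LWP, then call back."""
    return ps_lgetregs_cb(pid, THREAD_TO_LWP[tid])


def fetch_registers(ptid):
    """Role of sol_thread_fetch_registers: the debugger's entry point."""
    if ptid.tid == 0:
        return lwp_fetch_registers(ptid.pid, ptid.lwp)
    return thr_getgregs(ptid.pid, ptid.tid)
```

Calling fetch_registers(ToyPtid(12276, 0, 67)) goes thread -> thread
library -> callback -> LWP layer and returns the dummy registers.  If
67 is removed from LIVE_LWPS while THREAD_TO_LWP still lists it, the
same call raises LookupError in the LWP layer, which is the shape of
the "LWP exited but the exit was not processed yet" hypothesis.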