public inbox for gdb@sourceware.org
 help / color / mirror / Atom feed
* How to backtrace a separate stack?
@ 2022-03-03 11:22 Stefan Hajnoczi
  2022-03-07 10:49 ` Pedro Alves
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-03 11:22 UTC (permalink / raw)
  To: gdb; +Cc: qemu-devel, Dr. David Alan Gilbert, tom, pedro

[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]

Hi,
The QEMU emulator uses coroutines with separate stacks. It can be
challenging to debug coroutines that have yielded because GDB is not
aware of them (no thread is currently executing them).

QEMU has a GDB Python script that helps. It "creates" a stack frame for
a given coroutine by temporarily setting register values and then using
the "bt" command. This works on a live process under ptrace control but
not for coredumps where registers can't be set.

Here is the script (or see the bottom of this email for an inline copy
of the relevant code):
https://gitlab.com/qemu-project/qemu/-/blob/master/scripts/qemugdb/coroutine.py

I hoped that "select-frame address ADDRESS" could be used instead so
this would work on coredumps too. Unfortunately "select-frame" only
searches stack frames that GDB is already aware of, so it cannot be used
to backtrace coroutine stacks.

Is there a way to backtrace a stack at an arbitrary address in GDB?

Thanks,
Stefan
---
def get_jmpbuf_regs(jmpbuf):
    JB_RBX  = 0
    JB_RBP  = 1
    JB_R12  = 2
    JB_R13  = 3
    JB_R14  = 4
    JB_R15  = 5
    JB_RSP  = 6
    JB_PC   = 7

    pointer_guard = get_glibc_pointer_guard()
    return {'rbx': jmpbuf[JB_RBX],
        'rbp': glibc_ptr_demangle(jmpbuf[JB_RBP], pointer_guard),
        'rsp': glibc_ptr_demangle(jmpbuf[JB_RSP], pointer_guard),
        'r12': jmpbuf[JB_R12],
        'r13': jmpbuf[JB_R13],
        'r14': jmpbuf[JB_R14],
        'r15': jmpbuf[JB_R15],
        'rip': glibc_ptr_demangle(jmpbuf[JB_PC], pointer_guard) }

def bt_jmpbuf(jmpbuf):
    '''Backtrace a jmpbuf'''
    regs = get_jmpbuf_regs(jmpbuf)
    old = dict()

    # remember current stack frame and select the topmost
    # so that register modifications don't wreck it
    selected_frame = gdb.selected_frame()
    gdb.newest_frame().select()

    for i in regs:
        old[i] = gdb.parse_and_eval('(uint64_t)$%s' % i)

    for i in regs:
        gdb.execute('set $%s = %s' % (i, regs[i]))

    gdb.execute('bt')

    for i in regs:
        gdb.execute('set $%s = %s' % (i, old[i]))

    selected_frame.select()
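
The glibc_ptr_demangle() helper used above (defined in the full script) undoes
glibc's PTR_MANGLE(): on x86-64 it rotates the saved value right by 0x11 (17)
bits and XORs it with the per-thread pointer guard. A standalone Python sketch
of the mangle/demangle pair, runnable outside GDB (the guard values below are
arbitrary examples, not real guards):

```python
MASK64 = (1 << 64) - 1

def rotr64(value, n):
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    value &= MASK64
    return ((value >> n) | (value << (64 - n))) & MASK64

def ptr_mangle(ptr, guard):
    """Model of glibc's x86-64 PTR_MANGLE: XOR with the per-thread
    pointer guard, then rotate left by 17 bits."""
    return rotr64(ptr ^ guard, 64 - 17)

def ptr_demangle(value, guard):
    """Inverse of ptr_mangle: rotate right by 17 bits, then XOR the
    guard -- the same expression glibc_ptr_demangle() evaluates."""
    return rotr64(value, 17) ^ guard
```

Round-tripping a pointer through ptr_mangle() and ptr_demangle() returns the
original value, which is a quick way to sanity-check the rotation constant.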

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: How to backtrace a separate stack?
  2022-03-03 11:22 How to backtrace a separate stack? Stefan Hajnoczi
@ 2022-03-07 10:49 ` Pedro Alves
  2022-03-08  9:47   ` Stefan Hajnoczi
  2022-03-07 14:49 ` Florian Weimer
  2022-03-07 16:58 ` Tom Tromey
  2 siblings, 1 reply; 13+ messages in thread
From: Pedro Alves @ 2022-03-07 10:49 UTC (permalink / raw)
  To: Stefan Hajnoczi, gdb; +Cc: qemu-devel, Dr. David Alan Gilbert, tom

On 2022-03-03 11:22, Stefan Hajnoczi wrote:
> Hi,
> The QEMU emulator uses coroutines with separate stacks. It can be
> challenging to debug coroutines that have yielded because GDB is not
> aware of them (no thread is currently executing them).
> 
> QEMU has a GDB Python script that helps. It "creates" a stack frame for
> a given coroutine by temporarily setting register values and then using
> the "bt" command. This works on a live process under ptrace control but
> not for coredumps where registers can't be set.
> 
> Here is the script (or see the bottom of this email for an inline copy
> of the relevant code):
> https://gitlab.com/qemu-project/qemu/-/blob/master/scripts/qemugdb/coroutine.py
> 
> I hoped that "select-frame address ADDRESS" could be used instead so
> this would work on coredumps too. Unfortunately "select-frame" only
> searches stack frames that GDB is already aware of, so it cannot be used
> to backtrace coroutine stacks.
> 
> Is there a way to backtrace a stack at an arbitrary address in GDB?

I don't think there's an easy/great answer.  Maybe it could
be done with a Python unwinder [1]?  See gdb.python/py-unwind-user-regs.py
in the GDB testsuite for an example you could probably start with.

As for something built-in to GDB, this reminded me of a discussion a while ago
around a "frame create" command.  Here were my thoughts back then, I think
still valid:

  https://sourceware.org/legacy-ml/gdb-patches/2015-09/msg00658.html

[1] https://sourceware.org/gdb/onlinedocs/gdb/Unwinding-Frames-in-Python.html


* Re: How to backtrace a separate stack?
  2022-03-03 11:22 How to backtrace a separate stack? Stefan Hajnoczi
  2022-03-07 10:49 ` Pedro Alves
@ 2022-03-07 14:49 ` Florian Weimer
  2022-03-07 17:30   ` Tom Tromey
  2022-03-07 16:58 ` Tom Tromey
  2 siblings, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2022-03-07 14:49 UTC (permalink / raw)
  To: Stefan Hajnoczi via Gdb
  Cc: Stefan Hajnoczi, tom, qemu-devel, pedro, Dr. David Alan Gilbert

* Stefan Hajnoczi via Gdb:

> The QEMU emulator uses coroutines with separate stacks. It can be
> challenging to debug coroutines that have yielded because GDB is not
> aware of them (no thread is currently executing them).
>
> QEMU has a GDB Python script that helps. It "creates" a stack frame for
> a given coroutine by temporarily setting register values and then using
> the "bt" command. This works on a live process under ptrace control but
> not for coredumps where registers can't be set.
>
> Here is the script (or see the bottom of this email for an inline copy
> of the relevant code):
> https://gitlab.com/qemu-project/qemu/-/blob/master/scripts/qemugdb/coroutine.py
>
> I hoped that "select-frame address ADDRESS" could be used instead so
> this would work on coredumps too. Unfortunately "select-frame" only
> searches stack frames that GDB is already aware of, so it cannot be used
> to backtrace coroutine stacks.
>
> Is there a way to backtrace a stack at an arbitrary address in GDB?

I'm a bit surprised by this.  Conceptually, why would GDB need to know
about stack boundaries?  Is there some heuristic to detect broken
frames?

Thanks,
Florian



* Re: How to backtrace a separate stack?
  2022-03-03 11:22 How to backtrace a separate stack? Stefan Hajnoczi
  2022-03-07 10:49 ` Pedro Alves
  2022-03-07 14:49 ` Florian Weimer
@ 2022-03-07 16:58 ` Tom Tromey
  2022-03-07 17:18   ` Pedro Alves
  2022-03-14 20:30   ` Tom Tromey
  2 siblings, 2 replies; 13+ messages in thread
From: Tom Tromey @ 2022-03-07 16:58 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: gdb, qemu-devel, Dr. David Alan Gilbert, tom, pedro

>>>>> "Stefan" == Stefan Hajnoczi <stefanha@redhat.com> writes:

Stefan> I hoped that "select-frame address ADDRESS" could be used instead so
Stefan> this would work on coredumps too. Unfortunately "select-frame" only
Stefan> searches stack frames that GDB is already aware of, so it cannot be used
Stefan> to backtrace coroutine stacks.

I wonder if "select-frame view" is closer to what you want.

I can't attest to how well it works or doesn't work.  I've never tried
it.

Stefan> Is there a way to backtrace a stack at an arbitrary address in GDB?

IMO this is just a longstanding hole in GDB.  Green threads exist, so it
would be good for GDB to have a way to inspect them.

For Ada, this problem was solved by adding knowledge of the runtime to
GDB itself -- that's basically what the "ravenscar" stuff is about.  Not
necessarily an approach I'd recommend.

I think the main problem with adding green thread support is just
finding someone to do the work.  Personally I think a decent approach
would be to add some core code to handle this, and then expose some
necessary bits via the Python API, so that user programs like qemu could
ship Python code that would replicate the ideas in the ravenscar layer
-- grovelling around in the inferior data structures to find info about
the thread.

Note that some of the GDB work might be complicated.  For ravenscar,
there's still an AdaCore-local patch (sent to gdb-patches but not really
suitable for inclusion) to avoid problems with "random thread switches"
-- there were problems where the current thread would, from GDB's
perspective, unexpectedly change when single-stepping through the green
scheduler.

Tom


* Re: How to backtrace a separate stack?
  2022-03-07 16:58 ` Tom Tromey
@ 2022-03-07 17:18   ` Pedro Alves
  2022-03-08  8:43     ` Stefan Hajnoczi
  2022-03-14 20:30   ` Tom Tromey
  1 sibling, 1 reply; 13+ messages in thread
From: Pedro Alves @ 2022-03-07 17:18 UTC (permalink / raw)
  To: Tom Tromey, Stefan Hajnoczi; +Cc: gdb, qemu-devel, Dr. David Alan Gilbert

On 2022-03-07 16:58, Tom Tromey wrote:
>>>>>> "Stefan" == Stefan Hajnoczi <stefanha@redhat.com> writes:
> 
> Stefan> I hoped that "select-frame address ADDRESS" could be used instead so
> Stefan> this would work on coredumps too. Unfortunately "select-frame" only
> Stefan> searches stack frames that GDB is already aware of, so it cannot be used
> Stefan> to backtrace coroutine stacks.
> 
> I wonder if "select-frame view" is closer to what you want.
> 
> I can't attest to how well it works or doesn't work.  I've never tried
> it.

A backtrace after "select-frame view" will still start at the
current (machine register's) frame.  Maybe it's sufficient to emulate it with
a sequence of "up" + "frame", though.  Keep in mind that you'll lose the view
with "info threads" or any command that flushes the frame cache internally,
as I mentioned in that ancient discussion.


* Re: How to backtrace a separate stack?
  2022-03-07 14:49 ` Florian Weimer
@ 2022-03-07 17:30   ` Tom Tromey
  2022-03-09 10:06     ` Florian Weimer
  0 siblings, 1 reply; 13+ messages in thread
From: Tom Tromey @ 2022-03-07 17:30 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Stefan Hajnoczi via Gdb, Stefan Hajnoczi, tom, qemu-devel, pedro,
	Dr. David Alan Gilbert

Florian> I'm a bit surprised by this.  Conceptually, why would GDB need to know
Florian> about stack boundaries?  Is there some heuristic to detect broken
Florian> frames?

Yes, the infamous "previous frame inner to this frame" error message.  I
think this is primarily intended to detect stack trashing, but maybe it
also serves to work around bad debuginfo or bugs in the unwinders.
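
As a rough standalone model of that check (a sketch with a hypothetical helper
name; GDB's actual logic is more involved): each successively older frame's
stack pointer must be "outer" than the newer frame's, which on x86-64's
downward-growing stack means numerically higher.

```python
def frames_sane(sps):
    """Model of GDB's "previous frame inner to this frame" check.

    sps lists saved stack-pointer values from newest to oldest frame.
    On x86-64 the stack grows downward, so each older frame's SP must
    be strictly higher than the newer frame's; a violation suggests a
    trashed stack or a bad unwind.
    """
    return all(older > newer for newer, older in zip(sps, sps[1:]))
```

A coroutine on a separately allocated stack can trip such a heuristic even when
its own frames are internally consistent, since its addresses bear no relation
to the main stack's.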

This error was disabled for cases where the GCC split stack feature is
used.  There's been requests to disable it in other cases as well, I
think.

Tom


* Re: How to backtrace a separate stack?
  2022-03-07 17:18   ` Pedro Alves
@ 2022-03-08  8:43     ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-08  8:43 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Tom Tromey, gdb, qemu-devel, Dr. David Alan Gilbert


On Mon, Mar 07, 2022 at 05:18:12PM +0000, Pedro Alves wrote:
> On 2022-03-07 16:58, Tom Tromey wrote:
> >>>>>> "Stefan" == Stefan Hajnoczi <stefanha@redhat.com> writes:
> > 
> > Stefan> I hoped that "select-frame address ADDRESS" could be used instead so
> > Stefan> this would work on coredumps too. Unfortunately "select-frame" only
> > Stefan> searches stack frames that GDB is already aware of, so it cannot be used
> > Stefan> to backtrace coroutine stacks.
> > 
> > I wonder if "select-frame view" is closer to what you want.
> > 
> > I can't attest to how well it works or doesn't work.  I've never tried
> > it.
> 
> A backtrace after "select-frame view" will still start at the
> current (machine register's) frame.  Maybe it's sufficient to emulate it with
> a sequence of "up" + "frame", though.  Keep in mind that you'll lose the view
> with "info threads" or any command that flushes the frame cache internally,
> as I mentioned in that ancient discussion.

I tried the following with gdb (11.2-1.fc35):

  select-frame view STACK_ADDR PC
  frame <-- this displays the top coroutine stack frame
  up
  frame <-- this displays the second frame of the main stack

Unfortunately "up" returns to the main stack instead of unwinding the
coroutine stack.

"i r" and "i lo" still show values from the main stack frame after
"select-frame view". This makes sense since "select-frame view" only
sets the stack and PC addresses, not the register contents.

Alas, "select-frame view" isn't quite enough from what I can tell.

Stefan



* Re: How to backtrace a separate stack?
  2022-03-07 10:49 ` Pedro Alves
@ 2022-03-08  9:47   ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-08  9:47 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb, qemu-devel, Dr. David Alan Gilbert, tom


On Mon, Mar 07, 2022 at 10:49:47AM +0000, Pedro Alves wrote:
> On 2022-03-03 11:22, Stefan Hajnoczi wrote:
> > Hi,
> > The QEMU emulator uses coroutines with separate stacks. It can be
> > challenging to debug coroutines that have yielded because GDB is not
> > aware of them (no thread is currently executing them).
> > 
> > QEMU has a GDB Python script that helps. It "creates" a stack frame for
> > a given coroutine by temporarily setting register values and then using
> > the "bt" command. This works on a live process under ptrace control but
> > not for coredumps where registers can't be set.
> > 
> > Here is the script (or see the bottom of this email for an inline copy
> > of the relevant code):
> > https://gitlab.com/qemu-project/qemu/-/blob/master/scripts/qemugdb/coroutine.py
> > 
> > I hoped that "select-frame address ADDRESS" could be used instead so
> > this would work on coredumps too. Unfortunately "select-frame" only
> > searches stack frames that GDB is already aware of, so it cannot be used
> > to backtrace coroutine stacks.
> > 
> > Is there a way to backtrace a stack at an arbitrary address in GDB?
> 
> I don't think there's an easy/great answer.  Maybe it could
> be done with a Python unwinder [1]?  See gdb.python/py-unwind-user-regs.py
> in the GDB testsuite for an example you could probably start with.

I tried writing an unwinder that returns the topmost coroutine stack
frame. "info threads" + "bt" shows the main stack though:

  (gdb) qemu coroutine 0x55be3c592120
    Id   Target Id                         Frame
  * 1    Thread 0x7f7abbdd4f00 (LWP 58989) Returning a frame with rip 0x55be3ae19ff4
  0x00007f7abcd2489e in __ppoll (fds=0x21, nfds=6717500806073509987, timeout=<optimized out>, sigmask=0x1f000) at ../sysdeps/unix/sysv/linux/ppoll.c:43
  ...
  #0  0x00007f7abcd2489e in __ppoll (fds=0x55be3c78a9f0, nfds=43, timeout=<optimized out>, timeout@entry=0x7ffef27cc040, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43
  #1  0x000055be3ae26435 in ppoll (__ss=0x0, __timeout=0x7ffef27cc040, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:81

I was hoping that frame #1 would be the coroutine stack since the debug
message "Returning a frame with rip 0x55be3ae19ff4" shows the unwinder
was invoked.

I've included the code below in case anyone has suggestions for making
the unwinder work. See bt_jmpbuf() and the Unwinder class.

The idea is that bt_jmpbuf() passes the registers of the coroutine to
the unwinder and invokes "info thread" + "bt". The unwinder only returns
a stack frame the first time it's invoked. It cannot unwind successive
stack frames so it disables itself after returning the topmost one (I
was hoping GDB's built-in unwinder would take over from there).

Thanks,
Stefan
---
#
# GDB debugging support
#
# Copyright 2012 Red Hat, Inc. and/or its affiliates
#
# Authors:
#  Avi Kivity <avi@redhat.com>
#
# This work is licensed under the terms of the GNU GPL, version 2
# or later.  See the COPYING file in the top-level directory.

import gdb
import gdb.unwinder

VOID_PTR = gdb.lookup_type('void').pointer()


class FrameId(object):
    def __init__(self, sp, pc):
        self.sp = sp
        self.pc = pc


class Unwinder(gdb.unwinder.Unwinder):
    def __init__(self):
        super(Unwinder, self).__init__('QEMU coroutine unwinder')
        self._regs = None

    def arm(self, regs):
        self._regs = regs

    def __call__(self, pending_frame):
        print('A')
        if not self._regs:
            return None
        regs = self._regs
        self._regs = None

        frame_id = FrameId(regs['rbp'], regs['rip'])
        unwind_info = pending_frame.create_unwind_info(frame_id)
        for reg_name in regs:
            unwind_info.add_saved_register(reg_name, regs[reg_name])

        print('Returning a frame with rip 0x%x' % regs['rip'])
        return unwind_info


unwinder = Unwinder()
gdb.unwinder.register_unwinder(None, unwinder)


def pthread_self():
    '''Fetch pthread_self() from the glibc start_thread function.'''
    f = gdb.newest_frame()
    while f.name() != 'start_thread':
        f = f.older()
        if f is None:
            return gdb.parse_and_eval('$fs_base')

    try:
        return f.read_var("arg")
    except ValueError:
        return gdb.parse_and_eval('$fs_base')

def get_glibc_pointer_guard():
    '''Fetch glibc pointer guard value'''
    fs_base = pthread_self()
    return gdb.parse_and_eval('*(uint64_t*)((uint64_t)%s + 0x30)' % fs_base)

def glibc_ptr_demangle(val, pointer_guard):
    '''Undo effect of glibc's PTR_MANGLE()'''
    return gdb.parse_and_eval('(((uint64_t)%s >> 0x11) | ((uint64_t)%s << (64 - 0x11))) ^ (uint64_t)%s' % (val, val, pointer_guard))

def get_jmpbuf_regs(jmpbuf):
    JB_RBX  = 0
    JB_RBP  = 1
    JB_R12  = 2
    JB_R13  = 3
    JB_R14  = 4
    JB_R15  = 5
    JB_RSP  = 6
    JB_PC   = 7

    pointer_guard = get_glibc_pointer_guard()
    return {'rbx': jmpbuf[JB_RBX],
        'rbp': glibc_ptr_demangle(jmpbuf[JB_RBP], pointer_guard),
        'rsp': glibc_ptr_demangle(jmpbuf[JB_RSP], pointer_guard),
        'r12': jmpbuf[JB_R12],
        'r13': jmpbuf[JB_R13],
        'r14': jmpbuf[JB_R14],
        'r15': jmpbuf[JB_R15],
        'rip': glibc_ptr_demangle(jmpbuf[JB_PC], pointer_guard) }

def bt_jmpbuf(jmpbuf):
    '''Backtrace a jmpbuf'''
    regs = get_jmpbuf_regs(jmpbuf)
    unwinder.arm(regs)
    gdb.execute('info threads')
    gdb.execute('bt')

def co_cast(co):
    return co.cast(gdb.lookup_type('CoroutineUContext').pointer())

def coroutine_to_jmpbuf(co):
    coroutine_pointer = co_cast(co)
    return coroutine_pointer['env']['__jmpbuf']


class CoroutineCommand(gdb.Command):
    '''Display coroutine backtrace'''
    def __init__(self):
        gdb.Command.__init__(self, 'qemu coroutine', gdb.COMMAND_DATA,
                             gdb.COMPLETE_NONE)

    def invoke(self, arg, from_tty):
        argv = gdb.string_to_argv(arg)
        if len(argv) != 1:
            gdb.write('usage: qemu coroutine <coroutine-pointer>\n')
            return

        bt_jmpbuf(coroutine_to_jmpbuf(gdb.parse_and_eval(argv[0])))

class CoroutineBt(gdb.Command):
    '''Display backtrace including coroutine switches'''
    def __init__(self):
        gdb.Command.__init__(self, 'qemu bt', gdb.COMMAND_STACK,
                             gdb.COMPLETE_NONE)

    def invoke(self, arg, from_tty):

        gdb.execute("bt")

        if gdb.parse_and_eval("qemu_in_coroutine()") == False:
            return

        co_ptr = gdb.parse_and_eval("qemu_coroutine_self()")

        while True:
            co = co_cast(co_ptr)
            co_ptr = co["base"]["caller"]
            if co_ptr == 0:
                break
            gdb.write("Coroutine at " + str(co_ptr) + ":\n")
            bt_jmpbuf(coroutine_to_jmpbuf(co_ptr))

class CoroutineSPFunction(gdb.Function):
    def __init__(self):
        gdb.Function.__init__(self, 'qemu_coroutine_sp')

    def invoke(self, addr):
        return get_jmpbuf_regs(coroutine_to_jmpbuf(addr))['rsp'].cast(VOID_PTR)

class CoroutinePCFunction(gdb.Function):
    def __init__(self):
        gdb.Function.__init__(self, 'qemu_coroutine_pc')

    def invoke(self, addr):
        return get_jmpbuf_regs(coroutine_to_jmpbuf(addr))['rip'].cast(VOID_PTR)



* Re: How to backtrace a separate stack?
  2022-03-07 17:30   ` Tom Tromey
@ 2022-03-09 10:06     ` Florian Weimer
  2022-03-09 19:50       ` Tom Tromey
  0 siblings, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2022-03-09 10:06 UTC (permalink / raw)
  To: Tom Tromey
  Cc: Stefan Hajnoczi via Gdb, Stefan Hajnoczi, qemu-devel, pedro,
	Dr. David Alan Gilbert

* Tom Tromey:

> Florian> I'm a bit surprised by this.  Conceptually, why would GDB need to know
> Florian> about stack boundaries?  Is there some heuristic to detect broken
> Florian> frames?
>
> Yes, the infamous "previous frame inner to this frame" error message.  I
> think this is primarily intended to detect stack trashing, but maybe it
> also serves to work around bad debuginfo or bugs in the unwinders.
>
> This error was disabled for cases where the GCC split stack feature is
> used.  There have been requests to disable it in other cases as well, I
> think.

Is there a user-level command to disable the check manually?

Thanks,
Florian



* Re: How to backtrace a separate stack?
  2022-03-09 10:06     ` Florian Weimer
@ 2022-03-09 19:50       ` Tom Tromey
  0 siblings, 0 replies; 13+ messages in thread
From: Tom Tromey @ 2022-03-09 19:50 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Tom Tromey, Stefan Hajnoczi via Gdb, Stefan Hajnoczi, qemu-devel,
	pedro, Dr. David Alan Gilbert

>> Yes, the infamous "previous frame inner to this frame" error message.  I
>> think this is primarily intended to detect stack trashing, but maybe it
>> also serves to work around bad debuginfo or bugs in the unwinders.

Florian> Is there a user-level command to disable the check manually?

I don't think so.  I think it would be fine if someone wanted to add
one.

Tom


* Re: How to backtrace a separate stack?
  2022-03-07 16:58 ` Tom Tromey
  2022-03-07 17:18   ` Pedro Alves
@ 2022-03-14 20:30   ` Tom Tromey
  2022-03-15 14:17     ` Stefan Hajnoczi
  1 sibling, 1 reply; 13+ messages in thread
From: Tom Tromey @ 2022-03-14 20:30 UTC (permalink / raw)
  To: Tom Tromey
  Cc: Stefan Hajnoczi, gdb, qemu-devel, pedro, Dr. David Alan Gilbert

Tom> IMO this is just a longstanding hole in GDB.  Green threads exist,
Tom> so it would be good for GDB to have a way to inspect them.

I took a stab at implementing this recently.  It's still very rough but
it's good enough to discuss whether it's something I should try to
polish.

For testing the proof of concept, I used vireo, a simple user-space
thread setup based on makecontext.

https://github.com/geofft/vireo

I've appended the Python code that teaches gdb how to find vireo
threads.  It's incomplete, as in, if you re-'run', it will fail.

Here's what a session looks like:

    (gdb) cont
    Continuing.
    [New Vireo Thread 1]
    [New Vireo Thread 2]
    send 0 from 0 to 1

    Thread 1 "pingpong" hit Breakpoint 2, pingpong () at examples/pingpong.c:27
    27			int i = vireo_recv(&who);
    (gdb) info thread
      Id   Target Id                                    Frame 
    * 1    Thread 0x7ffff7cb2b80 (LWP 42208) "pingpong" pingpong () at examples/pingpong.c:27
      2    Vireo Thread 1 "pingpong"                    pingpong () at examples/pingpong.c:27
      3    Vireo Thread 2 "pingpong"                    pingpong () at examples/pingpong.c:27
    (gdb) thread 3
    [Switching to thread 3 (Vireo Thread 2)]
    #0  pingpong () at examples/pingpong.c:27
    27			int i = vireo_recv(&who);
    (gdb) bt
    #0  pingpong () at examples/pingpong.c:27
    #1  0x00007ffff7d329c0 in ?? () from /lib64/libc.so.6
    #2  0x00007ffff7fc20e0 in ?? () from /home/tromey/gdb/vireo/examples/../libvireo.so
    #3  0x0000000000000000 in ?? ()

I realize now, writing this, that the approach to underlying threads
should be improved.  These need to be tracked more actively, so that
breakpoint stops can report the corresponding green thread.  You can see
above that this isn't done.  Also I think the "Frame" info is wrong
right now.

Anyway, the basic idea is to let Python tell gdb about the existence of
green threads, and let gdb mostly treat them identically to OS threads.
Under the hood, things like 'continue' will use the underlying OS
thread.

You can play with this if you want.  It's on 'submit/green-threads' on
my github.  Be warned that I rebase a lot.

Some things to work out:

- Exactly how should the 'underlying thread' concept work?
  Hooking into the inferior's scheduler seems slow, and also
  like it could present a chicken/egg problem.
  Maybe it needs a "green thread provider" object so that on
  a stop we can query that to see if the green thread corresponding
  to an OS thread is already known.

- Do we need a special hook to stop unwinding per-green-thread?
  You may not want to allow unwinding through the scheduler.

Tom


import gdb

thread_map = {}

main_thread = None

# From glibc/sysdeps/unix/sysv/linux/x86/sys/ucontext.h
x8664_regs = [ 'r8', 'r9', 'r10', 'r11', 'r12', 'r13', 'r14',
               'r15', 'rdi', 'rsi', 'rbp', 'rbx', 'rdx', 'rax',
               'rcx', 'rsp', 'rip', 'efl', 'csgsfs', 'err',
               'trapno', 'oldmask', 'cr2' ]

def vireo_current():
    return int(gdb.parse_and_eval('curenv')) + 1

class VireoGreenThread:
    def __init__(self, tid):
        self.tid = tid

    def _get_state(self):
        return gdb.parse_and_eval('envs')[self.tid]['state']

    def fetch(self, reg):
        """Fetch REG from memory."""
        global x8664_regs
        global thread_map
        thread = thread_map[self.tid]
        state = self._get_state()
        gregs = state['uc_mcontext']['gregs']
        for i in range(0, len(x8664_regs)):
            if reg is None or reg == x8664_regs[i]:
                thread.write_register(x8664_regs[i], gregs[i])

    def store(self, reg):
        global x8664_regs
        global thread_map
        thread = thread_map[self.tid]
        state = self._get_state()
        gregs = state['uc_mcontext']['gregs']
        for i in range(0, len(x8664_regs)):
            if reg is None or reg == x8664_regs[i]:
                gregs[i] = thread.read_register(x8664_regs[i])

    def name(self):
        return "Vireo Thread " + str(self.tid)

    def underlying_thread(self):
        if vireo_current() == self.tid:
            global main_thread
            return main_thread
        return None

class VFinish(gdb.FinishBreakpoint):
    def stop(self):
        tid = int(self.return_value) + 1
        global thread_map
        thread_map[tid] = gdb.create_green_thread(tid, VireoGreenThread(tid))
        return False

class VCreate(gdb.Breakpoint):
    def stop(self):
        VFinish(gdb.newest_frame(), True)
        return False

class VExit(gdb.Breakpoint):
    def stop(self):
        global main_thread
        if main_thread is None:
            main_thread = gdb.selected_thread()
        global thread_map
        tid = vireo_current()
        if tid in thread_map:
            thread_map[tid].set_exited()
            del thread_map[tid]

VCreate('vireo_create', internal=True)
VExit('vireo_exit', internal=True)
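
One fragile spot in the sketch above is that fetch() and store() index gregs
purely by position. A name-to-index table built from the same list makes the
mapping explicit and easy to sanity-check; the indices should line up with
glibc's REG_* constants on x86-64 (e.g. REG_RSP == 15, REG_RIP == 16):

```python
# Same order as glibc/sysdeps/unix/sysv/linux/x86/sys/ucontext.h
X8664_REGS = ['r8', 'r9', 'r10', 'r11', 'r12', 'r13', 'r14', 'r15',
              'rdi', 'rsi', 'rbp', 'rbx', 'rdx', 'rax', 'rcx', 'rsp',
              'rip', 'efl', 'csgsfs', 'err', 'trapno', 'oldmask', 'cr2']

# Map each register name to its gregs[] index (r8 -> 0, rsp -> 15, ...)
GREG_INDEX = {name: i for i, name in enumerate(X8664_REGS)}
```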


* Re: How to backtrace a separate stack?
  2022-03-14 20:30   ` Tom Tromey
@ 2022-03-15 14:17     ` Stefan Hajnoczi
  2022-03-18 21:13       ` Tom Tromey
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-03-15 14:17 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb, qemu-devel, pedro, Dr. David Alan Gilbert


On Mon, Mar 14, 2022 at 02:30:16PM -0600, Tom Tromey wrote:
> Tom> IMO this is just a longstanding hole in GDB.  Green threads exist,
> Tom> so it would be good for GDB to have a way to inspect them.
> 
> I took a stab at implementing this recently.  It's still very rough but
> it's good enough to discuss whether it's something I should try to
> polish.
> 
> For testing the proof of concept, I used vireo, a simple user-space
> thread setup based on makecontext.
> 
> https://github.com/geofft/vireo
> 
> I've appended the Python code that teaches gdb how to find vireo
> threads.  It's incomplete, as in, if you re-'run', it will fail.
> 
> Here's what a session looks like:
> 
>     (gdb) cont
>     Continuing.
>     [New Vireo Thread 1]
>     [New Vireo Thread 2]
>     send 0 from 0 to 1
> 
>     Thread 1 "pingpong" hit Breakpoint 2, pingpong () at examples/pingpong.c:27
>     27			int i = vireo_recv(&who);
>     (gdb) info thread
>       Id   Target Id                                    Frame 
>     * 1    Thread 0x7ffff7cb2b80 (LWP 42208) "pingpong" pingpong () at examples/pingpong.c:27
>       2    Vireo Thread 1 "pingpong"                    pingpong () at examples/pingpong.c:27
>       3    Vireo Thread 2 "pingpong"                    pingpong () at examples/pingpong.c:27
>     (gdb) thread 3
>     [Switching to thread 3 (Vireo Thread 2)]
>     #0  pingpong () at examples/pingpong.c:27
>     27			int i = vireo_recv(&who);
>     (gdb) bt
>     #0  pingpong () at examples/pingpong.c:27
>     #1  0x00007ffff7d329c0 in ?? () from /lib64/libc.so.6
>     #2  0x00007ffff7fc20e0 in ?? () from /home/tromey/gdb/vireo/examples/../libvireo.so
>     #3  0x0000000000000000 in ?? ()
> 
> I realize now, writing this, that the approach to underlying threads
> should be improved.  These need to be tracked more actively, so that
> breakpoint stops can report the corresponding green thread.  You can see
> above that this isn't done.  Also I think the "Frame" info is wrong
> right now.
> 
> Anyway, the basic idea is to let Python tell gdb about the existence of
> green threads, and let gdb mostly treat them identically to OS threads.
> Under the hood, things like 'continue' will use the underlying OS
> thread.
> 
> You can play with this if you want.  It's on 'submit/green-threads' on
> my github.  Be warned that I rebase a lot.

This looks cool! Would it be useful to see a port of QEMU's coroutine.py
script to your green threads API?

> Some things to work out:
> 
> - Exactly how should the 'underlying thread' concept work?
>   Hooking into the inferior's scheduler seems slow, and also
>   like it could present a chicken/egg problem.
>   Maybe it needs a "green thread provider" object so that on
>   a stop we can query that to see if the green thread corresponding
>   to an OS thread is already known.

QEMU's coroutines aren't in a scheduler list so there is no way to
enumerate all coroutines. The Python script can register a GDB command
(e.g. "qemu coroutine 0x12345678") that makes GDB aware of the
coroutine.

The only automatic way of cleaning up coroutines that I can think of is
by placing a breakpoint on QEMU's internal coroutine deletion function
from the Python script and then making GDB aware that the coroutine no
longer exists. It looks like this is the approach your vireo script
takes.

> - Do we need a special hook to stop unwinding per-green-thread?
>   You may not want to allow unwinding through the scheduler.

When we've used 'bt' on coroutine stacks in the past, reaching the end
of the stack wasn't a problem for the user. There is no error message
from GDB.

> 
> Tom
> 
> 
> import gdb
> 
> thread_map = {}
> 
> main_thread = None
> 
> # From glibc/sysdeps/unix/sysv/linux/x86/sys/ucontext.h
> x8664_regs = [ 'r8', 'r9', 'r10', 'r11', 'r12', 'r13', 'r14',
>                'r15', 'rdi', 'rsi', 'rbp', 'rbx', 'rdx', 'rax',
>                'rcx', 'rsp', 'rip', 'efl', 'csgsfs', 'err',
>                'trapno', 'oldmask', 'cr2' ]
> 
> def vireo_current():
>     return int(gdb.parse_and_eval('curenv')) + 1
> 
> class VireoGreenThread:
>     def __init__(self, tid):
>         self.tid = tid
> 
>     def _get_state(self):
>         return gdb.parse_and_eval('envs')[self.tid]['state']
> 
>     def fetch(self, reg):
>         """Fetch REG from memory."""
>         global x8664_regs
>         global thread_map
>         thread = thread_map[self.tid]
>         state = self._get_state()
>         gregs = state['uc_mcontext']['gregs']
>         for i in range(0, len(x8664_regs)):
>             if reg is None or reg == x8664_regs[i]:
>                 thread.write_register(x8664_regs[i], gregs[i])
> 
>     def store(self, reg):
>         """Store REG back into the saved context in memory."""
>         global x8664_regs
>         global thread_map
>         thread = thread_map[self.tid]
>         state = self._get_state()
>         gregs = state['uc_mcontext']['gregs']
>         for i in range(0, len(x8664_regs)):
>             if reg is None or reg == x8664_regs[i]:
>                 gregs[i] = thread.read_register(x8664_regs[i])
> 
>     def name(self):
>         return "Vireo Thread " + str(self.tid)
> 
>     def underlying_thread(self):
>         if vireo_current() == self.tid:
>             global main_thread
>             return main_thread
>         return None
> 
> class VFinish(gdb.FinishBreakpoint):
>     def stop(self):
>         tid = int(self.return_value) + 1
>         global thread_map
>         thread_map[tid] = gdb.create_green_thread(tid, VireoGreenThread(tid))
>         return False
> 
> class VCreate(gdb.Breakpoint):
>     def stop(self):
>         VFinish(gdb.newest_frame(), True)
>         return False
> 
> class VExit(gdb.Breakpoint):
>     def stop(self):
>         global main_thread
>         if main_thread is None:
>             main_thread = gdb.selected_thread()
>         global thread_map
>         tid = vireo_current()
>         if tid in thread_map:
>             thread_map[tid].set_exited()
>             del thread_map[tid]
> 
> VCreate('vireo_create', internal=True)
> VExit('vireo_exit', internal=True)
> 



* Re: How to backtrace an separate stack?
  2022-03-15 14:17     ` Stefan Hajnoczi
@ 2022-03-18 21:13       ` Tom Tromey
  0 siblings, 0 replies; 13+ messages in thread
From: Tom Tromey @ 2022-03-18 21:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Tom Tromey, gdb, qemu-devel, pedro, Dr. David Alan Gilbert

>> You can play with this if you want.  It's on 'submit/green-threads' on
>> my github.  Be warned that I rebase a lot.

Stefan> This looks cool! Would it be useful to see a port of QEMU's coroutine.py
Stefan> script to your green threads API?

Wouldn't hurt :)

Stefan> QEMU's coroutines aren't in a scheduler list so there is no way to
Stefan> enumerate all coroutines. The Python script can register a GDB command
Stefan> (e.g. "qemu coroutine 0x12345678") that makes GDB aware of the
Stefan> coroutine.

On the one hand, maybe this means the model is wrong.

On the other, I suppose qemu could also have a new command to create a
temporary "thread", given a ucontext_t (or whatever), and switch to it.
Then when the user "continue"s, the thread could be deleted again.
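
(Editorial note: for the ucontext_t route, the register extraction itself is mechanical. A sketch, assuming the `gregs` array follows the x86-64 layout from glibc's sys/ucontext.h, i.e. the same order as the `x8664_regs` list in the vireo script; the numbered gregs array below is fake, for illustration only:)

```python
# Register order from glibc's x86-64 sys/ucontext.h, as in the vireo script.
X8664_REGS = ['r8', 'r9', 'r10', 'r11', 'r12', 'r13', 'r14',
              'r15', 'rdi', 'rsi', 'rbp', 'rbx', 'rdx', 'rax',
              'rcx', 'rsp', 'rip', 'efl', 'csgsfs', 'err',
              'trapno', 'oldmask', 'cr2']

def regs_from_gregs(gregs):
    """Map a ucontext gregs array to a register-name -> value dict."""
    return dict(zip(X8664_REGS, gregs))

# Fake gregs array numbered 0..22, so each value equals its index.
regs = regs_from_gregs(list(range(len(X8664_REGS))))
assert regs['rsp'] == 15 and regs['rip'] == 16
```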

Tom


end of thread, other threads:[~2022-03-18 21:13 UTC | newest]

Thread overview: 13+ messages
2022-03-03 11:22 How to backtrace an separate stack? Stefan Hajnoczi
2022-03-07 10:49 ` Pedro Alves
2022-03-08  9:47   ` Stefan Hajnoczi
2022-03-07 14:49 ` Florian Weimer
2022-03-07 17:30   ` Tom Tromey
2022-03-09 10:06     ` Florian Weimer
2022-03-09 19:50       ` Tom Tromey
2022-03-07 16:58 ` Tom Tromey
2022-03-07 17:18   ` Pedro Alves
2022-03-08  8:43     ` Stefan Hajnoczi
2022-03-14 20:30   ` Tom Tromey
2022-03-15 14:17     ` Stefan Hajnoczi
2022-03-18 21:13       ` Tom Tromey
