From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-74.mimecast.com (us-smtp-delivery-74.mimecast.com [170.10.133.74]) by sourceware.org (Postfix) with ESMTPS id 9A2D93858D3C for ; Thu, 5 May 2022 18:17:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9A2D93858D3C Received: from mail-ej1-f72.google.com (mail-ej1-f72.google.com [209.85.218.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-212-JKlQtmIYPW22kUx0LskcMQ-1; Thu, 05 May 2022 14:17:21 -0400 X-MC-Unique: JKlQtmIYPW22kUx0LskcMQ-1 Received: by mail-ej1-f72.google.com with SMTP id qw30-20020a1709066a1e00b006f45e7f44b0so3065559ejc.5 for ; Thu, 05 May 2022 11:17:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version; bh=IKYqZBdaboPiEvC4J2PjZwUlGdC8lv2hoMbxzpGe3EQ=; b=n5e1sBmYsE10FIm66F6axkR1/2CYEAI5eupES/j3B8CrJL4d9iM9/MNU8dJMKg41RM 2dJitgWCLmFzX2qwGQbWoEPDF72U1nGoF+FyigfJQxsUoEFjA8MvRK9mSOwVhLgeObDX d68awQLRWv6YxYCBHIQm/LocGGGgpkOWP9+yr1ack/ngSXGjT1CLp/BUwVerEZQVxs9N 3OptIzktqddu5jyJu1WQ9Nt0vn69AMQE2d1v44GR9AFkZzKd034CALHMPmvaRTSqCyce JNpJw3yMxDl45URLRagpPI37rcXLxCkBS1TiIaB/y6QTeb1HJtptGLfDVFBWPwEv6luk NsxQ== X-Gm-Message-State: AOAM530i73Pqvptg/7ToBvbu3fpjxVXBG2SVKg9ZdJkY4/nrm7tGdur7 N2a5bsuZ3SOA+YQ9dyckHZKM1qOrXY3F/1NJLLMJLLWbslDBUOIyvtz46+KD6IqULFGqKUAsWaH vILqSbZdrKULv8uoggBr3Fg== X-Received: by 2002:a05:6402:51c8:b0:427:b374:6af4 with SMTP id r8-20020a05640251c800b00427b3746af4mr27239173edd.349.1651774638271; Thu, 05 May 2022 11:17:18 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx2EI9MgBichk2fw+pfaHeYEuiWNktBqE1lIfDZyUrQNJQRoVfIhXotEXCf/8Chk10OOFjTRw== X-Received: by 2002:a05:6402:51c8:b0:427:b374:6af4 with SMTP id r8-20020a05640251c800b00427b3746af4mr27239127edd.349.1651774637682; Thu, 05 May 2022 11:17:17 -0700 (PDT) Received: from localhost (92.40.179.191.threembb.co.uk. [92.40.179.191]) by smtp.gmail.com with ESMTPSA id og21-20020a1709071dd500b006f3ef214df8sm886576ejc.94.2022.05.05.11.17.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 May 2022 11:17:16 -0700 (PDT) From: Andrew Burgess To: Simon Marchi , gdb-patches@sourceware.org Cc: Andrew Burgess Subject: Re: [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook In-Reply-To: <50911018-847c-e060-6517-395f9410c80c@simark.ca> References: <50911018-847c-e060-6517-395f9410c80c@simark.ca> Date: Thu, 05 May 2022 19:17:15 +0100 Message-ID: <878rrfn8t0.fsf@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-10.6 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_ABUSEAT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2022 18:17:27 -0000 Simon Marchi via Gdb-patches writes: >> +@defivar DisassembleInfo address >> +An integer containing the address at which @value{GDBN} wishes to >> +disassemble a single instruction. >> +@end defivar > > This use of defivar results in the text > > Instance Variable of DisassembleInfo: address > > Just a nit, but IWBN to use Python nomenclature, this is a read-only > property or attribute. However, if the rest of the Python doc uses > this, I don't mind continuing with what we have for consistency. > >> +@defmethod Disassembler __init__ (name) >> +The constructor takes @var{name}, a string, which should be a short >> +name for this disassembler. Currently, this name is only used in some >> +debug output. > > I would probably avoid saying that last sentence, it's bound to become > stale at some point. > >> +The @code{DisassemblerResult} type is defined as a possible class to >> +represent disassembled instructions, but it is not required to use >> +this type, so long as the required attributes are present. > > I think it would be more straighforward and preferable to tell people to > just return a DisassemblerResult object, unless you see some advantage > to this that I don't. > >> +@smallexample >> +class ExampleDisassembler(gdb.disassembler.Disassembler): >> + def __init__(self): >> + super(ExampleDisassembler, self).__init__("ExampleDisassembler") > > Can we use the > > super().__init__(...) > > syntax, now that we dropped support for Python 2? Updated. > >> + >> + def __call__(self, info): >> + result = gdb.disassembler.builtin_disassemble(info) >> + if result.string is not None: > > Can builtin_disassemble really return None? Nope, fixed. > >> + length = result.length >> + text = result.string + "\t## Comment" >> + return gdb.disassembler.DisassemblerResult(length, text) > > Doesn't really matter, but this could probably modify the string in > the existing DisassemblerResult object, and then return it: > > result.string += "\t## Comment" > return result > > But I'm fine with what you have, if you think it's clearer for an > example. The problem is all the properties of DisassemblerResult are read-only. Given it's pretty light-weight I didn't really see any problem just creating a new one. I suspect that I might end up changing that in the future, but for now I don't see any great need to allow for modifications right now. I figure extending the API to allow modifications in the future is fine if/when that becomes critical. Let me know if that's going to be a problem and I can get the setting code added now. > >> + >> +gdb.disassembler.register_disassembler(ExampleDisassembler()) >> +@end smallexample >> + >> @node Python Auto-loading >> @subsection Python Auto-loading >> @cindex Python auto-loading >> diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py >> new file mode 100644 >> index 00000000000..19ec0ecf82f >> --- /dev/null >> +++ b/gdb/python/lib/gdb/disassembler.py >> @@ -0,0 +1,109 @@ >> +# Copyright (C) 2021-2022 Free Software Foundation, Inc. >> + >> +# This program is free software; you can redistribute it and/or modify >> +# it under the terms of the GNU General Public License as published by >> +# the Free Software Foundation; either version 3 of the License, or >> +# (at your option) any later version. >> +# >> +# This program is distributed in the hope that it will be useful, >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +# GNU General Public License for more details. >> +# >> +# You should have received a copy of the GNU General Public License >> +# along with this program. If not, see . >> + >> +"""Disassembler related module.""" >> + >> +import gdb >> +import _gdb.disassembler >> + >> +from _gdb.disassembler import * > > Can we avoid glob imports? This makes it not clear what is actually > imported. Or, is this to re-export what _gdb.disassembler exports? If > so, maybe add a comment. It's the second case. We do something similar in gdb/python/lib/gdb/__init__.py. I've added a comment in this case explaining what's going on. > >> + >> +# Module global dictionary of gdb.disassembler.Disassembler objects. >> +# The keys of this dictionary are bfd architecture names, or the >> +# special value None. >> +# >> +# When a request to disassemble comes in we first lookup the bfd >> +# architecture name from the gdbarch, if that name exists in this >> +# dictionary then we use that Disassembler object. >> +# >> +# If there's no architecture specific disassembler then we look for >> +# the key None in this dictionary, and if that key exists, we use that >> +# disassembler. >> +# >> +# If none of the above checks found a suitable disassembler, then no >> +# disassembly is performed in Python. >> +_disassemblers_dict = {} >> + >> + >> +class Disassembler(object): >> + """A base class from which all user implemented disassemblers must >> + inherit.""" >> + >> + def __init__(self, name): >> + """Constructor. Takes a name, which should be a string, which can be >> + used to identify this disassembler in diagnostic messages.""" >> + self.name = name >> + >> + def __call__(self, info): >> + """A default implementation of __call__. All sub-classes must >> + override this method. Calling this default implementation will throw >> + a NotImplementedError exception.""" >> + raise NotImplementedError("Disassembler.__call__") >> + >> + >> +def register_disassembler(disassembler, architecture=None): >> + """Register a disassembler. DISASSEMBLER is a sub-class of >> + gdb.disassembler.Disassembler. ARCHITECTURE is either None or a >> + string, the name of an architecture known to GDB. >> + >> + DISASSEMBLER is registered as a disassmbler for ARCHITECTURE, or > > disassmbler -> disassembler Fixed. > >> + all architectures when ARCHITECTURE is None. >> + >> + Returns the previous disassembler registered with this >> + ARCHITECTURE value. >> + """ >> + >> + if not isinstance(disassembler, Disassembler) and disassembler is not None: >> + raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler") > > I see passing None for DISASSEMBLER unregisters the currently registered > disassembler. I am not sure it was mentioned in the doc. > >> + >> + old = None >> + if architecture in _disassemblers_dict: >> + old = _disassemblers_dict[architecture] >> + del _disassemblers_dict[architecture] >> + if disassembler is not None: >> + _disassemblers_dict[architecture] = disassembler >> + >> + # Call the private _set_enabled function within the >> + # _gdb.disassembler module. This function sets a global flag >> + # within GDB's C++ code that enables or dissables the Python >> + # disassembler functionality, this improves performance of the >> + # disassembler by avoiding unneeded calls into Python when we know >> + # that no disassemblers are registered. >> + _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0) >> + return old >> + >> + >> +def _print_insn(info): >> + """This function is called by GDB when it wants to disassemble an >> + instruction. INFO describes the instruction to be >> + disassembled.""" >> + >> + def lookup_disassembler(arch): >> + try: >> + name = arch.name() >> + if name is None: >> + return None >> + if name in _disassemblers_dict: >> + return _disassemblers_dict[name] >> + if None in _disassemblers_dict: >> + return _disassemblers_dict[None] >> + return None >> + except: >> + return None > > There doesn't seem to be anything that throws in here, is there? > >> + >> + disassembler = lookup_disassembler(info.architecture) >> + if disassembler is None: >> + return None >> + return disassembler(info) >> diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c >> new file mode 100644 >> index 00000000000..e8b33fecee4 >> --- /dev/null >> +++ b/gdb/python/py-disasm.c >> @@ -0,0 +1,970 @@ >> +/* Python interface to instruction disassembly. >> + >> + Copyright (C) 2021-2022 Free Software Foundation, Inc. >> + >> + This file is part of GDB. >> + >> + This program is free software; you can redistribute it and/or modify >> + it under the terms of the GNU General Public License as published by >> + the Free Software Foundation; either version 3 of the License, or >> + (at your option) any later version. >> + >> + This program is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + GNU General Public License for more details. >> + >> + You should have received a copy of the GNU General Public License >> + along with this program. If not, see . */ >> + >> +#include "defs.h" >> +#include "python-internal.h" >> +#include "dis-asm.h" >> +#include "arch-utils.h" >> +#include "charset.h" >> +#include "disasm.h" >> +#include "progspace.h" >> + >> +/* Implement gdb.disassembler.DisassembleInfo type. An object of this type >> + represents a single disassembler request from GDB. */ >> + >> +struct disasm_info_object { > > { goes on next line. Doh. I wondered how I made this mistake. Turns out it's just copy 'n' paste from other bits of Python code. Anyway, I fixed all the ones in this file. > >> + PyObject_HEAD >> + >> + /* The architecture in which we are disassembling. */ >> + struct gdbarch *gdbarch; >> + >> + /* The program_space in which we are disassembling. */ >> + struct program_space *program_space; >> + >> + /* Address of the instruction to disassemble. */ >> + bfd_vma address; >> + >> + /* The disassemble_info passed from core GDB, this contains the >> + callbacks necessary to read the instruction from core GDB, and to >> + print the disassembled instruction. */ >> + disassemble_info *gdb_info; >> +}; >> + >> +extern PyTypeObject disasm_info_object_type >> + CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object"); >> + >> +/* Implement gdb.disassembler.DisassemblerResult type, an object that holds >> + the result of calling the disassembler. This is mostly the length of >> + the disassembled instruction (in bytes), and the string representing the >> + disassembled instruction. */ >> + >> +struct disasm_result_object { > > Here too. > >> +/* Implement gdb.disassembler.builtin_disassemble(). Calls back into GDB's >> + builtin disassembler. The first argument is a DisassembleInfo object >> + describing what to disassemble. The second argument is optional and >> + provides a mechanism to modify the memory contents that the builtin >> + disassembler will actually disassemble. >> + >> + Returns an instance of gdb.disassembler.DisassemblerResult, an object >> + that wraps a disassembled instruction, or it raises a >> + gdb.MemoryError. */ >> + >> +static PyObject * >> +disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw) >> +{ >> + PyObject *info_obj, *memory_source_obj = nullptr; >> + static const char *keywords[] = { "info", "memory_source", nullptr }; >> + if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords, >> + &disasm_info_object_type, &info_obj, >> + &memory_source_obj)) > > I'm wondering, why is there a separate memory_source parameter when info > already has a read_memory method, that could potentially be overriden by > the user? Here's how the API would be used right now: class MyDisassembler(Disassembler): def __init__(self, name): super().__init__(name) self.__info = None def __call__(self, info): self.__info = info result = gdb.disassembler.builtin_disassemble(info, self) def read_memory(self, length, offset): return self.__info.read_memory(length, offset) This is obviosly pretty pointless, the memory source just calls the standard read_memory routine so you'll get the same behaviour as if no memory source was passed at all, but it shows how the API works. If we wanted to override the DisassembleInfo.read_memory routine we'd do something like this: class MyInfo(DisassembleInfo): def __init__(self,old_info): super().__init__(old_info) def read_memory(self, length, offset): return super().read_memory(length, offset) class MyDisassembler(Disassembler): def __init__(self, name): super().__init__(name) def __call__(self, info): wrapped_info = MyInfo(info) result = gdb.disassembler.builtin_disassemble(wrapped_info) What are your thoughts on that? I think that would be pretty easy to implement if you feel its an improvement. > >> + return nullptr; >> + >> + disasm_info_object *disasm_info = (disasm_info_object *) info_obj; >> + if (!disasm_info_object_is_valid (disasm_info)) >> + { >> + PyErr_SetString (PyExc_RuntimeError, >> + _("DisassembleInfo is no longer valid.")); >> + return nullptr; >> + } >> + >> + /* A memory source is any object that provides the 'read_memory' >> + callback. At this point we only check for the existence of a >> + 'read_memory' attribute, if this isn't callable then we'll throw an >> + exception from within gdbpy_disassembler::read_memory_func. */ >> + if (memory_source_obj != nullptr) >> + { >> + if (!PyObject_HasAttrString (memory_source_obj, "read_memory")) >> + { >> + PyErr_SetString (PyExc_TypeError, >> + _("memory_source doesn't have a read_memory method")); >> + return nullptr; >> + } >> + } > > IMO we could maybe skip this check too. Python already produces a clear > enough exception message when trying to access an attribute that doesn't > exist, it mentions the name of the user type: > > In [4]: f = Foo() > > In [5]: f.read_memory() > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call last) > Input In [5], in () > ----> 1 f.read_memory() > > AttributeError: 'Foo' object has no attribute 'read_memory' Done. > >> + >> + /* Where the result will be written. */ >> + gdbpy_disassembler disassembler (disasm_info, memory_source_obj); >> + >> + /* Now actually perform the disassembly. */ >> + int length >> + = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address, >> + disassembler.disasm_info ()); >> + >> + if (length == -1) >> + { >> + >> + /* In an ideal world, every disassembler should always call the >> + memory error function before returning a status of -1 as the only >> + error a disassembler should encounter is a failure to read >> + memory. Unfortunately, there are some disassemblers who don't >> + follow this rule, and will return -1 without calling the memory >> + error function. >> + >> + To make the Python API simpler, we just classify everything as a >> + memory error, but the message has to be modified for the case >> + where the disassembler didn't call the memory error function. */ >> + if (disassembler.memory_error_address ().has_value ()) >> + { >> + CORE_ADDR addr = *disassembler.memory_error_address (); >> + disasmpy_set_memory_error_for_address (addr); >> + } >> + else >> + PyErr_Format (gdbpy_gdb_memory_error, "unknown disassembly error"); > > Is there a use case for other error kinds? For example, if the > disassembler finds that the machine code does not encode a valid > instruction, what should it do? > > Maybe this is more a question for the gdb.disassembler.Disassembler > implementation side of the API. No! Like the comment says, everything should disassemble to something, even if it's just ".byte xxxx". The libopcodes disassembler only supports reporting one type of error, that's memory error. It's just unfortunate libopcodes also uses the return value to indicate that an error occurred, and in some cases disassemblers return -1 (to indicate error) without setting a memory error. In these cases the disassembler has probably written to the output stream what went wrong, but this really is not how libopcodes is supposed to work. So, no. We either have a memory error, or an unknown error. Ideally, given enough time, libopcodes will be fixed so that we _only_ ever emit memory errors. > >> + return nullptr; >> + } >> + >> + /* Instructions are either non-zero in length, or we got an error, >> + indicated by a length of -1, which we handled above. */ >> + gdb_assert (length > 0); >> + >> + /* We should not have seen a memory error in this case. */ >> + gdb_assert (!disassembler.memory_error_address ().has_value ()); >> + >> + /* Create an object to represent the result of the disassembler. */ >> + gdbpy_ref res >> + (PyObject_New (disasm_result_object, &disasm_result_object_type)); >> + res->length = length; >> + res->content = new string_file; > > Since the DisassemblerResult object type has an __init__, maybe we > should call it? It would be a bit of boilerplate, calling __init__ > here, but it would avoid duplicating how we initialize a > disasm_result_object. Done. > >> + *(res->content) = disassembler.release (); >> + >> + return reinterpret_cast (res.release ()); >> +} >> + >> +/* Implement gdb.set_enabled function. Takes a boolean parameter, and > > gdb.disassembler._set_enabled? Fixed. > >> +/* This implements the disassemble_info read_memory_func callback. This >> + will either call the standard read memory function, or, if the user has >> + supplied a memory source (see disasmpy_builtin_disassemble) then this >> + will call back into Python to obtain the memory contents. >> + >> + Read LEN bytes from MEMADDR and place them into BUFF. Return 0 on >> + success (in which case BUFF has been filled), or -1 on error, in which >> + case the contents of BUFF are undefined. */ >> + >> +int >> +gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff, >> + unsigned int len, >> + struct disassemble_info *info) >> +{ >> + gdbpy_disassembler *dis >> + = static_cast (info->application_data); >> + disasm_info_object *obj = dis->py_disasm_info (); >> + PyObject *memory_source = dis->m_memory_source; >> + >> + /* The simple case, the user didn't pass a separate memory source, so we >> + just delegate to the standard disassemble_info read_memory_func, >> + passing in the original disassemble_info object, which core GDB might >> + require in order to read the instruction bytes (when reading the >> + instruction from a buffer). */ >> + if (memory_source == nullptr) >> + return obj->gdb_info->read_memory_func (memaddr, buff, len, obj->gdb_info); >> + >> + /* The user provided a separate memory source, we need to call the >> + read_memory method on the memory source and use the buffer it returns >> + as the bytes of memory. */ >> + LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address; >> + gdbpy_ref<> result_obj (PyObject_CallMethod (memory_source, "read_memory", >> + "KL", len, offset)); >> + if (result_obj == nullptr) >> + { >> + /* If we got a gdb.MemoryError then we ignore this and just report >> + that the read failed to the caller. The caller is then >> + responsible for calling the memory_error_func if it wants to. >> + Remember, the disassembler might just be probing to see if these >> + bytes can be read, if we automatically call the memory error >> + function, we can end up registering an error prematurely. */ >> + if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error)) >> + PyErr_Clear (); >> + else >> + gdbpy_print_stack (); >> + return -1; >> + } >> + >> + /* Convert the result to a buffer. */ >> + Py_buffer py_buff; >> + if (!PyObject_CheckBuffer (result_obj.get ()) >> + || PyObject_GetBuffer (result_obj.get(), &py_buff, PyBUF_CONTIG_RO) < 0) >> + { >> + PyErr_Format (PyExc_TypeError, >> + _("Result from read_memory is not a buffer")); >> + gdbpy_print_stack (); >> + return -1; >> + } >> + >> + /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this >> + scope. */ >> + Py_buffer_up buffer_up (&py_buff); >> + >> + /* Validate that the buffer is the correct length. */ >> + if (py_buff.len != len) >> + { >> + PyErr_Format (PyExc_ValueError, >> + _("Result from read_memory is incorrectly sized buffer")); > > It's handy to tell the expected and actual sizes in these cases. Agreed. Done. > >> + gdbpy_print_stack (); >> + return -1; >> + } >> + >> + /* Copy the data out of the Python buffer and return succsess.*/ > > succsess -> success Fixed. > >> +/* A wrapper around a reference to a Python DisassembleInfo object, which >> + ensures that the object is marked as invalid when we leave the enclosing >> + scope. >> + >> + Each DisassembleInfo is created in gdbpy_print_insn, and is done with by >> + the time that function returns. However, there's nothing to stop a user >> + caching a reference to the DisassembleInfo, and thus keeping the object >> + around. >> + >> + We therefore have the notion of a DisassembleInfo becoming invalid, this >> + happens when gdbpy_print_insn returns. This class is responsible for >> + marking the DisassembleInfo as invalid in its destructor. */ >> + >> +struct scoped_disasm_info_object >> +{ >> + /* Constructor. */ >> + scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr, >> + disassemble_info *info) >> + : m_disasm_info (allocate_disasm_info_object ()) >> + { >> + m_disasm_info->address = memaddr; >> + m_disasm_info->gdb_info = info; >> + m_disasm_info->gdbarch = gdbarch; >> + m_disasm_info->program_space = current_program_space; >> + } >> + >> + /* Upon destruction mark m_diasm_info as invalid. */ >> + ~scoped_disasm_info_object () >> + { >> + m_disasm_info->gdb_info = nullptr; >> + } >> + >> + /* Return a pointer to the underlying disasm_info_object instance. */ >> + disasm_info_object * >> + get () const >> + { >> + return m_disasm_info.get (); >> + } >> + >> +private: >> + >> + /* Wrapper around the call to PyObject_New, this wrapper function can be >> + called from the constructor initialization list, while PyObject_New, a >> + macro, can't. */ >> + static disasm_info_object * >> + allocate_disasm_info_object () >> + { >> + return (disasm_info_object *) PyObject_New (disasm_info_object, >> + &disasm_info_object_type); >> + } > > This makes me think, is there a way for a Python user to call into the > disassembler? Should the DisassembleInfo object have a user-callable > constructor, should the user want to construct one? > > I could imagine you could do this out of nowhere: > > gdb.disassembler.builtin_disassemble(DisassembleInfo(addr, arch, progspace)) No! Don't do that. We already have gdb.Architecture.disassemble which provides access to the disassembler. You might feel that method is misplaced on Architecture (but that wasn't me!), but it is what it is. I think if you are writing some random piece of Python code then you should not be worrying about Python disassemblers vs builtin disassembler; you should just call gdb.Architecture.disassemble and let GDB invoke the "correct" disassembler for you. Preventing direct calls to gdb.disassembler.builtin_disassemble is one of the main reasons that I deliberately don't provide a user callable constructor for DisassembleInfo, during development I did have that method at one point, and removed it precisely to prevent the above! > > But that would skip the Python disassemblers, so a user could also want > to call this function that doesn't exist today: > > gdb.disassemble(DisassembleInfo(addr, arch, progspace)) I think having: gdb.disassemble(start_address, end_address, architecture, program_space) would be better than the current disassemble method on Architecture. Thinking about what that does I suspect that I might end up having to work on Architecture.disassemble at some point in the future, so I might add a top-level gdb.disassemble and make the existing architecture method forward to that one. We'll see. > >> + >> + /* A reference to a gdb.disassembler.DisassembleInfo object. When this >> + containing instance goes out of scope this reference is released, >> + however, the user might be holding other references to the >> + DisassembleInfo object in Python code, so the underlying object might >> + not be deleted. */ >> + gdbpy_ref m_disasm_info; >> +}; >> + >> +/* See python-internal.h. */ >> + >> +gdb::optional >> +gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr, >> + disassemble_info *info) >> +{ >> + /* Early exit case. This must be done as early as possible, and >> + definitely before we enter Python environment. The >> + python_print_insn_enabled flag is set (from Python) only when the user >> + has installed one (or more) Python disassemblers. So in the common >> + case (no custom disassembler installed) this flag will be false, >> + allowing for a quick return. */ >> + if (!gdb_python_initialized || !python_print_insn_enabled) >> + return {}; >> + >> + gdbpy_enter enter_py (get_current_arch (), current_language); >> + >> + /* The attribute we are going to lookup that provides the print_insn >> + functionality. */ >> + static const char *callback_name = "_print_insn"; >> + >> + /* Grab a reference to the gdb.disassembler module, and check it has the >> + attribute that we need. */ >> + gdbpy_ref<> gdb_python_disassembler_module >> + (PyImport_ImportModule ("gdb.disassembler")); >> + if (gdb_python_disassembler_module == nullptr >> + || !PyObject_HasAttrString (gdb_python_disassembler_module.get (), >> + callback_name)) >> + return {}; > > Since it's kind of expected that _print_insn is there, should this be a > gdb_assert? Just returning silently here makes it more difficult to > investigate problems, IMO. The only reason for the assert to trigger > would be if someone messed with the GDB Python modules, which I think is > ok. I've not gone with an assert, but I did rewrite this code so now the user will get an error if _print_insn is not present. I did that by removing the HasAttrString check here, and then... > >> + >> + /* Now grab the callback attribute from the module. */ >> + gdbpy_ref<> hook >> + (PyObject_GetAttrString (gdb_python_disassembler_module.get (), >> + callback_name)); >> + if (hook == nullptr) > > This can't be true, since you already checked with > PyObject_HasAttrString. ... this check is now useful, the GetAttrString will fail if _print_insn is not present, and the PyErr will be set. > >> + { >> + gdbpy_print_stack (); >> + return {}; >> + } >> + >> + /* Create the new DisassembleInfo object we will pass into Python. This >> + object will be marked as invalid when we leave this scope. */ >> + scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info); >> + disasm_info_object *disasm_info = scoped_disasm_info.get (); >> + >> + /* Call into the registered disassembler to (possibly) perform the >> + disassembly. */ >> + PyObject *insn_disas_obj = (PyObject *) disasm_info; >> + gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (), >> + insn_disas_obj, >> + nullptr)); >> + >> + if (result == nullptr) >> + { >> + /* The call into Python code resulted in an exception. If this was a >> + gdb.MemoryError, then we can figure out an address and call the >> + disassemble_info::memory_error_func to report the error back to >> + core GDB. Any other exception type we assume means a bug in the >> + user's code, and print stack. */ >> + >> + if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error)) >> + { >> + /* A gdb.MemoryError might have an address attribute which >> + contains the address at which the memory error occurred. If >> + this is the case then use this address, otherwise, fallback to >> + just using the address of the instruction we were asked to >> + disassemble. */ >> + PyObject *error_type, *error_value, *error_traceback; >> + CORE_ADDR addr; >> + >> + PyErr_Fetch (&error_type, &error_value, &error_traceback); >> + >> + if (error_value != nullptr >> + && PyObject_HasAttrString (error_value, "address")) >> + { >> + PyObject *addr_obj = PyObject_GetAttrString (error_value, >> + "address"); >> + if (get_addr_from_python (addr_obj, &addr) < 0) >> + addr = disasm_info->address; >> + } >> + else >> + addr = disasm_info->address; >> + >> + PyErr_Clear (); >> + info->memory_error_func (-1, addr, info); >> + return gdb::optional (-1); >> + } >> + else >> + { >> + /* Anything that is not gdb.MemoryError. */ >> + gdbpy_print_stack (); >> + return {}; >> + } >> + } >> + else if (result == Py_None) >> + { >> + /* A return value of None indicates that the Python code could not, >> + or doesn't want to, disassemble this instruction. Just return an >> + empty result and core GDB will try to disassemble this for us. */ >> + return {}; >> + } >> + >> + /* The call into Python neither raised an exception, or returned None. >> + Check to see if the result looks valid. */ >> + gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length")); >> + if (length_obj == nullptr) >> + { >> + gdbpy_print_stack (); >> + return {}; >> + } >> + >> + gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string")); >> + if (string_obj == nullptr) >> + { >> + gdbpy_print_stack (); >> + return {}; >> + } >> + if (!gdbpy_is_string (string_obj.get ())) >> + { >> + PyErr_SetString (PyExc_TypeError, _("string attribute is not a string.")); >> + gdbpy_print_stack (); >> + return {}; >> + } >> + >> + gdb::unique_xmalloc_ptr string >> + = gdbpy_obj_to_string (string_obj.get ()); >> + if (string == nullptr) >> + { >> + gdbpy_print_stack (); >> + return {}; >> + } >> + >> + long length; >> + if (!gdb_py_int_as_long (length_obj.get (), &length)) >> + { >> + gdbpy_print_stack (); >> + return {}; >> + } >> + >> + long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ? >> + gdbarch_max_insn_length (gdbarch) : INT_MAX); >> + if (length <= 0 || length > max_insn_length) >> + { >> + PyErr_SetString (PyExc_ValueError, _("Invalid length attribute.")); > > IWBN to help the user here, say why it is invalid: the length attribute > value X exceeds the architecture's max insn length of Y. Done. > >> + gdbpy_print_stack (); >> + return {}; >> + } >> + >> + if (strlen (string.get ()) == 0) >> + { >> + PyErr_SetString (PyExc_ValueError, _("string attribute must not be empty.")); >> + gdbpy_print_stack (); >> + return {}; >> + } >> + >> + /* Print the disassembled instruction back to core GDB, and return the >> + length of the disassembled instruction. */ >> + info->fprintf_func (info->stream, "%s", string.get ()); >> + return gdb::optional (length); >> +} >> + >> +/* The tp_dealloc callback for the DisassemblerResult type. Takes care of >> + deallocating the content buffer. */ >> + >> +static void >> +disasmpy_dealloc_result (PyObject *self) >> +{ >> + disasm_result_object *obj = (disasm_result_object *) self; >> + delete obj->content; >> + Py_TYPE (self)->tp_free (self); >> +} >> + >> +/* The get/set attributes of the gdb.disassembler.DisassembleInfo type. */ >> + >> +static gdb_PyGetSetDef disasm_info_object_getset[] = { >> + { "address", disasmpy_info_address, nullptr, >> + "Start address of the instruction to disassemble.", nullptr }, >> + { "architecture", disasmpy_info_architecture, nullptr, >> + "Architecture to disassemble in", nullptr }, >> + { "progspace", disasmpy_info_progspace, nullptr, >> + "Program space to disassemble in", nullptr }, >> + { nullptr } /* Sentinel */ >> +}; >> + >> +/* The methods of the gdb.disassembler.DisassembleInfo type. */ >> + >> +static PyMethodDef disasm_info_object_methods[] = { >> + { "read_memory", (PyCFunction) disasmpy_info_read_memory, >> + METH_VARARGS | METH_KEYWORDS, >> + "read_memory (LEN, OFFSET = 0) -> Octets[]\n\ >> +Read LEN octets for the instruction to disassemble." }, >> + { "is_valid", disasmpy_info_is_valid, METH_NOARGS, >> + "is_valid () -> Boolean.\n\ >> +Return true if this DisassembleInfo is valid, false if not." }, >> + {nullptr} /* Sentinel */ >> +}; >> + >> +/* The get/set attributes of the gdb.disassembler.DisassemblerResult type. */ >> + >> +static gdb_PyGetSetDef disasm_result_object_getset[] = { >> + { "length", disasmpy_result_length, nullptr, >> + "Length of the disassembled instruction.", nullptr }, >> + { "string", disasmpy_result_string, nullptr, >> + "String representing the disassembled instruction.", nullptr }, >> + { nullptr } /* Sentinel */ >> +}; >> + >> +/* These are the methods we add into the _gdb.disassembler module, which >> + are then imported into the gdb.disassembler module. These are global >> + functions that support performing disassembly. */ >> + >> +PyMethodDef python_disassembler_methods[] = >> +{ >> + { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble, >> + METH_VARARGS | METH_KEYWORDS, >> + "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> None\n\ >> +Disassemble using GDB's builtin disassembler. INFO is an instance of\n\ >> +gdb.disassembler.DisassembleInfo. The MEMORY_SOURCE, if not None, should\n\ >> +be an object with the read_memory method." }, >> + { "_set_enabled", (PyCFunction) disasmpy_set_enabled, >> + METH_VARARGS | METH_KEYWORDS, >> + "_set_enabled (STATE) -> None\n\ >> +Set whether GDB should call into the Python _print_insn code or not." }, >> + {nullptr, nullptr, 0, nullptr} >> +}; >> + >> +/* Structure to define the _gdb.disassembler module. */ >> + >> +static struct PyModuleDef python_disassembler_module_def = >> +{ >> + PyModuleDef_HEAD_INIT, >> + "_gdb.disassembler", >> + nullptr, >> + -1, >> + python_disassembler_methods, >> + nullptr, >> + nullptr, >> + nullptr, >> + nullptr >> +}; >> + >> +/* Called to initialize the Python structures in this file. */ >> + >> +int >> +gdbpy_initialize_disasm >> +(void) > > Parenthesis on the previous line, and remove void? What on earth was I thinking here?? Fixed. I'm still adding some additional tests for the things I've updated in this patch, but I wanted to reach out to get your feedback on the memory-source question. Thanks, Andrew