From: Andrew Burgess
To: gdb-patches@sourceware.org
Subject: [PATCH 5/5] gdb/python: implement the print_insn extension language hook
Date: Wed, 13 Oct 2021 22:59:10 +0100

This commit extends the Python API to include disassembler support,
and additionally provides a syntax highlighting disassembler.

The motivation for this commit was to provide an API by which the
user could write Python scripts that would augment the output of the
disassembler.

To achieve this I have followed the model of the existing libopcodes
disassembler, that is, instructions are disassembled one by one.  This
does restrict the type of things that it is possible to do from a
Python script, i.e. all additional output has to fit on a single line,
but this was all I needed, and creating something more complex would,
I think, require greater changes to how GDB's internal disassembler
operates.

It was only once I had a working prototype that I realised I could
very easily use this to perform syntax highlighting on GDB's
disassembly output, so I've included that too.

The new commands added are:

  set style disassembly on|off
  show style disassembly

which enable or disable disassembly syntax highlighting.

The disassembler API is contained in the new gdb.disassembler module,
which defines the following classes:

  DisassembleInfo

    Similar to the libopcodes disassemble_info structure, this has the
    read-only attributes: address, string, length, architecture, and
    can_emit_style_escape, and the methods: read_memory, set_result,
    and memory_error.  Each time GDB wants an instruction
    disassembled, an instance of this class is passed to a user
    written disassembler; by reading the attributes and calling the
    methods, the user can perform disassembly and set the result
    within the DisassembleInfo instance.

  Disassembler

    This is a base class from which user written disassemblers should
    inherit.  It provides base implementations of __init__ and
    __call__; user written disassemblers must override __call__.

The gdb.disassembler module also provides the following functions:

  register_disassembler

    This function registers an instance of a Disassembler sub-class
    as a disassembler, either for one specific architecture, or as a
    global disassembler for all architectures.

  format_address

    This wraps GDB's print_address function, converting an address
    into a string that can be placed into disassembler output.

  syntax_highlight

    This adds syntax highlighting escapes to some disassembler output.
    Users can call this from their own custom disassemblers to retain
    syntax highlighting.  This function handles the cases where syntax
    highlighting has been switched off, or where the pygments library
    is not available.
  builtin_disassemble

    This provides access to GDB's builtin disassembler.  A common use
    case that I see is augmenting the existing disassembler output.
    The user code can call this function to have GDB disassemble the
    instruction in the normal way, and then the user can tweak the
    output before setting that as the result.  This function also
    provides a mechanism to intercept the disassembler's reads of
    memory, so the user can adjust what GDB sees when it is
    disassembling.

The included documentation provides a more detailed description of
the API.
---
 gdb/Makefile.in                        |   1 +
 gdb/NEWS                               |  42 ++
 gdb/data-directory/Makefile.in         |   1 +
 gdb/disasm.c                           |   5 +-
 gdb/disasm.h                           |  13 +-
 gdb/doc/gdb.texinfo                    |  14 +
 gdb/doc/python.texi                    | 252 +++++++
 gdb/python/lib/gdb/disassembler.py     | 194 ++++++
 gdb/python/py-arch.c                   |   9 +
 gdb/python/py-disasm.c                 | 905 +++++++++++++++++++++++++
 gdb/python/python-internal.h           |  21 +
 gdb/python/python.c                    |  11 +-
 gdb/testsuite/gdb.base/style.exp       |  45 +-
 gdb/testsuite/gdb.python/py-disasm.c   |  25 +
 gdb/testsuite/gdb.python/py-disasm.exp | 201 ++++++
 gdb/testsuite/gdb.python/py-disasm.py  | 538 +++++++++++++++
 16 files changed, 2267 insertions(+), 10 deletions(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index ec5d332c145..3981cc9507c 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -392,6 +392,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-breakpoint.c \
 	python/py-cmd.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index d001a03145d..fd1952a2f59 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -32,6 +32,12 @@ maint show internal-warning backtrace
   internal-error, or an internal-warning.  This is on by default for
   internal-error and off by default for internal-warning.
 
+set style disassembly on|off
+show style disassembly
+  If GDB is compiled with Python support, and the Python pygments
+  module is available, then, when this setting is on, disassembler
+  output will have styling applied.
+
 * Python API
 
   ** New function gdb.add_history(), which takes a gdb.Value object
@@ -49,6 +55,42 @@ maint show internal-warning backtrace
      containing all of the possible Architecture.name() values.  Each
      entry is a string.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is an instance of gdb.disassembler.Disassembler.
+       ARCH is either None or a string containing a bfd architecture
+       name.  DISASSEMBLER is registered as a disassembler for
+       architecture ARCH, or for all architectures if ARCH is None.
+       The previous disassembler registered for ARCH is returned; this
+       can be None if no previous disassembler was registered.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly; invoking __call__ on this base
+       class will raise an exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the following read-only
+       attributes: 'address', 'string', 'length', 'architecture',
+       'can_emit_style_escape', and the following methods
+       'read_memory', 'set_result', and 'memory_error'.
+
+     - gdb.disassembler.format_address(ARCHITECTURE, ADDRESS), formats
+       an address into a string so that the string can be included in
+       the disassembler output.  ARCHITECTURE is a gdb.Architecture
+       object.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional; its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
 *** Changes in GDB 11
 
 * The 'set disassembler-options' command now supports specifying options
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index 888325f974e..775516a53cc 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/disasm.c b/gdb/disasm.c
index 0c384c778f5..3a0a11ec3bb 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -752,12 +752,13 @@ get_all_disassembler_options (struct gdbarch *gdbarch)
 
 gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
                                     struct ui_file *file,
-                                    di_read_memory_ftype read_memory_func)
+                                    di_read_memory_ftype read_memory_func,
+                                    di_memory_error_ftype memory_error_func)
   : m_gdbarch (gdbarch)
 {
   init_disassemble_info (&m_di, file, dis_asm_fprintf);
   m_di.flavour = bfd_target_unknown_flavour;
-  m_di.memory_error_func = dis_asm_memory_error;
+  m_di.memory_error_func = memory_error_func;
   m_di.print_address_func = dis_asm_print_address;
   /* NOTE: cagney/2003-04-28: The original code, from the old Insight
      disassembler had a local optimization here.  By default it would
diff --git a/gdb/disasm.h b/gdb/disasm.h
index f6de33e3db8..eca116c98f8 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -41,6 +41,7 @@ struct ui_file;
 class gdb_disassembler
 {
   using di_read_memory_ftype = decltype (disassemble_info::read_memory_func);
+  using di_memory_error_ftype = decltype (disassemble_info::memory_error_func);
 
 public:
   gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
@@ -59,11 +60,21 @@ class gdb_disassembler
 
 protected:
   gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
-                    di_read_memory_ftype func);
+                    di_read_memory_ftype read_memory_func)
+    : gdb_disassembler (gdbarch, file, read_memory_func,
+                        dis_asm_memory_error)
+  { /* Nothing.  */ }
+
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
+                    di_read_memory_ftype read_memory_func,
+                    di_memory_error_ftype memory_error_func);
 
   struct ui_file *stream ()
   { return (struct ui_file *) m_di.stream; }
 
+  struct disassemble_info *disasm_info ()
+  { return &m_di; }
+
 private:
   struct gdbarch *m_gdbarch;
 
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 631a7c03b31..9af415cc018 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -26071,6 +26071,20 @@
 
 @item show style sources
 Show the current state of source code styling.
+
+@item set style disassembly @samp{on|off}
+Enable or disable disassembly styling.  This affects whether
+disassembly output, such as the output of the @code{disassemble}
+command, is styled.  The default is @samp{on}.  Note that disassembly
+styling only works if styling in general is enabled, and if a source
+highlighting library is available to @value{GDBN}.
+
+To highlight disassembly output, @value{GDBN} must be compiled with
+Python support, and the Python Pygments package must be available.
+
+@item show style disassembly
+Show the current state of disassembly styling.
+
 @end table
 
 Subcommands of @code{set style} control specific forms of styling.
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index 04192f906c8..808934aea73 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -221,6 +221,7 @@
 * Architectures In Python::     Python representation of architectures.
 * Registers In Python::         Python representation of registers.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction disassembly in Python.
 @end menu
 
 @node Basic Python
@@ -557,6 +558,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3136,6 +3138,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6075,6 +6078,255 @@
 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@cindex Python Instruction Disassembly
+@subsubsection Instruction Disassembly In Python
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+storing the result within the instance of this class.  The following
+attributes and methods are available:
+
+@defivar DisassembleInfo address
+An integer containing the address at which @value{GDBN} wishes to
+disassemble a single instruction.
+@end defivar
+
+@defivar DisassembleInfo string
+A string that is the result of the disassembly.  If no result has yet
+been set then this field contains @code{None}.
+@end defivar
+
+@defivar DisassembleInfo length
+An integer that is the length of the disassembled instruction in
+bytes, or @code{None} if no result has yet been set for this
+instruction.
+
+When a result has been set, the length will always be a positive
+integer.
+@end defivar
+
+@defivar DisassembleInfo architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling.
+@end defivar
+
+@defivar DisassembleInfo can_emit_style_escape
+This is @code{True} if the output stream that the disassembler is
+currently printing to supports the escape sequences used for colors,
+otherwise this attribute is @code{False}.
+@end defivar
+
+@defmethod DisassembleInfo read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} from @code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly, as in some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).
+@end defmethod
+
+@defmethod DisassembleInfo set_result (length, string)
+This method is used to set the result after an instruction has
+successfully been disassembled.  The @var{length} is the length in
+bytes of the instruction, and @var{string} is the text that should be
+displayed for the disassembled output.
+
+The @var{length} must be greater than zero, and @var{string} must be a
+non-empty string.
+
+It is valid to call this method multiple times during the disassembly
+of a single instruction; each call replaces the previous result.  In
+this way it is possible to extend the output of a previous
+disassembler.
+
+If @code{DisassembleInfo.memory_error} has previously been called,
+then calling @code{DisassembleInfo.set_result} clears the memory error
+from this @code{DisassembleInfo}.
+@end defmethod
+
+@defmethod DisassembleInfo memory_error (offset)
+This method marks the @code{DisassembleInfo} as having experienced a
+@code{gdb.MemoryError} when trying to access memory at @var{offset}
+bytes from @code{DisassembleInfo.address}.
+
+It is valid to call @code{DisassembleInfo.memory_error} multiple times
+for a single instruction disassembly, but only the first memory error
+is recorded.
+
+If @code{DisassembleInfo.set_result} has already been called, then any
+result is discarded when @code{DisassembleInfo.memory_error} is
+called.
+@end defmethod
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defmethod Disassembler __init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.  Currently, this name is only used in some
+debug output.
+@end defmethod
+
+@defmethod Disassembler __call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembled.
+
+This function must return @code{None}.  If this function raises a
+@code{gdb.MemoryError} exception then @value{GDBN} will ignore the
+exception and fall back to using its builtin disassembler.  Raising any
+other exception is an error.
+@end defmethod
+@end deftp
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a sub-class of
+@code{Disassembler}.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name()}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names()}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned,
+or @code{None} if no disassembler was previously registered.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used before a global
+disassembler.
+@end defun
+
+@defun format_address (architecture, address)
+Returns @var{address} formatted as a string, in a style suitable for
+including in the disassembly output of an instruction.  For example, a
+formatted address might look like:
+
+@smallexample
+0x00001042
+@end smallexample
+
+@var{architecture} is a @code{gdb.Architecture} (@pxref{Architectures
+In Python}), which is required to format the addresses correctly.
+This can be obtained from @code{DisassembleInfo.architecture}.
+@end defun
+
+@defun syntax_highlight (info)
+This function can be used to apply syntax highlighting to the result
+already held within @var{info}, a @code{DisassembleInfo}.
+
+After calling this function the result in @var{info} @emph{might} have
+been updated to include syntax highlighting escape sequences.  If
+syntax highlighting is disabled in @value{GDBN}, or the output stream
+doesn't support syntax highlighting, then this function will leave
+@var{info} unchanged.
+
+If @var{info} doesn't have a result set when this function is called
+then @var{info} will not be modified.
+
+This function returns @code{None}.
+@end defun
+
+@defun builtin_disassemble (info, memory_source)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance of
+@code{DisassembleInfo}.
+
+After calling this function, if the instruction disassembled
+successfully, then @var{info} will have been updated as though
+@code{DisassembleInfo.set_result} had been called.  The results of the
+builtin disassembler can be examined by reading
+@code{DisassembleInfo.length} and @code{DisassembleInfo.string}.
+
+If the builtin disassembler fails then this function will raise a
+@code{gdb.MemoryError} exception.
+
+The optional @var{memory_source} argument has the default value of
+@code{None}, in which case the builtin disassembler will read the
+instruction from memory in the normal way.
+
+If @var{memory_source} is not @code{None}, then it should be an
+instance of a class that implements the following method:
+
+@defmethod memory_source read_memory (length, offset)
+This method will be called by the builtin disassembler to fetch bytes
+of the instruction being disassembled.  @var{length} is the number of
+bytes to fetch, and @var{offset} is the offset from the address of the
+instruction being disassembled; this address can be obtained from
+@code{DisassembleInfo.address}.
+
+This function should return a Python object that supports the buffer
+protocol, for example a @code{bytes} object, an array, or the object
+returned from @code{DisassembleInfo.read_memory}.
+
+The length of the returned buffer @emph{must} be @var{length},
+otherwise a @code{ValueError} exception will be raised.
+
+Alternatively, this function can raise a @code{gdb.MemoryError}
+exception to indicate that the read failed; raising any other
+exception type is an error.
+@end defmethod
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output, before
+finally applying syntax highlighting to the result:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super(ExampleDisassembler, self).__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        if info.string is not None:
+            tmp = info.string + "\t## Comment"
+            info.set_result(info.length, tmp)
+        gdb.disassembler.syntax_highlight(info)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..9cf247a89e7
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,194 @@
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first look up the bfd
+# architecture name from the gdbarch; if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+_disassembly_registry = {}
+
+# Module global callback.  This is the entry point that GDB calls, but
+# only if this is a callable thing.
+#
+# Initially we set this to None, so GDB will not try to call into any
+# Python code.
+#
+# When Python disassemblers are registered into _disassembly_registry
+# then this will be set to something callable.
+_print_insn = None
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will raise
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is an instance of a
+    sub-class of gdb.disassembler.Disassembler.  ARCHITECTURE is either
+    None or a string, the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered with this
+    ARCHITECTURE value, or None if there was no previous disassembler.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassembly_registry:
+        old = _disassembly_registry[architecture]
+        del _disassembly_registry[architecture]
+    if disassembler is not None:
+        _disassembly_registry[architecture] = disassembler
+
+    global _print_insn
+    if len(_disassembly_registry) > 0:
+        _print_insn = _perform_disassembly
+    else:
+        _print_insn = None
+
+    return old
+
+
+def _lookup_disassembler(arch):
+    try:
+        name = arch.name()
+        if name is None:
+            return None
+        if name in _disassembly_registry:
+            return _disassembly_registry[name]
+        if None in _disassembly_registry:
+            return _disassembly_registry[None]
+        return None
+    except:
+        return None
+
+
+def _perform_disassembly(info):
+    disassembler = _lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
+
+
+class StyleDisassembly(gdb.Parameter):
+    def __init__(self):
+        super(StyleDisassembly, self).__init__(
+            "style disassembly", gdb.COMMAND_NONE, gdb.PARAM_BOOLEAN
+        )
+        self.value = True
+        self._pygments_module_available = True
+
+    def get_show_string(self, sval):
+        return 'Disassembly styling is "%s".' % sval
+
+    def get_set_string(self):
+        if not self._pygments_module_available and self.value:
+            self.value = False
+            return "Python pygments module is not available"
+        return ""
+
+    def failed_to_load_pygments(self):
+        self.value = False
+        self._pygments_module_available = False
+
+    def __bool__(self):
+        return self.value
+
+    def __nonzero__(self):
+        if self.value:
+            return 1
+        else:
+            return 0
+
+
+style_disassembly_param = StyleDisassembly()
+
+try:
+    from pygments import formatters, lexers, highlight
+
+    _lexer = lexers.get_lexer_by_name("asm")
+    _formatter = formatters.TerminalFormatter()
+
+    def syntax_highlight(info):
+        # If we should not be performing syntax highlighting, or if
+        # INFO does not hold a result, then there's nothing to do.
+        if (
+            not gdb.parameter("style enabled")
+            or not style_disassembly_param
+            or not info.can_emit_style_escape
+            or info.string is None
+        ):
+            return
+        # Now apply the highlighting, and update the result.
+        highlighted = highlight(info.string, _lexer, _formatter)
+        info.set_result(info.length, highlighted.strip())
+
+    class _SyntaxHighlightingDisassembler(Disassembler):
+        """A syntax highlighting disassembler."""
+
+        def __init__(self, name):
+            """Constructor."""
+            super(_SyntaxHighlightingDisassembler, self).__init__(name)
+
+        def __call__(self, info):
+            """Invoke the builtin disassembler, and syntax highlight the result."""
+            gdb.disassembler.builtin_disassemble(info)
+            gdb.disassembler.syntax_highlight(info)
+
+    register_disassembler(
+        _SyntaxHighlightingDisassembler("syntax_highlighting_disassembler")
+    )
+
+except:
+
+    # Update the 'set/show style disassembly' parameter now we know
+    # that the pygments module can't be loaded.
+    style_disassembly_param.failed_to_load_pygments()
+
+    def syntax_highlight(info):
+        # An implementation of syntax_highlight that can safely be
+        # called even when syntax highlighting is not available.
+        # This just returns, leaving INFO unmodified.
+        return
diff --git a/gdb/python/py-arch.c b/gdb/python/py-arch.c
index 3e7970ab764..1855f3daab3 100644
--- a/gdb/python/py-arch.c
+++ b/gdb/python/py-arch.c
@@ -72,6 +72,15 @@ arch_object_to_gdbarch (PyObject *obj)
   return py_arch->gdbarch;
 }
 
+/* See python-internal.h.  */
+
+bool
+gdbpy_is_arch_object (PyObject *obj)
+{
+  gdb_assert (obj != nullptr);
+  return PyObject_TypeCheck (obj, &arch_object_type);
+}
+
 /* Returns the Python architecture object corresponding to GDBARCH.
    Returns a new reference to the arch_object associated as data with
    GDBARCH.  */
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..3327e532270
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,905 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2008-2021 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object {
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  disassemble_info *gdb_info;
+  disassemble_info *py_info;
+
+  /* The length of the disassembled instruction.  A value of -1 indicates
+     that there is no disassembly result set; otherwise, this should be a
+     value greater than zero.  */
+  int length;
+
+  /* A string buffer containing the disassembled instruction.  This is
+     initially nullptr, and is allocated when needed.  It is possible that
+     the length field (above) can be -1, but this buffer is still
+     allocated; this happens if the user first sets a result, and then
+     marks a memory error.  In this case any value in CONTENT should be
+     ignored.  */
+  string_file *content;
+
+  /* When the user indicates that a memory error has occurred then this
+     field is set to true; it is false by default.  */
+  bool memory_error_address_p;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  This field is only
+     valid when MEMORY_ERROR_ADDRESS_P is true, otherwise this field is
+     undefined.  */
+  CORE_ADDR memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *memory_source;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+typedef int (*read_memory_ftype)
+  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
+   struct disassemble_info *dinfo);
+
+/* A sub-class of gdb_disassembler that holds a pointer to a Python
+   DisassembleInfo object.  A pointer to an instance of this class is
+   placed in the application_data field of the disassemble_info that is
+   used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_disassembler
+{
+  /* Constructor.  */
+  gdbpy_disassembler (struct gdbarch *gdbarch, struct ui_file *stream,
+                      disasm_info_object *obj);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Mark this class as a friend so that it can call the disasm_info
+     method, which is protected in our parent.  */
+  friend class scoped_disasm_info_object;
+
+private:
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasmpy_info_is_valid (disasm_info_object *obj)
+{
+  if (obj->gdb_info == nullptr)
+    gdb_assert (obj->py_info == nullptr);
+  else
+    gdb_assert (obj->py_info != nullptr);
+
+  return obj->gdb_info != nullptr;
+}
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)                       \
+  do {                                                                 \
+    if (!disasmpy_info_is_valid (Info))                                \
+      {                                                                \
+        PyErr_SetString (PyExc_RuntimeError,                           \
+                         _("DisassembleInfo is no longer valid."));    \
+        return nullptr;                                                \
+      }                                                                \
+  } while (0)
+
+/* Mark OBJ as having a memory error at ADDR.  Only the first memory error
+   is recorded, so if OBJ has already had a memory error set then this
+   call will have no effect.  */
+
+static void
+disasmpy_set_memory_error (disasm_info_object *obj, CORE_ADDR addr)
+{
+  if (!obj->memory_error_address_p)
+    {
+      obj->memory_error_address = addr;
+      obj->memory_error_address_p = true;
+    }
+}
+
+/* Clear any memory error already set on OBJ.  If there is no memory error
+   set on OBJ then this call has no effect.  */
+
+static void
+disasmpy_clear_memory_error (disasm_info_object *obj)
+{
+  obj->memory_error_address_p = false;
+}
+
+/* Clear any previous disassembler result stored within OBJ.  If there was
+   no previous disassembler result then calling this function has no
+   effect.
*/ + +static void +disasmpy_clear_disassembler_result (disasm_info_object *obj) +{ + obj->length = -1; + gdb_assert (obj->content != nullptr); + obj->content->clear (); +} + +/* Implement gdb.disassembler.builtin_disassemble(). Calls back into GDB's + builtin disassembler. The first argument is a DisassembleInfo object + describing what to disassemble. The second argument is optional and + provides a mechanism to modify the memory contents that the builtin + disassembler will actually disassemble. Returns the Python None value. */ + +static PyObject * +disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw) +{ + PyObject *info_obj, *memory_source_obj = nullptr; + static const char *keywords[] = { "info", "memory_source", nullptr }; + if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords, + &disasm_info_object_type, &info_obj, + &memory_source_obj)) + return nullptr; + + disasm_info_object *disasm_info = (disasm_info_object *) info_obj; + if (!disasmpy_info_is_valid (disasm_info)) + { + PyErr_SetString (PyExc_RuntimeError, + _("DisassembleInfo is no longer valid.")); + return nullptr; + } + + gdb::optional> restore_memory_source; + + disassemble_info *info = disasm_info->py_info; + if (memory_source_obj != nullptr) + { + if (!PyObject_HasAttrString (memory_source_obj, "read_memory")) + { + PyErr_SetString (PyExc_TypeError, + _("memory_source doesn't have a read_memory method")); + return nullptr; + } + + gdb_assert (disasm_info->memory_source == nullptr); + restore_memory_source.emplace (&disasm_info->memory_source, + memory_source_obj); + } + + /* When the user calls the builtin disassembler any previous result or + memory error is discarded, and we start fresh. */ + disasmpy_clear_disassembler_result (disasm_info); + disasmpy_clear_memory_error (disasm_info); + + /* Now actually perform the disassembly. 
*/ + disasm_info->length + = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address, info); + + if (disasm_info->length == -1) + { + /* In an ideal world, every disassembler should always call the + memory error function before returning a status of -1, as the only + error a disassembler should encounter is a failure to read + memory. Unfortunately, there are some disassemblers that don't + follow this rule, and will return -1 without calling the memory + error function. + + To make the Python API simpler, we just classify everything as a + memory error, but the message has to be modified for the case + where the disassembler didn't call the memory error function. */ + if (disasm_info->memory_error_address_p) + { + CORE_ADDR addr = disasm_info->memory_error_address; + PyErr_Format (gdbpy_gdb_memory_error, + "failed to read memory at %s", + core_addr_to_string (addr)); + } + else + PyErr_Format (gdbpy_gdb_memory_error, "failed to read memory"); + return nullptr; + } + + /* Instructions are either non-zero in length, or we got an error, + indicated by a length of -1, which we handled above. */ + gdb_assert (disasm_info->length > 0); + + /* We should not have seen a memory error in this case. */ + gdb_assert (!disasm_info->memory_error_address_p); + + Py_RETURN_NONE; +} + +/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET). Read LENGTH + bytes at OFFSET from the start of the instruction currently being + disassembled, and return a memory buffer containing the bytes. + + OFFSET defaults to zero if it is not provided. LENGTH is required. If + the read fails then this will raise a gdb.MemoryError exception.
*/ + +static PyObject * +disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw) +{ + disasm_info_object *obj = (disasm_info_object *) self; + DISASMPY_DISASM_INFO_REQUIRE_VALID (obj); + + LONGEST length, offset = 0; + gdb::unique_xmalloc_ptr buffer; + static const char *keywords[] = { "length", "offset", nullptr }; + + if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords, + &length, &offset)) + return nullptr; + + /* The apparent address from which we are reading memory. Note that in + some cases GDB actually disassembles instructions from a buffer, so + we might not actually be reading this information directly from the + inferior memory. This is all hidden behind the read_memory_func API + within the disassemble_info structure. */ + CORE_ADDR address = obj->address + offset; + + /* Set up a buffer to hold the result. */ + buffer.reset ((gdb_byte *) xmalloc (length)); + + /* Read content into BUFFER. If the read fails then raise a memory + error, otherwise, convert BUFFER to a Python memory buffer, and return + it to the user. */ + disassemble_info *info = obj->gdb_info; + if (info->read_memory_func ((bfd_vma) address, buffer.get (), + (unsigned int) length, info) != 0) + { + PyErr_Format (gdbpy_gdb_memory_error, + "failed to read %s bytes at %s", + pulongest ((ULONGEST) length), + core_addr_to_string (address)); + return nullptr; + } + return gdbpy_buffer_to_membuf (std::move (buffer), address, length); +} + +/* Implement DisassembleInfo.set_result(LENGTH, STRING). Discard any + previous memory error and set the result of this disassembly to be + STRING, a LENGTH bytes long instruction. The LENGTH must be greater + than zero, otherwise a ValueError exception is raised. STRING must be a + non-empty string, or a ValueError exception is raised.
*/ + +static PyObject * +disasmpy_info_set_result (PyObject *self, PyObject *args, PyObject *kw) +{ + disasm_info_object *obj = (disasm_info_object *) self; + DISASMPY_DISASM_INFO_REQUIRE_VALID (obj); + + static const char *keywords[] = { "length", "string", nullptr }; + int length; + const char *string; + + if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "is", keywords, + &length, &string)) + return nullptr; + + if (length <= 0) + { + PyErr_SetString (PyExc_ValueError, + _("Length must be greater than 0.")); + return nullptr; + } + + size_t string_len = strlen (string); + if (string_len == 0) + { + PyErr_SetString (PyExc_ValueError, _("String must not be empty.")); + return nullptr; + } + + /* Discard any previously recorded memory error, and any previous + disassembler result. */ + disasmpy_clear_memory_error (obj); + disasmpy_clear_disassembler_result (obj); + + /* And set the result. */ + obj->length = length; + gdb_assert (obj->content != nullptr); + obj->content->write (string, string_len); + + Py_RETURN_NONE; +} + +/* Implement DisassembleInfo.memory_error(). Mark SELF (a DisassembleInfo + object) as having a memory error. Any previous result is discarded. */ + +static PyObject * +disasmpy_info_memory_error (PyObject *self, PyObject *args, PyObject *kw) +{ + disasm_info_object *obj = (disasm_info_object *) self; + DISASMPY_DISASM_INFO_REQUIRE_VALID (obj); + + static const char *keywords[] = { "offset", nullptr }; + LONGEST offset; + + if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L", keywords, + &offset)) + return nullptr; + + /* Discard any previous disassembler result, and mark OBJ as having a + memory error. */ + disasmpy_clear_disassembler_result (obj); + disasmpy_set_memory_error (obj, obj->address + offset); + + Py_RETURN_NONE; +} + +/* Implement gdb.disassembler.format_address(ARCH, ADDR). Formats ADDR, an + address and returns a string. ADDR will be formatted in the style that + the disassembler uses: '0x.... '. 
ARCH is a + gdb.Architecture used to perform the formatting. */ + +static PyObject * +disasmpy_format_address (PyObject *self, PyObject *args, PyObject *kw) +{ + static const char *keywords[] = { "architecture", "address", nullptr }; + PyObject *addr_obj, *arch_obj; + CORE_ADDR addr; + + if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "OO", keywords, + &arch_obj, &addr_obj)) + return nullptr; + + if (get_addr_from_python (addr_obj, &addr) < 0) + return nullptr; + + if (!gdbpy_is_arch_object (arch_obj)) + { + PyErr_SetString (PyExc_TypeError, + _("architecture argument is not a gdb.Architecture")); + return nullptr; + } + + gdbarch *gdbarch = arch_object_to_gdbarch (arch_obj); + if (gdbarch == nullptr) + { + PyErr_SetString (PyExc_RuntimeError, + _("architecture argument is invalid.")); + return nullptr; + } + + string_file buf; + print_address (gdbarch, addr, &buf); + return PyString_FromString (buf.c_str ()); +} + +/* Implement DisassembleInfo.address attribute, return the address at which + GDB would like an instruction disassembled. */ + +static PyObject * +disasmpy_info_address (PyObject *self, void *closure) +{ + disasm_info_object *obj = (disasm_info_object *) self; + DISASMPY_DISASM_INFO_REQUIRE_VALID (obj); + return gdb_py_object_from_longest (obj->address).release (); +} + +/* Implement DisassembleInfo.string attribute. Return a string containing + the current disassembly result, or None if there is no current + disassembly result. */ + +static PyObject * +disasmpy_info_string (PyObject *self, void *closure) +{ + disasm_info_object *obj = (disasm_info_object *) self; + DISASMPY_DISASM_INFO_REQUIRE_VALID (obj); + + gdb_assert (obj->content != nullptr); + if (strlen (obj->content->c_str ()) == 0) + Py_RETURN_NONE; + gdb_assert (obj->length > 0); + return PyUnicode_Decode (obj->content->c_str (), + obj->content->size (), + host_charset (), nullptr); +} + +/* Implement DisassembleInfo.length attribute. 
Return the length of the + current disassembled instruction, as set by a call to + DisassembleInfo.set_result. If no result has been set yet, or if a call + to DisassembleInfo.memory_error has invalidated the result, then None is + returned. */ + +static PyObject * +disasmpy_info_length (PyObject *self, void *closure) +{ + disasm_info_object *obj = (disasm_info_object *) self; + DISASMPY_DISASM_INFO_REQUIRE_VALID (obj); + if (obj->length == -1) + Py_RETURN_NONE; + gdb_assert (obj->length > 0); + gdb_assert (obj->content != nullptr); + gdb_assert (strlen (obj->content->c_str ()) > 0); + return gdb_py_object_from_longest (obj->length).release (); +} + +/* Implement DisassembleInfo.architecture attribute. Return the + gdb.Architecture in which we are disassembling. */ + +static PyObject * +disasmpy_info_architecture (PyObject *self, void *closure) +{ + disasm_info_object *obj = (disasm_info_object *) self; + DISASMPY_DISASM_INFO_REQUIRE_VALID (obj); + return gdbarch_to_arch_object (obj->gdbarch); +} + +/* Implement DisassembleInfo.can_emit_style_escape attribute. Returns True + if the output stream that the disassembly result will be written to + supports style escapes, otherwise, returns False. */ + +static PyObject * +disasmpy_info_can_emit_style_escape (PyObject *self, void *closure) +{ + disasm_info_object *obj = (disasm_info_object *) self; + DISASMPY_DISASM_INFO_REQUIRE_VALID (obj); + bool can_emit_style_escape = current_uiout->can_emit_style_escape (); + return PyBool_FromLong (can_emit_style_escape ? 1 : 0); +} + +/* This implements the disassemble_info read_memory_func callback. This + will either call the standard read memory function, or, if the user has + supplied a memory source (see disasmpy_builtin_disassemble) then this + will call back into Python to obtain the memory contents. + + Read LEN bytes from MEMADDR and place them into BUFF.
Return 0 on + success (in which case BUFF has been filled), or -1 on error, in which + case the contents of BUFF are undefined. */ + +static int +disasmpy_read_memory_func (bfd_vma memaddr, gdb_byte *buff, + unsigned int len, struct disassemble_info *info) +{ + gdbpy_disassembler *dis + = static_cast (info->application_data); + disasm_info_object *obj = dis->py_disasm_info (); + + /* The simple case, the user didn't pass a separate memory source, so we + just delegate to the standard disassemble_info read_memory_func. */ + if (obj->memory_source == nullptr) + return obj->gdb_info->read_memory_func (memaddr, buff, len, obj->gdb_info); + + /* The user provided a separate memory source, we need to call the + read_memory method on the memory source and use the buffer it returns + as the bytes of memory. */ + PyObject *memory_source = obj->memory_source; + LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address; + gdbpy_ref<> result_obj (PyObject_CallMethod (memory_source, "read_memory", + "IL", len, offset)); + if (result_obj == nullptr) + { + /* If we got a gdb.MemoryError then we ignore this and just report + that the read failed to the caller. For any other exception type + we assume this is a bug in the user's code, print the stack, and + then report that the read failed. */ + if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error)) + PyErr_Clear (); + else + gdbpy_print_stack (); + return -1; + } + + /* Convert the result to a buffer. */ + Py_buffer py_buff; + if (!PyObject_CheckBuffer (result_obj.get ()) + || PyObject_GetBuffer (result_obj.get(), &py_buff, PyBUF_CONTIG_RO) < 0) + { + PyErr_Format (PyExc_TypeError, + _("Result from read_memory is not a buffer")); + gdbpy_print_stack (); + return -1; + } + + /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this + scope. */ + Py_buffer_up buffer_up (&py_buff); + + /* Validate that the buffer is the correct length.
*/ + if (py_buff.len != len) + { + PyErr_Format (PyExc_ValueError, + _("Result from read_memory is incorrectly sized buffer")); + gdbpy_print_stack (); + return -1; + } + + /* Copy the data out of the Python buffer and return success. */ + const gdb_byte *buffer = (const gdb_byte *) py_buff.buf; + memcpy (buff, buffer, len); + return 0; +} + +/* Implement memory_error_func callback for disassemble_info. Extract the + underlying DisassembleInfo Python object, and set a memory error on + it. */ + +static void +disasmpy_memory_error_func (int status, bfd_vma memaddr, + struct disassemble_info *info) +{ + gdbpy_disassembler *dis + = static_cast (info->application_data); + disasm_info_object *obj = dis->py_disasm_info (); + disasmpy_set_memory_error (obj, memaddr); +} + +/* Constructor. */ + +gdbpy_disassembler::gdbpy_disassembler (struct gdbarch *gdbarch, + struct ui_file *stream, + disasm_info_object *obj) + : gdb_disassembler (gdbarch, stream, disasmpy_read_memory_func, + disasmpy_memory_error_func), + m_disasm_info_object (obj) +{ /* Nothing. */ } + +/* A wrapper around a reference to a Python DisassembleInfo object, along + with some supporting information that the DisassembleInfo object needs + to reference. + + Each DisassembleInfo is created in gdbpy_print_insn, and is done with by + the time that function returns. However, there's nothing to stop a user + caching a reference to the DisassembleInfo, and thus keeping the object + around. + + We therefore have the notion of a DisassembleInfo becoming invalid; this + happens when gdbpy_print_insn returns. This class is responsible for + marking the DisassembleInfo as invalid in its destructor. */ + +struct scoped_disasm_info_object +{ + /* Constructor.
*/ + scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr, + disassemble_info *info) + : m_disasm_info (allocate_disasm_info_object ()), + m_py_disassembler (gdbarch, &m_string_file, m_disasm_info.get ()) + { + m_disasm_info->address = memaddr; + m_disasm_info->gdb_info = info; + m_disasm_info->py_info = m_py_disassembler.disasm_info (); + m_disasm_info->length = -1; + m_disasm_info->content = &m_string_file; + m_disasm_info->gdbarch = gdbarch; + m_disasm_info->memory_error_address_p = false; + m_disasm_info->memory_error_address = 0; + m_disasm_info->memory_source = nullptr; + } + + /* Upon destruction clear pointers to state that will no longer be + valid. These fields are checked in disasmpy_info_is_valid to see if + the disasm_info_object is still valid or not. */ + ~scoped_disasm_info_object () + { + m_disasm_info->gdb_info = nullptr; + m_disasm_info->py_info = nullptr; + m_disasm_info->content = nullptr; + } + + /* Return a pointer to the underlying disasm_info_object instance. */ + disasm_info_object * + get () const + { + return m_disasm_info.get (); + } + +private: + + /* Wrapper around the call to PyObject_New, this wrapper function can be + called from the constructor initialization list, while PyObject_New, a + macro, can't. */ + static disasm_info_object * + allocate_disasm_info_object () + { + return (disasm_info_object *) PyObject_New (disasm_info_object, + &disasm_info_object_type); + } + + /* A reference to a gdb.disassembler.DisassembleInfo object. When this + containing instance goes out of scope this reference is released, + however, the user might be holding other references to the + DisassembleInfo object in Python code, so the underlying object might + not be deleted. */ + gdbpy_ref m_disasm_info; + + /* A location into which the output of the Python disassembler is + collected. We only send this back to GDB once the Python disassembler + has completed successfully. 
*/ + string_file m_string_file; + + /* Core GDB requires that the disassemble_info application_data field be + an instance of, or a sub-class of, gdb_disassembler. We use a + sub-class so that functions within this file can obtain a pointer to + the disasm_info_object from the application_data. */ + gdbpy_disassembler m_py_disassembler; +}; + +/* See python-internal.h. */ + +gdb::optional +gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr, + disassemble_info *info) +{ + if (!gdb_python_initialized) + return {}; + + gdbpy_enter enter_py (get_current_arch (), current_language); + + /* The attribute we are going to look up that provides the print_insn + functionality. */ + static const char *callback_name = "_print_insn"; + + /* Grab a reference to the gdb.disassembler module, and check it has the + attribute that we need. */ + static gdbpy_ref<> gdb_python_disassembler_module + (PyImport_ImportModule ("gdb.disassembler")); + if (gdb_python_disassembler_module == nullptr + || !PyObject_HasAttrString (gdb_python_disassembler_module.get (), + callback_name)) + return {}; + + /* Now grab the callback attribute from the module, and check that it is + callable. */ + gdbpy_ref<> hook + (PyObject_GetAttrString (gdb_python_disassembler_module.get (), + callback_name)); + if (hook == nullptr) + { + gdbpy_print_stack (); + return {}; + } + if (!PyCallable_Check (hook.get ())) + return {}; + + scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info); + disasm_info_object *disasm_info = scoped_disasm_info.get (); + + /* Call into the registered disassembler to (possibly) perform the + disassembly.
*/ + PyObject *insn_disas_obj = (PyObject *) disasm_info; + gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (), + insn_disas_obj, + nullptr)); + + if (result == nullptr) + { + if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error)) + { + /* Uncaught memory errors are not printed. We assume that the + user tried to read some bytes for their custom disassembler, + but the bytes were not available; as such, we should silently + fall back to using the builtin disassembler, which is what + happens when we return no value here. */ + PyErr_Clear (); + } + else + { + /* Any other error while executing the _print_insn callback + should result in a debug stack being printed, then we return + no value to indicate that the builtin disassembler should be + used. */ + gdbpy_print_stack (); + } + return {}; + } + else if (result != Py_None) + error (_("invalid return value from gdb.disassembler._print_insn")); + + if (disasm_info->memory_error_address_p) + { + /* We pass -1 for the status here. GDB doesn't make use of this + field, but disassemblers usually pass the result of + read_memory_func as the status, in which case -1 indicates an + error. */ + bfd_vma addr = disasm_info->memory_error_address; + info->memory_error_func (-1, addr, info); + return gdb::optional (-1); + } + + /* If the gdb.disassembler.DisassembleInfo object doesn't have a result + then return an empty value. */ + if (disasm_info->length == -1) + return {}; + + /* Print the content from the DisassembleInfo back through to GDB's + standard fprintf_func handler. */ + info->fprintf_func (info->stream, "%s", disasm_info->content->c_str ()); + + /* Return the length of this instruction. */ + return gdb::optional (disasm_info->length); +} + +/* The tp_dealloc callback for the DisassembleInfo type. The content + buffer is owned by scoped_disasm_info_object, so is not freed here.
*/ + +static void +disasmpy_dealloc (PyObject *self) +{ + disasm_info_object *obj = (disasm_info_object *) self; + + /* The memory_source field is only ever temporarily set to non-nullptr + during the disasmpy_builtin_disassemble function. By the end of that + function the memory_source field should be back to nullptr. */ + gdb_assert (obj->memory_source == nullptr); + + /* The content field will also be reset to nullptr by the end of + gdbpy_print_insn, so the following assert should hold. */ + gdb_assert (obj->content == nullptr); + Py_TYPE (self)->tp_free (self); +} + +/* The get/set attributes of the gdb.disassembler.DisassembleInfo type. */ + +static gdb_PyGetSetDef disasm_info_object_getset[] = { + { "address", disasmpy_info_address, nullptr, + "Start address of the instruction to disassemble.", nullptr }, + { "string", disasmpy_info_string, nullptr, + "String representing the disassembled instruction.", nullptr }, + { "length", disasmpy_info_length, nullptr, + "Length in octets of the disassembled instruction.", nullptr }, + { "architecture", disasmpy_info_architecture, nullptr, + "Architecture to disassemble in", nullptr }, + { "can_emit_style_escape", disasmpy_info_can_emit_style_escape, nullptr, + "Boolean indicating if style escapes can be emitted", nullptr }, + { nullptr } /* Sentinel */ +}; + +/* The methods of the gdb.disassembler.DisassembleInfo type. */ + +static PyMethodDef disasm_info_object_methods[] = { + { "read_memory", (PyCFunction) disasmpy_info_read_memory, + METH_VARARGS | METH_KEYWORDS, + "read_memory (LENGTH, OFFSET = 0) -> Octets[]\n\ +Read LENGTH octets for the instruction to disassemble." }, + { "set_result", (PyCFunction) disasmpy_info_set_result, + METH_VARARGS | METH_KEYWORDS, + "set_result (LENGTH, STRING) -> None\n\ +Set the disassembly result, LENGTH in octets, and disassembly STRING."
}, + { "memory_error", (PyCFunction) disasmpy_info_memory_error, + METH_VARARGS | METH_KEYWORDS, + "memory_error (OFFSET) -> None\n\ +A memory error occurred when trying to read bytes at OFFSET." }, + {nullptr} /* Sentinel */ +}; + +/* These are the methods we add into the _gdb.disassembler module, which + are then imported into the gdb.disassembler module. These are global + functions that support performing disassembly. */ + +PyMethodDef python_disassembler_methods[] = +{ + { "format_address", (PyCFunction) disasmpy_format_address, + METH_VARARGS | METH_KEYWORDS, + "format_address (ARCHITECTURE, ADDRESS) -> String.\n\ +Format ADDRESS as a string suitable for use in disassembler output." }, + { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble, + METH_VARARGS | METH_KEYWORDS, + "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> None\n\ +Disassemble using GDB's builtin disassembler. INFO is an instance of\n\ +gdb.disassembler.DisassembleInfo. The MEMORY_SOURCE, if not None, should\n\ +be an object with the read_memory method." }, + {nullptr, nullptr, 0, nullptr} +}; + +#ifdef IS_PY3K +/* Structure to define the _gdb.disassembler module. */ + +static struct PyModuleDef python_disassembler_module_def = +{ + PyModuleDef_HEAD_INIT, + "_gdb.disassembler", + nullptr, + -1, + python_disassembler_methods, + nullptr, + nullptr, + nullptr, + nullptr +}; +#endif + +/* Called to initialize the Python structures in this file. */ + +int +gdbpy_initialize_disasm (void) +{ + /* Create the _gdb.disassembler module, and add it to the _gdb module. 
*/ + + PyObject *gdb_disassembler_module; +#ifdef IS_PY3K + gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def); +#else + gdb_disassembler_module = Py_InitModule ("_gdb.disassembler", + python_disassembler_methods); +#endif + if (gdb_disassembler_module == nullptr) + return -1; + PyModule_AddObject(gdb_module, "disassembler", gdb_disassembler_module); + + /* This is needed so that 'import _gdb.disassembler' will work. */ + PyObject *dict = PyImport_GetModuleDict (); + PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module); + + /* Having the tp_new field as nullptr means that this class can't be + created from user code. The only way they can be created is from + within GDB, and then they are passed into user code. */ + gdb_assert (disasm_info_object_type.tp_new == nullptr); + if (PyType_Ready (&disasm_info_object_type) < 0) + return -1; + + return gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo", + (PyObject *) &disasm_info_object_type); +} + +/* Describe the gdb.disassembler.DisassembleInfo type. 
*/ + +PyTypeObject disasm_info_object_type = { + PyVarObject_HEAD_INIT (nullptr, 0) + "gdb.disassembler.DisassembleInfo", /*tp_name*/ + sizeof (disasm_info_object), /*tp_basicsize*/ + 0, /*tp_itemsize*/ + disasmpy_dealloc, /*tp_dealloc*/ + 0, /*tp_print*/ + 0, /*tp_getattr*/ + 0, /*tp_setattr*/ + 0, /*tp_compare*/ + 0, /*tp_repr*/ + 0, /*tp_as_number*/ + 0, /*tp_as_sequence*/ + 0, /*tp_as_mapping*/ + 0, /*tp_hash */ + 0, /*tp_call*/ + 0, /*tp_str*/ + 0, /*tp_getattro*/ + 0, /*tp_setattro*/ + 0, /*tp_as_buffer*/ + Py_TPFLAGS_DEFAULT, /*tp_flags*/ + "GDB instruction disassembler object", /* tp_doc */ + 0, /* tp_traverse */ + 0, /* tp_clear */ + 0, /* tp_richcompare */ + 0, /* tp_weaklistoffset */ + 0, /* tp_iter */ + 0, /* tp_iternext */ + disasm_info_object_methods, /* tp_methods */ + 0, /* tp_members */ + disasm_info_object_getset /* tp_getset */ +}; diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h index 735328b49c4..d0330c81079 100644 --- a/gdb/python/python-internal.h +++ b/gdb/python/python-internal.h @@ -497,6 +497,8 @@ int gdbpy_initialize_auto_load (void) CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION; int gdbpy_initialize_values (void) CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION; +int gdbpy_initialize_disasm (void) + CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION; int gdbpy_initialize_frames (void) CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION; int gdbpy_initialize_instruction (void) @@ -798,4 +800,23 @@ typedef std::unique_ptr Py_buffer_up; extern bool gdbpy_parse_register_id (struct gdbarch *gdbarch, PyObject *pyo_reg_id, int *reg_num); +/* Implement the 'print_insn' hook for Python. Disassemble an instruction + whose address is ADDRESS for architecture GDBARCH. The bytes of the + instruction should be read with INFO->read_memory_func as the + instruction being disassembled might actually be in a buffer. + + Use INFO->fprintf_func to print the results of the disassembly, and + return the length of the instruction in octets.
+ + If no instruction can be disassembled then return an empty value. */ + +extern gdb::optional gdbpy_print_insn (struct gdbarch *gdbarch, + CORE_ADDR address, + disassemble_info *info); + +/* Return true if OBJ is a gdb.Architecture object, otherwise, return + false. */ + +bool gdbpy_is_arch_object (PyObject *obj); + #endif /* PYTHON_PYTHON_INTERNAL_H */ diff --git a/gdb/python/python.c b/gdb/python/python.c index d817bd5bf27..3aba565cd11 100644 --- a/gdb/python/python.c +++ b/gdb/python/python.c @@ -190,7 +190,7 @@ const struct extension_language_ops python_extension_ops = gdbpy_colorize, - NULL, /* gdbpy_print_insn, */ + gdbpy_print_insn, }; /* Architecture and language to be used in callbacks from @@ -1852,6 +1852,7 @@ do_start_initialization () if (gdbpy_initialize_auto_load () < 0 || gdbpy_initialize_values () < 0 + || gdbpy_initialize_disasm () < 0 || gdbpy_initialize_frames () < 0 || gdbpy_initialize_commands () < 0 || gdbpy_initialize_instruction () < 0 @@ -2130,6 +2131,14 @@ do_initialize (const struct extension_language_defn *extlang) return true; } + /* Import gdb.disassembler now. The disassembler module provides some + parameters that we want to be available to users from the moment GDB + starts up. */ + PyObject *gdb_disassembler_module + = PyImport_ImportModule ("gdb.disassembler"); + if (gdb_disassembler_module == nullptr) + gdbpy_print_stack (); + return gdb_pymodule_addobject (m, "gdb", gdb_python_module) >= 0; } diff --git a/gdb/testsuite/gdb.base/style.exp b/gdb/testsuite/gdb.base/style.exp index 91d3059612d..7aa51cdfe00 100644 --- a/gdb/testsuite/gdb.base/style.exp +++ b/gdb/testsuite/gdb.base/style.exp @@ -182,12 +182,26 @@ proc run_style_tests { } { gdb_test_no_output "set width 0" - set main [limited_style main function] - set func [limited_style some_called_function function] - # Somewhere should see the call to the function. 
-        gdb_test "disassemble main" \
-            [concat "Dump of assembler code for function $main:.*" \
-                 "[limited_style $hex address].*$func.*"]
+        # Disassembly highlighting is done by Python, so if the
+        # required modules are not available we won't get the full
+        # highlighting.
+        if { $::python_disassembly_highlighting } {
+            # Check that the header line of the disassembly output is
+            # styled correctly, the address at the start of the first
+            # disassembly line is styled correctly, and that there is at
+            # least one escape sequence in the disassembly output.
+            set main [limited_style main function]
+            gdb_test "disassemble main" \
+                [concat "Dump of assembler code for function $main:\\r\\n" \
+                     "\\s+[limited_style $hex address]\\s+<\\+$decimal>:\[^\\r\\n\]+\033\\\[${decimal}\[^\\r\\n\]+.*" ""]
+        } else {
+            set main [limited_style main function]
+            set func [limited_style some_called_function function]
+            # Somewhere should see the call to the function.
+            gdb_test "disassemble main" \
+                [concat "Dump of assembler code for function $main:.*" \
+                     "[limited_style $hex address].*$func.*"]
+        }

         set ifield [limited_style int_field variable]
         set sfield [limited_style string_field variable]
@@ -312,6 +326,25 @@ proc test_startup_version_string { } {
     gdb_test "" "${vers}.*" "version is styled at startup"
 }

+# Check to see if the Python highlighting of disassembler output is
+# expected or not; this highlighting requires Python support in GDB,
+# and the Python pygments module to be available.
+clean_restart ${binfile}
+if {![skip_python_tests]} {
+    gdb_test_multiple "python import pygments" "" {
+        -re "ModuleNotFoundError: No module named 'pygments'.*$gdb_prompt $" {
+            set python_disassembly_highlighting false
+        }
+        -re "ImportError: No module named pygments.*$gdb_prompt $" {
+            set python_disassembly_highlighting false
+        }
+        -re "^python import pygments\r\n$gdb_prompt $" {
+            set python_disassembly_highlighting true
+        }
+    }
+} else {
+    set python_disassembly_highlighting false
+}

 # Run tests with all styles in their default state.
 with_test_prefix "all styles enabled" {
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..1d89a49c346
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");  /* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..f8d6140036d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,201 @@
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+    "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture; this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements: the
+# first element is the name of a disassembler class defined in
+# py-disasm.py, and the second element is a pattern that should be
+# matched in the disassembler output.
+#
+# Each different disassembler tests some different feature of the
+# Python disassembler API.
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+         [list "" "${base_pattern}\r\n.*"] \
+         [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+         [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, st = None, le = None, ar = ${curr_arch}\r\n.*"] \
+         [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, st = nop, le = $decimal, ar = ${curr_arch}\r\n.*"] \
+         [list "GlobalEscDisassembler" "${base_pattern}\\s+## style = False\r\n.*"] \
+         [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+         [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+         [list "SimpleMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+         [list "NonMemoryErrorEarlyDisassembler" "${addr_pattern}Python Exception : error before setting a result\r\nnop\r\n.*"] \
+         [list "NonMemoryErrorLateDisassembler" "${addr_pattern}Python Exception : error after setting a result\r\nnop\r\n.*"] \
+         [list "MemoryErrorEarlyDisassembler" "${base_pattern}\r\n.*"] \
+         [list "MemoryErrorLateDisassembler" "${base_pattern}\r\n.*"] \
+         [list "CaughtMemoryErrorEarlyDisassembler" "${addr_pattern}Cannot access memory at address 0x2"] \
+         [list "CaughtMemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address 0x2"] \
+         [list "CaughtMemoryErrorEarlyAndReplaceDisassembler" "${base_pattern}\\s+## tag = GOT MEMORY ERROR\r\n.*"] \
+         [list "SetResultBeforeBuiltinDisassembler" "${base_pattern}\r\n.*"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+        # Remove all existing disassemblers.
+        py_remove_all_disassemblers
+
+        # If we have a disassembler to load, do it now.
+        if { $global_disassembler_name != "" } {
+            gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+        }
+
+        # Disassemble main, and check the disassembler output.
+        gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(analyzing_disassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+        "Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+        "second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+        "PASS"
+}
+
+# The syntax highlighting disassembler makes use of the pygments
+# module.  Try importing the module now; if this fails then we can
+# skip the tests that check the syntax highlighting.
+gdb_test_multiple "python import pygments" "" {
+    -re "ModuleNotFoundError: No module named 'pygments'.*$gdb_prompt $" {
+        set pygments_module_available false
+    }
+    -re "ImportError: No module named pygments.*$gdb_prompt $" {
+        set pygments_module_available false
+    }
+    -re "^python import pygments\r\n$gdb_prompt $" {
+        set pygments_module_available true
+    }
+}
+
+if { $pygments_module_available } {
+    # Test the syntax highlighting disassembler.
+    with_test_prefix "syntax highlighting" {
+        py_remove_all_disassemblers
+        save_vars { env(TERM) } {
+            # We need an ANSI-capable terminal to get the output.
+            setenv TERM ansi
+
+            clean_restart ${binfile}
+
+            if ![runto_main] then {
+                fail "can't run to main"
+                return 0
+            }
+
+            gdb_test "source ${pyfile}" "Python script imported" \
+                "import python scripts"
+
+            gdb_breakpoint [gdb_get_line_number "Break here."]
+            gdb_continue_to_breakpoint "Break here."
+
+            gdb_test_no_output "python current_pc = ${curr_pc}"
+
+            gdb_test_no_output "python add_global_disassembler(GlobalColorDisassembler)"
+            set styled_nop "\033\\\[\[0-9\]+(;\[0-9\]+)?mnop\033\\\[\[^m\]+m"
+            set styled_address [style "${curr_pc_pattern}" address]
+            gdb_test "disassemble main" "\r\n=> ${styled_address} <\[^>\]+>:\\s+${styled_nop}\r\n.*"
+        }
+    }
+} else {
+    untested "disassemble with styling"
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..2cfcb7ceaff
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,538 @@
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler
+
+# A global, holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+    Implements the __call__ method and ensures we only do any
+    disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super(TestDisassembler, self).__init__("TestDisassembler")
+        if current_pc is None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        return self.disassemble(info)
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        st = info.string
+        le = info.length
+        ar = info.architecture
+
+        if le is not None:
+            raise gdb.GdbError("invalid length")
+
+        if st is not None:
+            raise gdb.GdbError("invalid string")
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        gdb.disassembler.builtin_disassemble(info)
+
+        text = info.string + "\t## ad = 0x%x, st = %s, le = %s, ar = %s" % (
+            ad,
+            st,
+            le,
+            ar.name(),
+        )
+        info.set_result(info.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        st = info.string
+        le = info.length
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if st is None or st == "":
+            raise gdb.GdbError("invalid string")
+
+        if le <= 0:
+            raise gdb.GdbError("invalid length")
+
+        text = info.string + "\t## ad = 0x%x, st = %s, le = %d, ar = %s" % (
+            ad,
+            st,
+            le,
+            ar.name(),
+        )
+        info.set_result(info.length, text)
+
+
+class GlobalEscDisassembler(TestDisassembler):
+    """Check the can_emit_style_escape attribute."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        text = info.string + "\t## style = %s" % info.can_emit_style_escape
+        info.set_result(info.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        len = info.length
+        str = ""
+        for o in range(len):
+            if str != "":
+                str += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack ('= len(rb) or (offset + length) > len(rb):
+            raise gdb.MemoryError("invalid length and offset combination")
+
+        # Return only the slice of the nop instruction as requested.
+        s = offset
+        e = offset + length
+        return rb[s:e]
+
+    def read_memory(self, len, offset):
+        """Callback used from the builtin disassembler to read the contents of
+        memory."""
+
+        info = self._info
+        assert info is not None
+
+        # If this request is within the region we are replacing with 'nop'
+        # instructions, then call the helper function to perform that
+        # replacement.
+        if self._start is not None:
+            assert self._end is not None
+            if info.address >= self._start and info.address < self._end:
+                return self._read_replacement(len, offset)
+
+        # Otherwise, we just forward this request to the default read memory
+        # implementation.
+        return info.read_memory(len, offset)
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = int(self._pass_1_length[replace_idx] / self._pass_1_length[nop_idx])
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+# Create a global instance of the AnalyzingDisassembler.  This isn't
+# registered as a disassembler yet; that is done later from
+# py-disasm.exp.
+analyzing_disassembler = AnalyzingDisassembler("AnalyzingDisassembler")
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
-- 
2.25.4
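py-disasm.exp turns the unpadded $pc value into a regexp by rewriting its leading "0x" as "0x0*" with [string replace ${curr_pc} 0 1 "0x0*"]. The resulting pattern also matches the zero-padded addresses that appear in the disassembler output. The sketch below shows the same transformation in Python. The name padded_address_pattern is hypothetical and not part of the patch. The sketch assumes the input is a "0x"-prefixed hex string with no leading zeros, as GDB's print /x produces. The demo uses re.fullmatch for clarity, whereas the test embeds the pattern inside a larger regexp.

```python
import re


def padded_address_pattern(pc):
    """Hypothetical Python equivalent of the Tcl
    [string replace ${curr_pc} 0 1 "0x0*"] used in py-disasm.exp.

    Assumes PC looks like "0x1234": a "0x" prefix and no leading zeros.
    """
    if not pc.startswith("0x"):
        raise ValueError("expected a 0x-prefixed hex string: %r" % pc)
    # Replace the two-character "0x" prefix with "0x0*" so that any number
    # of padding zeros is accepted before the significant digits.
    return "0x0*" + pc[2:]


pattern = padded_address_pattern("0x401136")
for text in ("0x401136", "0x0000000000401136", "0x4011360"):
    print(text, bool(re.fullmatch(pattern, text)))
```

The first two strings match and the third does not, because the trailing digit is not part of the pattern.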